Lectures

CMMRS will include lectures from faculty at Cornell University, the University of Maryland, and the Max Planck Institutes.


Giulia Guidi, Cornell University: High-Performance Computing Meets Biology

The use of massively parallel systems is playing a crucial role in new and diverse areas of data science, such as computational biology and data analytics. Computational biology is a key area in which data processing is rapidly increasing. The growing volume of data and increasing complexity have outpaced the processing capacity of single-node machines in these areas, making massively parallel systems an indispensable tool.

The emerging complex challenges in computational biology require large-scale parallel computing infrastructures. Furthermore, as we enter the post-Moore’s Law era, effective programming of specialized architectures is critical to improving performance in high-performance computing. As large-scale systems become more heterogeneous, using them efficiently for new, often irregular, and communication-intensive data analysis computations becomes increasingly complex. This talk will discuss how performance and scalability can be achieved on extreme-scale systems while maintaining productivity for new data-intensive biological challenges, and how high performance can be achieved on new specialized AI architectures such as SRAM-based Graphcore IPUs.


Ming C. Lin, University of Maryland, College Park: Reconstructing Reality: From Physical World to Virtual Environments

With the increasing availability of data in various forms, from images, audio, video, 3D models, motion capture, and simulation results to satellite imagery, representative samples of the various phenomena constituting the world around us bring new opportunities and research challenges. Such availability of data has led to recent advances in data-driven modeling. However, most of the existing example-based synthesis methods offer empirical models and data reconstruction that may not provide an insightful understanding of the underlying process or may be limited to a subset of observations.

In this talk, I present recent advances that integrate classical model-based methods and statistical learning techniques to tackle challenging problems that have not been previously addressed. These include flow reconstruction for traffic visualization, learning heterogeneous crowd behaviors from video, simultaneous estimation of deformation and elasticity parameters from images and video, and example-based multimodal display for VR systems. These approaches offer new insights for understanding complex collective behaviors, developing better models for complex dynamical systems from captured data, delivering more effective medical diagnosis and treatment, as well as cyber-manufacturing of customized apparel. I conclude by discussing some possible future directions and challenges.


Alan Zaoxing Liu, University of Maryland, College Park

Lecture 1: Introduction to Network Attacks

This lecture aims to provide a comprehensive introduction to basic and advanced network attacks that pose significant threats to modern Internet infrastructures, such as Distributed Denial of Service (DDoS) attacks and advanced persistent threats (APTs). I will begin with an overview of the concepts related to network architecture and the role of security protocols. I will then delve into several representative network attacks, categorizing them based on their methods and targets. Attendees will gain an understanding of how these attacks function and their potential impacts on society.

Lecture 2: Defending Against Advanced Network Attacks with Programmable Networks

This lecture will discuss effective detection and mitigation solutions for the latest DDoS attacks (e.g., volumetric and link-flooding attacks) and APT attacks. With the emergence of highly flexible network devices such as programmable switches and network interface cards, we as defenders can design more performant and cost-effective software and hardware defense solutions. I will delve into a research prototype that leverages programmable network hardware to mitigate DDoS attacks with minimal performance degradation. Finally, I will chart paths to designing future advanced network defense systems.


David Mimno, Cornell University

Lecture 1: What language models do, and how they do it

The lecture will cover the history of language modeling, from early word-counting methods to contemporary transformers. We will cover tokenization, predictive probabilities, attention mechanisms, and encoder/decoder architectures. We will use the Hugging Face PyTorch API to compare model outputs, explore network activations, and do a quick example of model fine-tuning.

Lecture 2: Language models and data

In this lecture we will discuss how language models are trained from a data perspective. We will cover how text data is collected, how it is used, and what implications those choices have for the behavior of models. We will introduce few-shot and prompt-based learning and explore how training data choices affect these capabilities. We will touch on legal and ethical issues around data use.


Danupon Nanongkai, MPI for Informatics: Modern Graph Algorithms

Many fast algorithms have recently been discovered for graph problems that had otherwise seen no progress in the last few decades. In this lecture series, we will explore some of the techniques underlying these advances. The focus will be on recent advances in computing shortest paths and problems related to maximum flow. While the lectures will focus on sequential algorithms, we will also discuss applications of these techniques in other models of computation such as distributed, dynamic, and streaming algorithms. (In fact, these settings are where some of the techniques originated.)


Peter Schwabe, MPI for Security and Privacy: The next generation of cryptographic software

Since Shor’s seminal 1994 paper, we have known that once physicists and quantum engineers are able to build a large universal quantum computer, our current generation of asymmetric cryptography will be broken. With increasing progress towards such a quantum computer becoming reality, the world is currently moving to a new generation of cryptography: so-called post-quantum cryptography. A major step towards the deployment of this new generation of primitives for key agreement and digital signatures is a (still ongoing) effort by NIST to identify suitable candidate schemes for standardization. In July 2022 NIST selected the first batch of algorithms for standardization; the only key-agreement scheme in this batch is CRYSTALS-Kyber, which is expected to become a standard this summer. In my lectures I will explain the design of CRYSTALS-Kyber and then illustrate the challenges we will face in the secure and efficient implementation and deployment of post-quantum cryptography.


Rachee Singh, Cornell University: TBD


Adish Singla, MPI for Software Systems: TBD


Milijana Surbatovich, University of Maryland, College Park: Type Systems for Intermittent Computing

Energy-harvesting devices (EHDs) are a new class of embedded computing platforms
that are powered solely by energy collected from the environment, without
using batteries. These devices enable new applications in domains like disaster
monitoring, body implants, or smart city infrastructure. Unfortunately,
environmental energy is scarce, so EHDs are powered only intermittently,
experiencing frequent failures that make correct programming difficult. This
situation is especially problematic because the envisioned domains have high
assurance requirements; we do not want applications for medical devices or
critical infrastructure to have bugs! Thus, my research has looked at how we
can use programming language techniques to design systems for EHDs that have
*provable* correctness guarantees.

In these lectures, I first introduce the field of “intermittent computing” on
EHDs, showing why frequent power failures cause incorrect execution and how
these errors can be addressed. I then cover the basics of formal programming
language semantics and type checking, showing how to leverage these to identify
and prove desired correctness properties. Finally, I connect the two topics by
describing my recent work in developing type systems for reasoning about
intermittent execution and discuss how these ideas apply to other emerging
computing architectures, beyond EHDs.