Lectures | The Cornell, Maryland, Max Planck Pre-doctoral Research School in Computer Science (CMMRS 2025)

CMMRS 2025 will include lectures from faculty at Cornell University, the University of Maryland, and the Max Planck Institutes.

Rediet Abebe, ELLIS Institute, Tübingen

When does resource allocation require prediction?

Algorithmic predictions are emerging as a promising tool for efficiently allocating societal resources. Fueling their use is the prevailing belief that accurately identifying individuals at the highest risk of adverse outcomes (such as loan defaults, poor health, or dropping out of school) is a key bottleneck. In this talk, we challenge this assumption by drawing on insights from numerous empirical and theoretical studies. In the first part of the talk, we propose a principled framework for evaluating the efficacy of prediction-based allocations in settings where individuals belong to larger units, such as neighborhoods, hospitals, or schools. In the second part, we study the role of timing, where a decision-maker must trade off waiting to gather more data and improve prediction accuracy against allocating resources earlier with noisier predictions. In both settings, we surface inequality as a fundamental mechanism influencing whether gains in prediction accuracy translate to more efficient allocations. In settings with high levels of inequality, allocations based on coarse information can suffice, whether that information comes from aggregate unit-level statistics (such as average drop-out rates in schools) or from noisy predictions based on preliminary data on individuals. These findings provide a more nuanced perspective on the prediction-allocation gap and the critical role that structural forces can play in improving allocation outcomes through better predictions.

This presentation is primarily based on joint work with Ali Shirali, Ariel Procaccia, and Moritz Hardt.

Bahar Asgari, University of Maryland, College Park

Lecture 1: From General‑Purpose CPUs to Domain‑Specific Architectures

For decades, the computing ecosystem has been dominated by general‑purpose CPUs (and more recently GPUs) whose flexibility comes at the expense of energy, area, and cost efficiency. With Moore’s Law slowing, contemporary systems can no longer rely on brute‑force transistor scaling to close the performance gap between peak and sustained throughput. This lecture introduces the fundamental principles of computer architecture with a focus on why “designing for the common case” limits performance. We will cover essential building blocks of domain‑specific architectures (DSAs) including dataflow architectures, systolic arrays, streaming accelerators, sparsity in workloads, and the Roofline performance model. By the end of this lecture, students will understand how DSAs achieve orders‑of‑magnitude improvements in throughput and energy efficiency over general‑purpose hardware, laying the groundwork for the more advanced, research‑oriented topics in Lecture 2.
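As a rough illustration of the Roofline performance model named above: attainable throughput is bounded by the smaller of peak compute and memory bandwidth multiplied by arithmetic intensity (operations per byte moved). The sketch below is a minimal illustration only; the function name and the hardware figures are hypothetical and are not taken from the lecture.

```python
def roofline_gflops(peak_gflops: float,
                    bandwidth_gb_per_s: float,
                    intensity_flops_per_byte: float) -> float:
    """Attainable throughput (GFLOP/s) under the basic Roofline model."""
    # Memory-bound ceiling: bytes per second times FLOPs performed per byte.
    memory_bound = bandwidth_gb_per_s * intensity_flops_per_byte
    # The kernel cannot exceed either the compute roof or the memory roof.
    return min(peak_gflops, memory_bound)


# Hypothetical machine: 1000 GFLOP/s peak compute, 100 GB/s memory bandwidth.
# Its ridge point is 1000 / 100 = 10 FLOPs per byte.
low_intensity = roofline_gflops(1000.0, 100.0, 0.25)   # memory-bound kernel
high_intensity = roofline_gflops(1000.0, 100.0, 50.0)  # compute-bound kernel
```

Kernels whose intensity falls below the ridge point are limited by data movement rather than arithmetic, which is one reason sparse workloads often sit far below peak on general-purpose hardware.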

Lecture 2: Reconfigurable DSAs for Adaptive Modern Workloads

As modern workloads such as ML/AI or advanced scientific computing become increasingly heterogeneous, static DSAs struggle to deliver consistently high performance across varied data characteristics. Reconfigurable computing bridges this gap by enabling hardware to adapt its dataflow and resource allocation at runtime. In this lecture, we will explore two state-of-the-art research efforts. First, we will examine a machine learning-guided approach for dynamically selecting optimal dataflow schemes in sparse matrix-matrix multiplication, demonstrating how decision trees and reinforcement learning can outperform static heuristics. Second, we will study a partially reconfigurable accelerator for sparse scientific computing that dynamically balances latency, resource utilization, and solver convergence. Students will gain insight into the methodology of designing, evaluating, and benchmarking adaptive hardware architectures, and will leave prepared to identify open research questions at the intersection of machine learning, reconfigurable systems, and domain-specific computing.

Meeyoung (Mia) Cha, MPI for Security and Privacy (MPI-SP)

AI, Society, and Computing: Leveraging Global Data While Tackling Ethical Challenges

AI agents powered by individuals’ data are reshaping everyday interactions on social platforms and marketplaces. At a planetary scale, the aggregation of multi-modal data enables transformative applications such as poverty mapping, socioeconomic predictions, and disaster assessments. However, these advancements also bring systemic risks, including biases, misinformation, and threats to democratic institutions.

This talk explores how thoughtful design in database systems and AI can mitigate these challenges, foster sustainable development, and uphold human-centered values. It further advocates for vigilant oversight of societal-scale AI applications to prevent dual-use risks and other ethical concerns and to promote benefit for humanity. I will also discuss my research journey on misinformation and my life as a data scientist, drawing on my experiences collaborating with world-class scientists at Meta, AT&T Research, and Microsoft, as well as NGOs such as the United Nations Pulse Lab and the World Customs Organization.

Christina Giannoula, University of Toronto / MPI for Software Systems

Hardware-Efficient Computational Kernels

Algorithms that overlook hardware characteristics not only degrade performance but may also limit the overall potential of applications. To fully unlock performance in modern computing systems, algorithms must be carefully tailored to the capabilities and constraints of the underlying hardware. In this lecture, I will highlight the importance of designing algorithms that align with architectural features and examine the key factors that impact performance, including parallelism and synchronization, memory access patterns, and data movement.

I will present two parallel algorithms that are meticulously co-designed with CPU and GPU architectures. These examples leverage fine-grained application characteristics to enable efficient execution. By the end of the lecture, students will gain insight into how to develop hardware-aware optimizations and adapt algorithms to exploit architectural strengths while mitigating hardware limitations—ultimately enabling high-performance computing across modern platforms.

Krishna Gummadi, MPI for Software Systems

Towards Better Foundations for Foundational Models: A Cognitivist Approach to Studying Large Language Models (LLMs)

Foundational, generative models like large language models have captured the popular imagination with their adaptability to a wide range of tasks that have traditionally been viewed as requiring human intelligence. However, their unreasonable effectiveness remains largely unexplained, i.e., even the designers of these models cannot explain why they work or when they might fail. In this talk, I will motivate our investigations into some basic curiosity-driven questions about LLMs; specifically, how LLMs receive, process, organize, store, and retrieve information.

Our analysis is centered around engaging LLMs in two specific types of cognitive tasks: first, syntactically-rich (semantically-poor) tasks such as recognizing formal grammars, and next, semantically-rich (syntactically-poor) tasks such as answering factual knowledge questions about real-world entities. Using carefully designed experimental frameworks, we attempt to answer the following foundational questions:
(a) how can we estimate what latent skills and knowledge a (pre-trained) LLM possesses?
(b) (how) can we distinguish whether some LLM has learnt some training data by rote (i.e., memorization) vs. by understanding?

I will present empirical results from experimenting with a number of large open-source language models and argue that our findings have important implications for the privacy of training data (including potential for memorization), the reliability of generated outputs (including potential for hallucinations), and the robustness of LLM-based applications (including preference biases exhibited by LLM-based agents).

Justin Hsu, Cornell University

Type Systems: Between Theory and Practice

To outsiders, research on type systems—and programming languages in general—can seem highly intimidating, with forests of symbols, obscure technical jargon, formidable mathematical abstractions, or sometimes all of the above. In the first lecture, I’ll try to demystify this area by focusing on what type systems are, what they can do, and why they are interesting. In the second lecture, I’ll present a case study of type systems that apply abstract constructions from category theory to yield automated and scalable tools for a highly concrete problem: analyzing rounding error in floating-point programs.
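To make the concrete problem in the second lecture tangible, the following minimal sketch measures the rounding error of a single floating-point addition by comparing it against exact rational arithmetic. It is not the type-system approach of the lecture; the helper name is hypothetical, and the bound checked is the standard unit roundoff for IEEE 754 double precision, assuming round-to-nearest.

```python
from fractions import Fraction

# Unit roundoff for IEEE 754 double precision with round-to-nearest.
UNIT_ROUNDOFF = Fraction(1, 2**53)


def add_relative_error(a: float, b: float) -> Fraction:
    """Exact relative rounding error committed by the float addition a + b."""
    # Fraction(x) converts a float to its exact rational value.
    exact = Fraction(a) + Fraction(b)
    computed = Fraction(a + b)
    if exact == 0:
        return Fraction(0)
    return abs(computed - exact) / abs(exact)


# 0.1 + 0.2 is rounded; 0.5 + 0.25 is exactly representable.
rounded = add_relative_error(0.1, 0.2)
exact_case = add_relative_error(0.5, 0.25)
```

Checking one operation by hand like this does not scale to whole programs, which is the gap that automated analyses aim to close.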

Manuel Gomez Rodriguez, MPI for Software Systems (MPI-SWS)

Counterfactuals in Machine Learning

“Had I clicked on the attachment of that email, my computer would have been hacked.” Reasoning about how things could have turned out differently from how they did in reality is a hallmark of human intelligence. This type of reasoning, called counterfactual reasoning, has been shown to play a significant role in humans' ability to learn from limited past experience and improve their decision-making skills over time. Is counterfactual reasoning a human capacity that machines cannot have? Surprisingly, recent advances at the interface of machine learning and causality have demonstrated that it is possible to build machines that perform and benefit from counterfactual reasoning, much as humans do. In this lecture, you will learn about counterfactuals in machine learning, including their use in AI-assisted decision making, explainability, safety, fairness, and reinforcement learning.

Abhinav Shrivastava, University of Maryland, College Park

TBD (Vision/Robotics)

Alexandra Silva, Cornell University

Algebraic Network Verification

I will present NetKAT, a language based on Kleene Algebra with Tests that has been used in network verification. I will show recent developments in its verification engine, including the design of efficient data structures for reasoning about equivalence. I will then show different extensions of NetKAT and how they enable more expressive verification and analysis techniques.