2023 Simons Collaboration on the Theory of Algorithmic Fairness Annual Meeting
Omer Reingold, Stanford University
Bailey Flanigan, Carnegie Mellon University
Moritz Hardt, Max Planck Institute for Intelligent Systems, Tübingen
Jon Kleinberg, Cornell University
Katrina Ligett, Hebrew University
Huijia (Rachel) Lin, University of Washington
Emma Pierson, Jacobs Technion-Cornell Institute
Salil Vadhan, Harvard University
Manolis Zampetakis, UC Berkeley
The Simons Collaboration on the Theory of Algorithmic Fairness aims to establish firm mathematical foundations, through the lens of computer science theory, for the emerging area of algorithmic fairness. Given the uniquely multidisciplinary nature of this topic, the collaboration's research looks both inwards and outwards: inwards towards the models, definitions, algorithms, and mathematical connections that the theory of computation can contribute to this effort, and outwards towards other areas in computer science and beyond. Looking outwards facilitates a discourse with different and at times opposing perspectives, a fruitful exchange of techniques, and exposure to important application areas.
The second annual meeting of the collaboration will showcase progress in Algorithmic Fairness as well as exciting insights and breakthrough results in the areas of Algorithms, Computational Social Sciences, Cryptography, Differential Privacy, Dynamical Systems, Game Theory and Machine Learning, along with farther-afield areas such as the medical sciences, public policy and law. The diverse set of participants will allow for informal discussions and new collaborations.
The second annual meeting of the Simons Collaboration on the Theory of Algorithmic Fairness took place in New York City on February 2nd and 3rd, 2023. We will discuss here the goals of the meeting within the larger operation of the collaboration as well as some of the shared content.
Meeting goals and future meetings:
The stated goals of our collaboration are “establishing firm mathematical foundations, through the lens of CS theory, for algorithmic fairness” and “developing a rich and widely accepted understanding of the Algorithmic Fairness landscape.” The two goals go hand in hand, as this is a highly multidisciplinary area, and the relevance of the theory work relies on interaction with external fields.
A major part of the meeting was outreach to an incredibly diverse set of participants. Beyond the collaboration members, we had participants from other areas of theory (such as learning theory, game theory, privacy and cryptography) and external participants from health care, law, philosophy and the social sciences. While most participants were from academia, we had considerable representation from industry. Accordingly, the talks focused on communicating our work to the larger community, hearing external perspectives and exploring new realms in which theory can be applied. The overall sentiment that both internal and external participants expressed was that the meeting was stimulating and thought-provoking.
In April, we will hold a meeting of a more technical nature (in computer science theory, machine learning and statistics) at the Simons Institute in Berkeley. The workshop, titled “Multi-Group Fairness and the Validity of Statistical Judgment,” is devoted to exploring the surprising impact of algorithmic fairness work (and especially notions of multigroup fairness) on the mathematical modeling and validity of data science tasks.
In June, the students and postdocs of the collaboration will meet alongside the annual Symposium on Foundations of Responsible Computing (FORC 2023) to foster internal collaboration and community building.
Finally, our weekly seminars and monthly PI meetings are continuing to lead to a cohesive research effort.
A major theme of the meeting was expanding the reach of ideas from algorithmic fairness towards a broader set of pressing societal concerns.
Jon Kleinberg discussed a new model for maximizing user utility in the consumption of online content based on observational data. The framework is based on a stochastic model of user behavior, in which users are guided by two conflicting sets of preferences — one that operates impulsively in the moment (potentially pushing to consume more content that is locally appealing), and the other of which makes plans over longer timescales (potentially pushing towards more substantial content or towards tasks in the outside world). The framework raises questions about the role of social media in modern society and about the interactions between algorithmic mechanisms to promote user engagement and behavioral and psychological considerations.
Katrina Ligett described a new model of the interaction between users and data-driven companies. Ligett places a large variety of societal concerns within the perspective of data ecosystems in which vectors of information flow back and forth between individuals and computational platforms. Within this model, Ligett discussed bridging technological and legal considerations.
Moritz Hardt addressed concerns arising from the gig economy and its interaction with machine learning. What is the impact of collective action by organized workers? What fraction of workers must organize and pool their data in order to affect the learning algorithms run by a gig platform? Hardt discussed the consequences of this model in three fundamental learning-theoretic settings: a non-parametric optimal learning algorithm, a parametric risk minimizer and gradient-based optimization.
A different theme of the meeting was exploring technical developments connected to algorithmic fairness and related questions.
Rachel Lin discussed one of the most exciting recent results in cryptography. Specifically, Lin described her seminal work on program obfuscation, introducing a mathematical formalization of this goal called indistinguishability obfuscation (iO). In the past decade, Lin and others showed that iO is a “master tool” for enabling a wide range of cryptographic goals. In her recent work, Lin constructed iO based on three well-studied computational hardness conjectures, a result that has been hailed as a breakthrough in the cryptographic community.
Manolis Zampetakis, who recently completed his PhD under PI Daskalakis, surveyed their research on learning from data that is systematically biased. The impact of faulty data on algorithmic discrimination is a major challenge in this field. One specific data fault is known as truncation, where a segment of the data is removed (for example, salaries above a threshold). A different fault is known as self-selection, where a population is under-represented in the data based on preferences correlated with the learned features. Zampetakis surveyed a collection of new techniques for addressing such data faults.
A third theme concerned the applicability of the work in practice and in other domains.
Emma Pierson discussed real-life examples of discrimination in health care (women’s health and cancer-risk prediction) and in policing. These examples present challenges for the collaboration to formalize and rigorously address. Indeed, the area of health care is one in which this collaboration has been particularly impactful.
Salil Vadhan discussed the great success of differential privacy as an area of theory that has had impact on various scientific fields, with notable deployments in industry and in the U.S. Census. Vadhan discussed the lessons that can be transferred to the area of algorithmic fairness to accelerate its applicability.
Bailey Flanigan discussed fairness concerns in voting theory, specifically mechanisms and voting rules that account for voters’ stakes in public choices.
THURSDAY, FEBRUARY 2
9:30 AM Jon Kleinberg | The Challenge of Understanding What Users Want: Inconsistent Preferences and Engagement Optimization
11:00 AM Bailey Flanigan | Accounting for Stakes in Democratic Decision Processes
1:00 PM Moritz Hardt | Algorithmic Collective Action in Machine Learning
2:30 PM Manolis Zampetakis | Analyzing Data with Systematic Bias: Truncation and Self-Selection
4:00 PM Emma Pierson | Using Machine Learning to Increase Equity in Healthcare and Public Health
FRIDAY, FEBRUARY 3
9:30 AM Salil Vadhan | Differential Privacy's Transition from Theory to Practice
11:00 AM Katrina Ligett | Data Privacy is Important, But It's Not Enough
1:00 PM Huijia (Rachel) Lin | How to Hide Secrets in Software?
Abstracts & Slides
Bailey Flanigan
Carnegie Mellon University
Accounting for Stakes in Democratic Decision Processes
View Slides (PDF)
It is a ubiquitous idea that, to ensure a high-quality outcome of a collective decision, the greatest stakeholders should have sufficient say in the decision process. As we show, the cost of failing to account for stakes can be formalized in the language of voting theory: when voters have different stakes in a decision, all deterministic voting rules (which do not account for voters’ differing stakes) have unbounded welfare loss: that is, all such rules will sometimes select outcomes that give essentially zero welfare to the population overall. This is alarming given that in practice, the voting rules we use tend to be deterministic, and voters’ stakes are likely to vary.
Bailey Flanigan will explore two solutions to this problem, both of which involve accounting for voters’ stakes. In the first solution, Flanigan will show that the welfare loss is substantially reduced when voters are altruistic: when deciding how to vote, they weigh not only their own interests, but also the interests of others. As such, altruism can be interpreted as voters implicitly accounting for each other’s stakes. In the second solution, we formalize how to design democratic processes that explicitly account for voters’ stakes, and we show that doing so can again substantially reduce the welfare loss. Throughout the talk, Flanigan will discuss how these conclusions can be operationalized in the design of democratic processes.
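The unbounded-welfare-loss claim can be made concrete with a toy calculation (an illustrative sketch, not an example from the talk; all numbers are hypothetical): under majority rule, two voters with tiny stakes can outvote one voter with enormous stakes.

```python
# Hypothetical utilities: rows are voters, columns are outcomes (A, B).
utilities = [
    [1, 0],    # two low-stakes voters mildly prefer A
    [1, 0],
    [0, 100],  # one high-stakes voter strongly prefers B
]

# Each voter votes for their preferred outcome; majority rule is
# deterministic and ignores how much each voter cares.
votes = [max(range(2), key=lambda o: u[o]) for u in utilities]
winner = max(set(votes), key=votes.count)

# Total welfare generated by each outcome.
welfare = [sum(u[o] for u in utilities) for o in range(2)]
print(winner, welfare)   # outcome A wins despite welfare 2 versus 100
```

Scaling the high-stakes voter's utility upward makes the welfare ratio arbitrarily bad, which is the sense in which deterministic, stakes-blind rules have unbounded welfare loss.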
Moritz Hardt
Max Planck Institute for Intelligent Systems
Algorithmic Collective Action in Machine Learning
We initiate a principled study of algorithmic collective action on digital platforms that deploy machine learning algorithms. We propose a simple theoretical model of a collective interacting with a firm’s learning algorithm. The collective pools the data of participating individuals and executes an algorithmic strategy by instructing participants how to modify their own data to achieve a collective goal. We investigate the consequences of this model in three fundamental learning-theoretic settings, the case of a non-parametric optimal learning algorithm, a parametric risk minimizer, and gradient-based optimization. In each setting, we come up with coordinated algorithmic strategies and characterize natural success criteria as a function of the collective’s size. Each setting admits the existence of vanishingly small critical thresholds for the fractional size of the collective at which success occurs. Complementing our theory, we conduct systematic experiments on a skill classification task involving tens of thousands of resumes from a gig platform for freelancers. Through more than two thousand model training runs of a BERT-like language model, we see a striking correspondence emerge between our empirical observations and the predictions made by our theory. Taken together, our theory and experiments broadly support the conclusion that algorithmic collectives of exceedingly small fractional size can exert significant control over a platform’s learning algorithm.
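As a rough illustration of the non-parametric setting (a minimal sketch under simplifying assumptions, not the paper's actual construction; the data and classifier are hypothetical), a small collective can plant a feature "signature" and relabel its own points, after which a nearest-neighbor learner associates the signature with the collective's target label:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: a 1-nearest-neighbor classifier stands in for the
# platform's non-parametric learner.
n, d = 1000, 5
X = rng.normal(size=(n, d))
y = (X[:, 0] > 0).astype(int)      # the platform's natural labels

# Collective strategy: a 5% fraction plants a signature in an otherwise
# benign coordinate and relabels its points with the target class 1.
k = int(0.05 * n)
X[:k, -1] = 10.0
y[:k] = 1

def one_nn(X_train, y_train, x):
    """Classify x by the label of its nearest training point."""
    return y_train[np.argmin(np.linalg.norm(X_train - x, axis=1))]

# At deployment, a participant applies the same signature and is
# classified as the target class: only collective points lie nearby.
x_new = rng.normal(size=d)
x_new[-1] = 10.0
print(one_nn(X, y, x_new))
```

This mirrors the abstract's qualitative conclusion: the collective size needed for success can be a small fraction of the data.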
Jon Kleinberg
Cornell University
The Challenge of Understanding What Users Want: Inconsistent Preferences and Engagement Optimization
View Slides (PDF)
As we think about the fundamental settings in which algorithms mediate decisions about how people are treated, one of the most central is the way that online content is presented to users on large Internet platforms. Online platforms have a wealth of data, run countless experiments, and use industrial-scale algorithms to optimize user experience. Despite this, many users seem to regret the time they spend on these platforms. One possible explanation is that incentives are misaligned: platforms are not optimizing for user happiness. We suggest the problem runs deeper, transcending the specific incentives of any platform, and instead stems from a mistaken foundational assumption. To understand what users want, platforms look at what users do. This is a kind of revealed-preference assumption that is ubiquitous in user models. Yet research has demonstrated, and personal experience affirms, that we often make choices in the moment that are inconsistent with what we want: we can choose mindlessly or myopically, behaviors that feel entirely familiar on online platforms.
In this work, Jon Kleinberg and collaborators have developed a model of media consumption where users have inconsistent preferences. Kleinberg will consider what happens when a platform that simply wants to maximize user utility is only able to observe behavioral data in the form of user engagement. The framework is based on a stochastic model of user behavior, in which users are guided by two conflicting sets of preferences — one that operates impulsively in the moment, and the other of which makes plans over longer timescales. By linking the behavior of this model to abstractions of platform design choices, we can develop a theoretical framework and vocabulary in which to explore interactions between algorithmic intermediation, behavioral science and social media.
This talk is based on joint work with Sendhil Mullainathan and Manish Raghavan.
Katrina Ligett
Hebrew University
Data Privacy is Important, But It’s Not Enough
View Slides (PDF)
Our current data ecosystem leaves individuals, groups and society vulnerable to a wide range of harms, ranging from privacy violations to subversion of autonomy to discrimination to erosion of trust in institutions. In this talk, Katrina Ligett will discuss the Data Co-ops Project, a multi-institution, multi-disciplinary effort Ligett co-leads with Kobbi Nissim. The project seeks to organize our understanding of these harms and to coordinate a set of technical and legal approaches to addressing them. In particular, Ligett will discuss recent joint work with Ayelet Tapiero-Gordon and Alex Wood, wherein they argue that legal and technical tools aimed at controlling data and addressing privacy concerns are inherently insufficient for addressing the full range of these harms.
Huijia (Rachel) Lin
University of Washington
How to Hide Secrets in Software?
Since the initial public proposal of public-key cryptography based on computational hardness conjectures (Diffie and Hellman, 1976), cryptographers have contemplated the possibility of a “one-way compiler” that can transform computer programs into unintelligible forms while preserving functionality.
In this talk, Rachel Lin will introduce a mathematical formalization of this goal called indistinguishability obfuscation (iO). In the past decade, it has been shown that iO is a “master tool” for enabling a wide range of cryptographic goals. Lin will then describe her recent construction of iO based on three well-studied computational hardness conjectures.
This talk is based on joint work with Aayush Jain and Amit Sahai.
Emma Pierson
Jacobs Technion-Cornell Institute
Using Machine Learning to Increase Equity in Healthcare and Public Health
View Slides (PDF)
Our society remains profoundly unequal. Worse, there is abundant evidence that algorithms can, improperly applied, exacerbate inequality in healthcare and other domains. This talk pursues a more optimistic counterpoint — that data science and machine learning can also be used to illuminate and reduce inequity in healthcare and public health — by presenting vignettes from domains including policing, women’s health and cancer-risk prediction.
Salil Vadhan
Harvard University
Differential Privacy’s Transition from Theory to Practice
View Slides (PDF)
Differential privacy is a mathematical framework for ensuring that individual-level information is not revealed through statistical analysis or machine learning on sensitive datasets. It emerged from the theoretical computer science literature through a series of papers from 2003–2006 and has since become the subject of a rich body of research spanning many disciplines. In parallel, differential privacy made a remarkably rapid transition to practice, with large-scale deployments by government agencies like the U.S. Census Bureau and by large tech companies such as Google, Apple, Microsoft and Meta, and differential privacy is also playing a core role in an emerging industry of privacy tech startups. In this talk, Salil Vadhan will try to summarize differential privacy’s transition to practice, speculate about the features of differential privacy and the surrounding context that enabled this transition, and discuss some of the challenges encountered along the way.
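For readers new to the framework, the canonical mechanism is easy to sketch (a standard textbook example, not drawn from the talk; the dataset and parameters are hypothetical): a counting query has sensitivity 1, so adding Laplace noise of scale 1/ε yields ε-differential privacy.

```python
import numpy as np

rng = np.random.default_rng(1)

def laplace_count(data, epsilon):
    """Release the size of a dataset with epsilon-differential privacy.

    Adding or removing one person changes a count by at most 1
    (sensitivity 1), so Laplace noise of scale 1/epsilon suffices.
    """
    return len(data) + rng.laplace(scale=1.0 / epsilon)

salaries = [52_000, 61_000, 48_000, 75_000]   # toy sensitive dataset
print(laplace_count(salaries, epsilon=0.5))   # noisy answer near 4
```

Smaller ε means stronger privacy and noisier answers; deployments such as the Census Bureau's must tune this trade-off at scale.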
Manolis Zampetakis
University of California, Berkeley
Analyzing Data with Systematic Bias: Truncation and Self-Selection
In a wide range of data analysis problems, across many scientific disciplines, we only have access to biased data due to some systematic bias of the data collection procedure. In this talk, Manolis Zampetakis will present a general formulation of fundamental statistical estimation tasks with systematic bias and show recent results from joint work on how to handle two very fundamental types of systematic bias that arise frequently in econometric studies: truncation bias and self-selection bias.
This talk is based on joint work with Yeshwanth Cherapanamjeri, Constantinos Daskalakis, Andrew Ilyas, Vasilis Kontonis and Christos Tzamos.
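Truncation bias is easy to see in a toy simulation (an illustration of the problem only, not of the talk's estimators; the numbers are hypothetical): averaging only the salaries that survive a reporting cap systematically underestimates the true mean.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical population: Gaussian salaries, but the dataset only
# records values below a cap, i.e., the sample is truncated.
mu, sigma, cap = 70_000, 15_000, 80_000
salaries = rng.normal(mu, sigma, size=100_000)
observed = salaries[salaries < cap]   # truncated sample

naive_mean = observed.mean()
print(round(naive_mean))   # well below the true mean of 70,000
```

Recovering mu and sigma from the truncated sample alone is exactly the kind of estimation task that techniques for truncated statistics address.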