Omer Reingold, Stanford University
The first annual meeting of the Simons Collaboration on the Theory of Algorithmic Fairness will bring together many of the participants of the collaboration as well as others in the areas of algorithmic fairness, theoretical computer-science and in other relevant fields. The meeting will expose the participants to the unique challenges and opportunities of this field, which presents an area of research that is complex, subtle and multidisciplinary in an intrinsic fashion. No model or definition can fully capture such complexities and partial models and solutions carry the risk of legitimizing discrimination and talks will highlight some of these challenges.
The rising importance of this area and the significant attention given to unfairness of algorithms provide opportunities for exceptional societal impact which is rare in theoretical research while the early state of this field offers the prospects of intellectual impact. The meeting will expose participants to some of the ways in which both sorts of impact have started to materialize, and to explore avenues for additional impact.
Noam Barda, Ben-Gurion University
Noa Dagan, Ben Gurion University and Clalit Research Institute
Avrim Blum, Toyota Technological Institute at Chicago
Cynthia Dwork, Harvard University
Shafi Goldwasser, Simons Institute for the Theory of Computing
Michael Kim, UC Berkeley
Irene Lo, Stanford University
Jamie Morgenstern, University of Washington
Helen Nissenbaum, Cornell Tech
This meeting brought together collaboration members (PIs, postdocs and students) as well as others in the areas of algorithmic fairness, theoretical computer science and other fields with interest in fairness (philosophy, economics, sociology and medical science). We are very happy with the participation (especially when accounting for global conditions) in terms of quantity and intellectual diversity. The mix of talks was designed to expose participants and initiate conversations on three topics:
- The role of theory within the field of algorithmic fairness and within the ethical considerations that arise in
- Recent developments by the
- Application areas, covering new challenges as well as surprising ways in which fairness research has had impact in other
Ethics, fairness and theory of computation
Algorithmic fairness is an emerging research area that is complex, subtle and multidisciplinary in an intrinsic fashion. As such, the more technical, mathematical questions always lie, and need to be discussed, within a more complete context. How justified are our assumptions? Are the questions we are asking the right ones? What are other fields’ perspectives of our questions? While many of these talks alluded to such questions, this was mostly addressed by Helen Nissenbaum. Nissenbaum has been working on issues of fairness in computation for many years. Her background is in philosophy, but she served as a bridge between philosophy and theory of computation (TOC). In her talk, she raised concerns about discrimination by algorithms and especially about the usages of proxies in predictions. In which cases is it legitimate to use variables that correlate with what we want to predict but do not have a causal relation? Who decides?
Recent developments in algorithmic fairness
Avrim Blum discussed exciting results in several natural settings on the interaction between learning, utility maximization, fairness and strategic behavior. Two of the results discussed were (1) while there often is a fairness-accuracy trade-off when the data on which learning relies is biased, then fairness may promote accuracy; (2) when considering a sequential screening process, to maximize utility conditioned on fairness, the process must have memory (and at each step take into account all the previous tests the candidates took, rather than just the fact that the candidate reached this stage).
Shafi Goldwasser discussed new techniques to allow deniable communication in an online world that is governed by a few powerful entities. The notion of deniable encryptions (where the sender of the encrypted message can claim that they sent any other message) is not new. But for deniable encryption to allow more sophisticated operations, it is useful to have fully deniable, fully homomorphic encryption, which is one of the main developments of this research. Goldwasser discussed the relationship between this work and fairness.
Jamie Morgenstern developed learning algorithms in a challenging but highly realistic setup where there are multiple collections of data sources that may be different in the population distribution, the collected attributes and even the rule we are attempting to learn. This led to a lively discussion on the unique fairness and accuracy challenges in this context.
Cynthia Dwork introduced a new notion of scaffolding sets, which is a relatively small collection of sets that are particularly suited to be used in multicalibration. Multicalibration has been introduced by PIs and other members of the collaboration as a novel notion of algorithmic fairness. It requires that a predictor be calibrated (accurate in some exact sense) on a large collection of subpopulations. The results introduced by Dwork give a method to select the family of population in a way that the sets will be a small collection (for computational efficiency considerations) and the outcome be globally accurate.
Noam Barda and Noa Dagan discussed deriving clinical information that is relevant to subpopulation. They separated the discussion to two highly common medical tasks: predictions (e.g., what is the probability of a patent to develop a medical condition in the next five years) and causal inference (what is the expected benefit from a particular medical intervention to a particular patient). The remarkable research discussed concerned huge-scale medical analysis, using the rich data of millions of patients (Clalit Health Services, a large health care organization in Israel). They demonstrated how central predictors in health care are widely incorrect on subpopulations (while correct on average) and showed how to dramatically improve those predictors using the notion of multicalibration from algorithmic fairness discussed above. One of the most remarkable examples given was the creation of a COVID-19–complication predictor in Israel, based only on rough group statistics from China at the early stages of the pandemic using the post-processing multicalibration algorithm.
Michael Kim described an exciting recent work by members of the collaboration and others suggesting a new paradigm for generalization of statistics from one population to the other. The new paradigm relies on the notion of multicalibration we already discussed here and replaces the prevailing paradigm of learning propensity scores that help translate one population to the other. The new paradigms allow for data analysis that doesn’t need any information on the target population (for which we would like to infer some statistics). This implies a so-called universal adaptability of data analysis and is extremely appealing in situations where the target population is unknown at analysis time.
Irene Lo described her work of increasing equity and diversity in a San Francisco school district, where over the years, schools have become more segregated. This practical application area introduces exciting theoretical problems. For example, in a many-player game where there is a gap in information between the players, what does it mean for a mechanism that takes preferences into account fair? Respecting the preferences of a subpopulation with lower quality information may still lead to subpar outcomes for that subpopulation.
Thursday, February 3
9:30 AM Avrim Blum | On Learning in the Presence of Biased Data and Strategic Behavior 11:00 AM Jamie Morgenstern | Learning from Multiple Data Sources 1:00 PM Noam Barda and Noa Dagan | Providing Subgroup-Relevant Clinical Information in Causal Inference and Prediction 2:30 PM Michael Kim | Universal Adaptability 4:00 PM Helen Nissenbaum | TOC and Ethics Together
Friday, February 4
9:30 AM Shafi Goldwasser | The Ability to Deny in an Online World 11:00 AM Irene Lo | Improving Diversity and Equity in San Francisco School Choice 1:00 PM Cynthia Dwork | Scaffolding Sets
Abstracts and Slides
Toyota Technological Institute at Chicago
On Learning in the Presence of Biased Data and Strategic Behavior
View Slides (PDF)
In this talk Blum will discuss two lines of work involving learning in the presence of biased data and strategic behavior. In the first, he will ask whether fairness constraints on learning algorithms can actually improve the accuracy of the classifier produced when training data is unrepresentative or corrupted due to bias. Typically, fairness constraints are analyzed as a tradeoff with classical objectives such as accuracy. Blum’s results show there are natural scenarios that can be win-win, helping to improve overall accuracy. In the second line of work, he will consider strategic classification: settings where the entities being measured and classified wish to be classified as positive (e.g., college admissions) and will try to modify their observable features if possible to make that happen. Blum will consider this in the online setting, where a particular challenge is that updates made by the learning algorithm will change how the inputs behave as well.
Ben-Gurion University & Clalit Research Institute
Analytic tasks of prediction and causal inference in health care often focus on average effects across entire study populations. These estimates ignore the heterogeneity that exists in different subgroups, eventually resulting in erroneous decisions being made for these subgroups. This talk will focus on the experience relating to improving fairness toward subgroups as experienced in Clalit Health Services, a large health care organization in Israel.
Barda and Dagan will focus first on prediction and describe the application and performance of an algorithm for improving subpopulation calibration in predictive models. The algorithm is meant to address concerns regarding potential unfairness of the models toward groups which are underrepresented in the training data set and thus might receive uncalibrated scores. The algorithm was implemented on widely used risk models, including the ACC/AHA 2013 model for cardiovascular disease and the FRAX model for osteoporotic fractures. This algorithm also played a major role in the development of a COVID-19 mortality risk prediction model at a time when individual-level data of COVID-19 patients was not yet available in Israel. The development process for this predictor will be presented. The resulting predictor was widely used within Clalit Health Services for prevention purposes, with an intention to notify high-risk members of their increased risk for mortality should they get infected, for prioritization of COVID-19 RT-PCR tests and for treatment decisions in confirmed cases.
They will then shift to causality. The effectiveness and safety of SARS-CoV-2 vaccines is a major concern during the COVID-19 pandemic. Evidence generated by randomized clinical trials (RCTs), on which regulatory bodies rely, is usually aimed at mixed populations that are not representative of the composition of the target population for the vaccines. Barda and Dagan will discuss the importance and challenges of applying causal inference methodologies to post-marketing observational data to bridge this gap and generate valuable evidence for subgroups that are not well represented in RCTs such as pregnant women, older individuals and those with multiple comorbidities. This subgroup-relevant clinical information is crucial to helping individuals within these subgroups to make informative decisions.
This talk is based on several joint works with Guy Rothblum, Gal Yona, Uri Shalit, Eitan Bachmat, Marc Lipsitch, Miguel Hernán, Ben Reis and Ran Balicer among others.
Predictors map individual instances in a population to the interval [0, 1]. For a collection C of subsets of a population, a predictor is multicalibrated with respect to C if it is simultaneously calibrated on each set in C. Dwork will initiate the study of the construction of scaffolding sets, a small collection S of sets with the property that multicalibration with respect to S ensures correctness, and not just calibration, of the predictor. Dwork and her collaborators’ approach is inspired by the wisdom that the intermediate layers of a neural net learn a highly structured and useful data representation.
This talk is based on joint work with Maya Burhanpurkar, Zhun Deng and Linjun Zhang.
Simons Institute for the Theory of Computing
The Ability to Deny in an Online World
Plausible deniability seems like the ultimate get-out-of-jail-free card. But how can we make it work when it comes to digital information sent in a public network. Deniable encryption, defined by Canetti et al (Crypto 1997), suggests a method to achieve deniability by the sender of encrypted messages to overcome this problem. The idea is especially interesting in the context of electronic elections to eliminate the threat of vote buying after a vote has been cast, except that their encryption scheme is not homomorphic which is necessary for encrypted vote adding and achieves only a 1/poly probability of being able to successfully deny unless the ciphertexts are not compact.
Goldwasser will present two new works on the subject.
1) With Agarwal and S. Mossel (Crypto21) we define and construct sender Deniable Fully Homomorphic Encryption based on the Learning With Errors (LWE) polynomial hardness assumption. Deniable FHE enables storing encrypted data in the cloud to be processed securely without decryption, maintaining deniability of the encrypted data, as well the prevention of vote-buying in electronic voting schemes where encrypted votes can be tallied without decryption. The construction achieves compact ciphertext independently of the level of deniability- both the size of the public key and the size of the ciphertexts are bounded by a fixed polynomial, independent of the detection probability achieved by the scheme. However ,the offline running time of our encryption algorithm depends on the inverse of the detection probability, thus the scheme falls short of achieving simultaneously compactness, negligible deniability and polynomial encryption time. The online running time is polynomial.
2) With Coladangelo and Vazirani, more recently, we show a sender deniable encryption scheme where the encryption scheme is a quantum algorithm but the ciphertext is classical which is secure under the LWE polynomial hardness assumption. This scheme achieves for the first time simultaneously compactness, negligible deniability and polynomial encryption timeunder LWE. Furthermore, it is possible to extend the scheme so that coercion in an election cannot take place when the coercer is able to dictate all inputs to the deniable encryption algorithm even prior to encryption.
University of California, Berkeley
View Slides (PDF)
A key challenge in modern statistics is to ensure statistically valid inferences across diverse target populations from a fixed source of training data. Statistical techniques that guarantee this type of adaptability not only make statistical conclusions more robust to sampling bias but can also extend the benefits of evidence-based decision-making to communities that do not have the resources to collect high-quality data or computationally intensive estimations on their own. In this talk, Kim will describe a surprising technical connection between the statistical inference problem and multicalibration, a technique developed in the context of algorithmic fairness.
Concretely, the approach derives a correspondence between the fairness goal “to protect subpopulations from miscalibrated predictions” and the statistical goal “to ensure unbiased estimates on downstream target populations.” Exploiting this connection, Kim derives a single-source estimator that provides inferences that are universally adaptable to any downstream target population. The performance of the estimator is comparable to the performance of propensity score reweighting, a widely used technique that explicitly models the underlying source-target shift for every target.
This is based on joint work with Shafi Goldwasser, Christoph Kern, Frauke Kreuter and Omer Reingold.
Improving Diversity and Equity in San Francisco School Choice
More than 65 years after school segregation was ruled unconstitutional, public schools across the United States are resegregating along racial and socioeconomic lines. Many cities have attempted to disentangle school and neighborhood segregation and improve equitable access using policies for city-wide choice. However, these policies have largely not improved patterns of segregation and inequity. From 2018 to 2020, we worked with the San Francisco Unified School District (SFUSD) to design a new policy for student assignment system that meets the district’s goals of diversity, predictability, proximity and equity of access. We used optimization techniques to augment and operationalize district-proposed policy concepts and compared the proposed policy to approaches typically suggested by the school choice literature. Using predictive-choice models developed using historical choice data, we show that the policy can improve on the status quo regarding all district goals, including decreasing the percentage of racial minorities in high-poverty schools from 29 percent to 11 percent. The policy design was enabled by close collaboration with SFUSD and approved in 2020 for use starting the 2024–25 school year.
This talk is based on joint work with Itai Ashlagi, Katherine Mentzer, and Max Allman.
University of Washington
Learning from Multiple Data Sources
View Slides (PDF)
Many real-world ML problems violate the assumptions that (1) training data is generated iid from the same distribution, and (2) that the training and test distributions are the same. For example, many different hospitals in different locations, serving different populations, might collect data about their patients. These datasets would ideally be combined and used to train a model which would perform well on many distributions. In this talk, Morgenstern will describe one approach to formalizing how to use data generated from multiple distributions, and on what test distributions we can guarantee this method will perform well.
This talk is based on joint work with Christopher Jung and Pranjal Awasthi.