Simons Collaboration on the Theory of Algorithmic Fairness Annual Meeting 2022

Date & Time

February 3–4, 2022

Organizer:
Omer Reingold, Stanford University
 

Meeting Goals:

The first annual meeting of the Simons Collaboration on the Theory of Algorithmic Fairness will bring together many of the collaboration's participants, as well as others working in algorithmic fairness, theoretical computer science and other relevant fields. The meeting will expose participants to the unique challenges and opportunities of this field, an area of research that is intrinsically complex, subtle and multidisciplinary. No single model or definition can fully capture such complexities, and partial models and solutions carry the risk of legitimizing discrimination; the talks will highlight some of these challenges.

The rising importance of this area and the significant attention given to the unfairness of algorithms provide opportunities for a kind of societal impact that is rare in theoretical research, while the early state of the field offers prospects for intellectual impact. The meeting will expose participants to some of the ways in which both sorts of impact have begun to materialize and will explore avenues for additional impact.
 

Speakers:

Noam Barda, Ben-Gurion University
Noa Dagan, Ben-Gurion University and Clalit Research Institute
Avrim Blum, Toyota Technological Institute at Chicago
Cynthia Dwork, Harvard University
Shafi Goldwasser, Simons Institute for the Theory of Computing
Michael Kim, UC Berkeley
Irene Lo, Stanford University
Jamie Morgenstern, University of Washington
Helen Nissenbaum, Cornell Tech

  • Agenda

    Thursday, February 3

    9:30 AM  Avrim Blum | On Learning in the Presence of Biased Data and Strategic Behavior
    11:00 AM  Jamie Morgenstern | Learning from Multiple Data Sources
    1:00 PM  Noam Barda and Noa Dagan | Providing Subgroup-Relevant Clinical Information in Causal Inference and Prediction
    2:30 PM  Michael Kim | Universal Adaptability
    4:00 PM  Helen Nissenbaum | TOC and Ethics Together

    Friday, February 4

    9:30 AM  Shafi Goldwasser | The Ability to Deny in an Online World
    11:00 AM  Irene Lo | Improving Diversity and Equity in San Francisco School Choice
    1:00 PM  Cynthia Dwork | Scaffolding Sets
  • Abstracts and Slides

    Avrim Blum
    Toyota Technological Institute at Chicago

    On Learning in the Presence of Biased Data and Strategic Behavior
    View Slides (PDF)

    In this talk, Blum will discuss two lines of work involving learning in the presence of biased data and strategic behavior. In the first, he will ask whether fairness constraints on learning algorithms can actually improve the accuracy of the classifier produced when training data is unrepresentative or corrupted due to bias. Fairness constraints are typically analyzed as a tradeoff against classical objectives such as accuracy; Blum's results show that there are natural scenarios in which imposing them is a win-win, helping to improve overall accuracy. In the second line of work, he will consider strategic classification: settings where the entities being measured and classified wish to be classified as positive (e.g., college admissions) and will try to modify their observable features, if possible, to make that happen. Blum will consider this in the online setting, where a particular challenge is that updates made by the learning algorithm also change how the inputs behave.
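
    To make the first line of work concrete, the following is a minimal simulation sketch under illustrative assumptions: a one-dimensional score, an under-representation bias model in which a fraction of one group's positive training examples is dropped, and an equal-opportunity-style constraint that equalizes true-positive rates. It is a toy, not the construction analyzed in the talk.

    ```python
    # Toy illustration (not the talk's construction): positive training examples
    # from group B are dropped with probability nu, and we compare per-group
    # accuracy-optimal thresholds with thresholds constrained to (approximately)
    # equalize true-positive rates on the biased training data.
    import numpy as np

    rng = np.random.default_rng(0)

    def sample(n):
        x = rng.normal(0.0, 1.0, n)                        # 1-d feature
        y = (x + rng.normal(0.0, 0.5, n) > 0).astype(int)  # noisy true label
        return x, y

    xa, ya = sample(5000)        # group A
    xb, yb = sample(5000)        # group B: same true distribution as A

    nu = 0.8                     # bias: drop 80% of B's positive training examples
    keep = (yb == 0) | (rng.random(len(yb)) > nu)
    xb_tr, yb_tr = xb[keep], yb[keep]

    acc = lambda x, y, t: ((x > t).astype(int) == y).mean()
    tpr = lambda x, y, t: ((x > t) & (y == 1)).sum() / max((y == 1).sum(), 1)
    ts = np.linspace(-1.5, 1.5, 61)

    # Unconstrained: each group's threshold maximizes accuracy on the biased
    # training data, so B's threshold drifts upward (its positives look rare).
    ta_u = ts[np.argmax([acc(xa, ya, t) for t in ts])]
    tb_u = ts[np.argmax([acc(xb_tr, yb_tr, t) for t in ts])]

    # Constrained: maximize training accuracy subject to (nearly) equal TPRs.
    # TPR estimates stay unbiased here: surviving positives are a random sample.
    _, ta_c, tb_c = max(
        (acc(xa, ya, ta) + acc(xb_tr, yb_tr, tb), ta, tb)
        for ta in ts for tb in ts
        if abs(tpr(xa, ya, ta) - tpr(xb_tr, yb_tr, tb)) < 0.02)

    # Evaluate both rules on the unbiased population.
    for name, (ta, tb) in [("unconstrained", (ta_u, tb_u)),
                           ("constrained", (ta_c, tb_c))]:
        print(name, round((acc(xa, ya, ta) + acc(xb, yb, tb)) / 2, 3))
    ```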
     

    Noam Barda
    Ben-Gurion University
    and
    Noa Dagan
    Ben-Gurion University & Clalit Research Institute

    Providing Subgroup-Relevant Clinical Information in Causal Inference and Prediction
    View Barda Slides (PDF)
    View Dagan Slides (PDF)

    Analytic tasks of prediction and causal inference in health care often focus on average effects across entire study populations. These estimates ignore the heterogeneity that exists across subgroups, eventually resulting in erroneous decisions for those subgroups. This talk will focus on the experience of improving fairness toward subgroups at Clalit Health Services, a large health care organization in Israel.

    Barda and Dagan will focus first on prediction and describe the application and performance of an algorithm for improving subpopulation calibration in predictive models. The algorithm is meant to address concerns that models may be unfair toward groups that are underrepresented in the training data set and thus might receive uncalibrated scores. The algorithm was implemented on widely used risk models, including the ACC/AHA 2013 model for cardiovascular disease and the FRAX model for osteoporotic fractures. It also played a major role in the development of a COVID-19 mortality risk prediction model at a time when individual-level data on COVID-19 patients was not yet available in Israel. The development process for this predictor will be presented. The resulting predictor was widely used within Clalit Health Services for prevention: to notify high-risk members of their increased risk of mortality should they become infected, to prioritize COVID-19 RT-PCR tests, and to inform treatment decisions in confirmed cases.
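
    The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of subpopulation recalibration; the Platt-style logistic recalibrator and all names here are illustrative assumptions, not Clalit's actual method.

    ```python
    # Minimal sketch (hypothetical, not Clalit's actual algorithm): fit one
    # Platt-style logistic recalibration map per subgroup on held-out data,
    # so that underrepresented groups receive calibrated probabilities.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def logit(p):
        p = np.clip(p, 1e-6, 1 - 1e-6)   # guard against log(0)
        return np.log(p / (1 - p))

    def fit_subgroup_calibrators(scores, labels, groups):
        """scores: raw model outputs in [0, 1]; groups: subgroup label per person."""
        calibrators = {}
        for g in np.unique(groups):
            m = groups == g
            lr = LogisticRegression()
            lr.fit(logit(scores[m]).reshape(-1, 1), labels[m])  # one map per group
            calibrators[g] = lr
        return calibrators

    def recalibrate(calibrators, scores, groups):
        out = np.empty_like(scores, dtype=float)
        for g, lr in calibrators.items():
            m = groups == g
            out[m] = lr.predict_proba(logit(scores[m]).reshape(-1, 1))[:, 1]
        return out
    ```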

    They will then shift to causality. The effectiveness and safety of SARS-CoV-2 vaccines are a major concern during the COVID-19 pandemic. The evidence generated by randomized clinical trials (RCTs), on which regulatory bodies rely, usually comes from mixed populations that are not representative of the composition of the vaccines' target population. Barda and Dagan will discuss the importance and challenges of applying causal inference methodologies to post-marketing observational data to bridge this gap and generate valuable evidence for subgroups that are not well represented in RCTs, such as pregnant women, older individuals and those with multiple comorbidities. This subgroup-relevant clinical information is crucial to helping individuals within these subgroups make informed decisions.

    This talk is based on several joint works with Guy Rothblum, Gal Yona, Uri Shalit, Eitan Bachmat, Marc Lipsitch, Miguel Hernán, Ben Reis and Ran Balicer, among others.
     

    Cynthia Dwork
    Harvard University

    Scaffolding Sets

    Predictors map individual instances in a population to the interval [0, 1]. For a collection C of subsets of a population, a predictor is multicalibrated with respect to C if it is simultaneously calibrated on each set in C. Dwork will initiate the study of the construction of scaffolding sets, a small collection S of sets with the property that multicalibration with respect to S ensures correctness, and not just calibration, of the predictor. Dwork and her collaborators’ approach is inspired by the wisdom that the intermediate layers of a neural net learn a highly structured and useful data representation.
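
    For readers new to the definition, here is a minimal sketch of an empirical multicalibration check; the binning scheme, thresholds and names are illustrative choices, not taken from the talk.

    ```python
    # Illustrative check of (approximate) multicalibration: on each set S in the
    # collection and each score bin with enough data, the average label should be
    # close to the average predicted score.
    import numpy as np

    def multicalibration_violation(scores, labels, sets, n_bins=10, min_count=50):
        """Largest |E[label] - E[score]| over (set, score-bin) cells with enough data."""
        bins = np.clip((scores * n_bins).astype(int), 0, n_bins - 1)
        worst = 0.0
        for S in sets:                       # each S is a boolean mask over the data
            for b in range(n_bins):
                cell = S & (bins == b)
                if cell.sum() >= min_count:  # skip statistically tiny cells
                    worst = max(worst, abs(labels[cell].mean() - scores[cell].mean()))
        return worst
    ```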

    This talk is based on joint work with Maya Burhanpurkar, Zhun Deng and Linjun Zhang.
     

    Shafi Goldwasser
    Simons Institute for the Theory of Computing

    The Ability to Deny in an Online World

    Plausible deniability seems like the ultimate get-out-of-jail-free card, but how can we make it work for digital information sent over a public network? Deniable encryption, defined by Canetti et al. (CRYPTO 1997), gives the sender of encrypted messages a way to achieve such deniability. The idea is especially interesting in the context of electronic elections, where it could eliminate the threat of vote buying after a vote has been cast. However, the Canetti et al. encryption scheme is not homomorphic, which is necessary for adding encrypted votes, and it achieves only a 1/poly probability of successful denial unless the ciphertexts are non-compact.
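
    As a toy illustration of the notion (a one-time pad rather than the public-key schemes discussed in the talk), the sender of a ciphertext can always exhibit a fake key under which the ciphertext "decrypts" to any claimed message of the same length:

    ```python
    # One-time-pad toy (illustration only): sender deniability by exhibiting
    # a fake key consistent with the already-sent ciphertext.
    import os

    def xor(a, b):
        return bytes(x ^ y for x, y in zip(a, b))

    msg = b"vote: alice"
    key = os.urandom(len(msg))
    ct = xor(msg, key)                   # genuine encryption

    claimed = b"vote: bobby"             # message claimed under coercion (same length)
    fake_key = xor(ct, claimed)          # fake key consistent with the ciphertext
    assert xor(ct, fake_key) == claimed  # the coercer's check passes
    ```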

    Goldwasser will present two new works on the subject.

    1) With Agarwal and S. Mossel (CRYPTO 2021), we define and construct sender-deniable fully homomorphic encryption (FHE) based on the polynomial hardness of the Learning With Errors (LWE) assumption. Deniable FHE enables encrypted data stored in the cloud to be processed securely without decryption while maintaining deniability of the encrypted data, and it prevents vote buying in electronic voting schemes in which encrypted votes can be tallied without decryption. The construction achieves compact ciphertexts independent of the level of deniability: both the size of the public key and the size of the ciphertexts are bounded by a fixed polynomial, independent of the detection probability achieved by the scheme. However, the offline running time of the encryption algorithm depends on the inverse of the detection probability, so the scheme falls short of simultaneously achieving compactness, negligible deniability and polynomial encryption time. The online running time is polynomial.

    2) With Coladangelo and Vazirani, more recently, we show a sender-deniable encryption scheme in which the encryption algorithm is quantum but the ciphertext is classical, secure under the polynomial hardness of LWE. This scheme achieves, for the first time under LWE, compactness, negligible deniability and polynomial encryption time simultaneously. Furthermore, the scheme can be extended so that coercion in an election cannot succeed even when the coercer is able to dictate all inputs to the deniable encryption algorithm prior to encryption.
     

    Michael Kim
    University of California, Berkeley

    Universal Adaptability
    View Slides (PDF)

    A key challenge in modern statistics is to ensure statistically valid inferences across diverse target populations from a fixed source of training data. Statistical techniques that guarantee this type of adaptability not only make statistical conclusions more robust to sampling bias but can also extend the benefits of evidence-based decision-making to communities that do not have the resources to collect high-quality data or to run computationally intensive estimation procedures on their own. In this talk, Kim will describe a surprising technical connection between this statistical inference problem and multicalibration, a technique developed in the context of algorithmic fairness.

    Concretely, the approach derives a correspondence between the fairness goal “to protect subpopulations from miscalibrated predictions” and the statistical goal “to ensure unbiased estimates on downstream target populations.” Exploiting this connection, Kim derives a single-source estimator that provides inferences that are universally adaptable to any downstream target population. The performance of the estimator is comparable to the performance of propensity score reweighting, a widely used technique that explicitly models the underlying source-target shift for every target.
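
    A rough sketch of the resulting recipe, in a toy one-dimensional setup that is not the estimator from the talk in full: once the predictor is calibrated on sufficiently fine subsets of the covariate space, averaging its predictions over the target covariates alone estimates the target mean outcome, with no target-specific reweighting.

    ```python
    # Toy sketch (illustrative setup only): a predictor calibrated on the source,
    # here via simple binning in one dimension, lets us estimate the mean outcome
    # on a covariate-shifted target by averaging its scores over target covariates.
    import numpy as np

    rng = np.random.default_rng(1)

    def outcome(x):                      # true conditional mean E[y | x]
        return 1 / (1 + np.exp(-x))

    # Source and (covariate-shifted) target populations.
    x_src = rng.normal(0.0, 1.0, 20000)
    y_src = (rng.random(x_src.size) < outcome(x_src)).astype(float)
    x_tgt = rng.normal(1.0, 0.7, 20000)  # different covariate distribution
    y_tgt = (rng.random(x_tgt.size) < outcome(x_tgt)).astype(float)

    # Stand-in for a multicalibrated predictor: bin-wise calibration on the source.
    bins = np.linspace(-4, 4, 41)
    idx_src = np.digitize(x_src, bins)
    f = {b: y_src[idx_src == b].mean() for b in np.unique(idx_src)}

    # Universally adaptable estimate: average predictions over target covariates only.
    idx_tgt = np.digitize(x_tgt, bins)
    est = np.mean([f.get(b, y_src.mean()) for b in idx_tgt])

    print("estimate:", round(est, 3), " true target mean:", round(y_tgt.mean(), 3))
    ```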

    This is based on joint work with Shafi Goldwasser, Christoph Kern, Frauke Kreuter and Omer Reingold.
     

    Irene Lo
    Stanford University

    Improving Diversity and Equity in San Francisco School Choice

    More than 65 years after school segregation was ruled unconstitutional, public schools across the United States are resegregating along racial and socioeconomic lines. Many cities have attempted to disentangle school and neighborhood segregation and improve equitable access through city-wide choice policies. However, these policies have largely not improved patterns of segregation and inequity. From 2018 to 2020, we worked with the San Francisco Unified School District (SFUSD) to design a new student assignment policy that meets the district's goals of diversity, predictability, proximity and equity of access. We used optimization techniques to augment and operationalize district-proposed policy concepts and compared the proposed policy with approaches typically suggested by the school choice literature. Using predictive choice models developed from historical choice data, we show that the policy can improve on the status quo with respect to all district goals, including decreasing the percentage of racial minorities in high-poverty schools from 29 percent to 11 percent. The policy design was enabled by close collaboration with SFUSD and was approved in 2020 for use starting in the 2024–25 school year.
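
    For background, assignment mechanisms in the school choice literature are typically built on deferred acceptance. The following is a generic student-proposing sketch with hypothetical data; it is not SFUSD's actual zone-based policy.

    ```python
    # Generic student-proposing deferred acceptance (a standard building block
    # in the school choice literature; not SFUSD's actual policy).
    def deferred_acceptance(prefs, priorities, capacity):
        """prefs: student -> ordered list of schools;
        priorities: school -> students ranked best-first; capacity: school -> seats."""
        held = {s: [] for s in capacity}       # tentative assignment at each school
        nxt = {i: 0 for i in prefs}            # next school each student proposes to
        free = list(prefs)
        while free:
            i = free.pop()
            if nxt[i] >= len(prefs[i]):        # exhausted preference list: unassigned
                continue
            s = prefs[i][nxt[i]]
            nxt[i] += 1
            held[s].append(i)
            held[s].sort(key=priorities[s].index)  # best priority first
            if len(held[s]) > capacity[s]:
                free.append(held[s].pop())     # reject the lowest-priority student
        return held

    # Tiny example (hypothetical data):
    prefs = {"ana": ["s1", "s2"], "ben": ["s1", "s2"], "cai": ["s1"]}
    priorities = {"s1": ["cai", "ana", "ben"], "s2": ["ana", "ben", "cai"]}
    print(deferred_acceptance(prefs, priorities, {"s1": 1, "s2": 1}))
    ```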

    This talk is based on joint work with Itai Ashlagi, Katherine Mentzer, and Max Allman.
     

    Jamie Morgenstern
    University of Washington

    Learning from Multiple Data Sources
    View Slides (PDF)

    Many real-world ML problems violate the assumptions that (1) training data is generated i.i.d. from a single distribution and (2) the training and test distributions are the same. For example, many hospitals in different locations, serving different populations, might each collect data about their patients. These datasets would ideally be combined and used to train a model that performs well on many distributions. In this talk, Morgenstern will describe one approach to formalizing how to use data generated from multiple distributions, and characterize the test distributions on which this method is guaranteed to perform well.
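
    One common formalization, sketched below under illustrative simplifying choices (squared loss, plain subgradient descent) and not necessarily the exact formulation from the talk, is to minimize the worst per-source loss: if every source's loss is at most L, then the loss on any mixture of the sources is also at most L.

    ```python
    # Minimal sketch (one common formalization, not necessarily the talk's):
    # subgradient descent on the maximum per-source loss.
    import numpy as np

    def sq_loss(theta, X, y):
        return np.mean((X @ theta - y) ** 2)

    def sq_grad(theta, X, y):
        return 2 * X.T @ (X @ theta - y) / len(y)

    def minimize_worst_source(sources, theta0, lr=0.05, steps=2000):
        """sources: list of (X, y) pairs, e.g., one per hospital."""
        theta = theta0.copy()
        for _ in range(steps):
            X, y = max(sources, key=lambda s: sq_loss(theta, *s))  # worst source
            theta -= lr * sq_grad(theta, X, y)                     # step on it
        return theta
    ```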

    This talk is based on joint work with Christopher Jung and Pranjal Awasthi.
