Title: Learning Representations Using Causal Invariance
Abstract: Learning algorithms often capture spurious correlations present in the training data distribution instead of addressing the task of interest. Such spurious correlations occur because the data collection process is subject to uncontrolled confounding biases. Suppose, however, that we have access to a few distinct datasets exemplifying the same concept but whose distributions exhibit different biases. Can we learn something that is common across all these distributions while ignoring the spurious ways in which they differ? Can we go beyond framing this as a multi-criterion optimization problem and learn something that not only works well on our datasets, but also works on all distributions that exhibit the same biases, possibly in more extreme form? One way to achieve this goal is to project the data into a representation space in which training on any of our datasets leads to exactly the same solution. This idea differs in important ways from previous work on statistical robustness or adversarial objectives. Similar to recent work on invariant feature selection, it is about discovering the actual mechanism underlying the data instead of modeling its superficial statistics. The presentation provides some evidence that this can work and also discusses some of the current problems with this approach.
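To make the invariance criterion concrete, here is a toy sketch of our own construction, not the method of the presentation. It assumes linear models fitted by least squares and restricts "representations" to simple feature selections. In each hypothetical environment, `y` is caused by `x1`, while `x2` is an effect of `y` whose strength (1.0 vs 3.0, chosen arbitrarily) differs between environments. The sketch only *checks* the criterion: with the causal representation, training on either dataset yields the same solution. With the representation that keeps `x2`, the two solutions disagree. The names `make_env`, `least_squares`, `solutions`, `phi_all`, and `phi_causal` are illustrative.

```python
import random


def make_env(spurious_strength, n, rng):
    """Hypothetical environment: y depends causally on x1, while x2 is an
    effect of y whose strength varies from one environment to another."""
    data = []
    for _ in range(n):
        x1 = rng.gauss(0.0, 1.0)
        y = x1 + rng.gauss(0.0, 1.0)
        x2 = spurious_strength * y + rng.gauss(0.0, 1.0)
        data.append((x1, x2, y))
    return data


def least_squares(features, targets):
    """Solve the normal equations for 1 or 2 features (no intercept,
    since all variables are zero-mean by construction)."""
    k = len(features[0])
    # Gram matrix G = Phi^T Phi and moment vector b = Phi^T y.
    g = [[sum(f[i] * f[j] for f in features) for j in range(k)] for i in range(k)]
    b = [sum(f[i] * t for f, t in zip(features, targets)) for i in range(k)]
    if k == 1:
        return [b[0] / g[0][0]]
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    return [(g[1][1] * b[0] - g[0][1] * b[1]) / det,
            (g[0][0] * b[1] - g[1][0] * b[0]) / det]


def solutions(env, phi):
    """Least-squares regression of y on the representation phi(x1, x2)."""
    feats = [phi(x1, x2) for x1, x2, _ in env]
    return least_squares(feats, [y for _, _, y in env])


def phi_all(x1, x2):
    return (x1, x2)  # keeps the spurious feature


def phi_causal(x1, x2):
    return (x1,)  # keeps only the causal feature


rng = random.Random(0)
envs = {"A": make_env(1.0, 20000, rng), "B": make_env(3.0, 20000, rng)}

for rep_name, phi in [("all", phi_all), ("causal", phi_causal)]:
    for env_name, env in envs.items():
        print(rep_name, env_name, [round(w, 2) for w in solutions(env, phi)])
```

Under these assumptions, the population solution for `phi_causal` is 1 in both environments. For `phi_all` with strength s, the solution is (1/(s²+1), s/(s²+1)), i.e. (0.5, 0.5) for s = 1 and (0.1, 0.3) for s = 3, so the printed estimates should land close to these values. Only the causal representation satisfies the "same solution on every dataset" property. The spurious one also leans more on `x2` as the bias grows.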