Title: Disentangled interpretations for deep learning with ACD
Abstract: Recent deep learning models have achieved impressive predictive performance by learning complex functions of many variables, often at the cost of interpretability. We will discuss our recent work on interpreting neural networks by attributing importance to features and feature interactions for individual predictions. Importantly, the proposed method (agglomerative contextual decomposition, or ACD) disentangles the importance of features in isolation from the importance of interactions between groups of features. These attributions yield insights across domains, including NLP and computer vision, and can be used to directly improve generalization.
We focus on a problem in cosmology, where it is crucial to interpret how a model trained on simulations predicts fundamental cosmological parameters. Extending ACD to interpret transformations of input features lets us vet the model by analyzing attributions in the frequency domain. Finally, we discuss ongoing work using ACD to develop simple transformations (e.g., adaptive wavelets) that are both predictive and interpretable for cosmological parameter prediction.