Please join us Friday, April 8th at 3:30pm for our next talk of the 2021-2022 McGill Linguistics Colloquium Series. If you are planning to attend talks and have not yet registered, you can do so here (you only need to register once for the 2021-2022 year). After registering, you will receive a confirmation email containing information about joining the meeting.

Speaker: Christopher Potts (Stanford University)

Title: Inducing Interpretable Causal Structures in Neural Networks

Abstract: Early symbolic NLP models were designed to leverage valuable insights about language and cognition. These insights were expressed directly in hand-designed structures, and this ensured that model behaviors were systematic and interpretable. Unfortunately, these models also tended to be brittle and specialized. By contrast, present-day models are data-driven and can flexibly acquire complex behaviors, which has opened up many new avenues. However, the trade-offs are now evident: these models often find opaque, unsystematic solutions. In this talk, I’ll report on our ongoing efforts to combine the best aspects of the old and new using techniques from causal abstraction analysis. In this method, we define high-level causal models, usually in symbolic terms, and then train neural networks to conform to the structure of those models while also learning specific tasks. The central technical piece is interchange intervention training (IIT), in which we swap internal representations in the target neural model in a way that is guided by the input–output behavior of the causal model. Where the IIT objective is minimized, the high-level model is an interpretable, faithful proxy for the underlying neural model. My talk will focus on how and why IIT works, since I am hoping this will help people identify new application areas for it, and I will also briefly review case studies applying IIT to natural language inference, grounded language understanding, and language model distillation.
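For readers curious about the core operation the abstract describes, the sketch below illustrates a single interchange intervention on a toy network. This is a minimal illustration only, not the speaker's implementation: the model, layer choice, and names (ToyModel, swap_hidden) are all hypothetical, and the training objective is indicated only in a comment.

```python
# A minimal sketch of an interchange intervention (the operation at the
# heart of IIT), NOT the authors' implementation. Assumes a toy
# feed-forward network whose hidden layer is hypothesized to align with
# a variable in a high-level causal model.
import torch
import torch.nn as nn

class ToyModel(nn.Module):
    def __init__(self, dim=4):
        super().__init__()
        self.encoder = nn.Linear(dim, dim)  # produces the hidden state we intervene on
        self.decoder = nn.Linear(dim, 2)    # maps the hidden state to output logits

    def forward(self, x, swap_hidden=None):
        h = torch.relu(self.encoder(x))
        if swap_hidden is not None:
            # Interchange intervention: replace the hidden representation
            # computed for this input with one recorded from another input.
            h = swap_hidden
        return self.decoder(h), h

model = ToyModel()
base = torch.randn(1, 4)    # the input whose computation we intervene on
source = torch.randn(1, 4)  # the input donating the internal representation

_, source_hidden = model(source)                    # record the source's hidden state
logits, _ = model(base, swap_hidden=source_hidden)  # rerun base with the swapped-in state

# In IIT, `logits` would be trained (e.g., with cross-entropy) to match the
# output the high-level causal model produces when the aligned high-level
# variable is set to the value it takes on the source input.
```

When this counterfactual training objective is driven to its minimum, the swapped representation behaves like the corresponding causal-model variable, which is what licenses reading the high-level model as a faithful proxy for the network.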