At this week’s MCQLL meeting (Wednesday, October 21st, 1:30-2:30pm), Jacob Louis Hoover, a PhD student at McGill and Mila, will present on the connection between grammatical structure and the statistics of word occurrences in language use. Abstract and bio are below.
If you would like to attend and have not already signed up for the MCQLL mailing list, please fill out this Google form as soon as possible.
Bio: Jacob is a PhD student at McGill Linguistics / Mila. He is broadly interested in logic, mathematical linguistics, and the generative / expressive capacity of formal systems, as well as information theory and what both human and machine learning might be able to tell us about the underlying structure of language.
Talk: There is an intuitive connection between grammatical structure and the statistics of word occurrences observed in language use. This intuitive connection is reflected in cognitive models and also in NLP, in the assumption that the patterns of predictability correlate with linguistic structure. We call this the “dependency-dependence” hypothesis. This hypothesis is implicit in the use of language modelling objectives for training modern neural models, and has been made explicit in some approaches to unsupervised dependency parsing. The strongest version of this hypothesis holds that compositional structure is in fact entirely reducible to co-occurrence statistics (a hypothesis made explicit in Futrell et al. 2019). Investigating the mutual information of pairs of words using pretrained contextualized embedding models, we show that the optimal structure for prediction is in fact not very closely correlated with the compositional structure. We propose that contextualized mutual information scores of this kind may be useful as a way to understand the structure of predictability, as a system distinct from compositional structure, but also integral to language use.