This week’s MCQLL meeting, taking place Thursday, Feb 25th, 1:30-2:30pm will feature a talk entitled “Information-theoretic models of natural language” by Professor Richard Futrell. Abstract and bio are below. If you would like to join the meeting and have not yet registered for this semester’s MCQLL meetings, please send an email to requesting the link.

Abstract: I claim that human languages can be modeled as information-theoretic codes, that is, systems that maximize information transfer under certain constraints. I argue that the relevant constraints for human language are those involving the cognitive resources used during language production and comprehension. Viewing human language in this way, it is possible to derive and test new quantitative predictions about the statistical, syntactic, and morphemic structure of human languages.

I start by reviewing some of the many ways that natural languages differ from optimal codes as studied in information theory. I argue that one distinguishing characteristic of human languages, as opposed to other natural and artificial codes, is a property I call “information locality”: information about particular aspects of meaning is localized in time within a linguistic utterance. I give evidence for information locality at multiple levels of linguistic structure, including the structure of words and the order of words in sentences.

Next, I state a theorem showing that information locality is a property of any communication system where the encoder and/or decoder are operating incrementally under memory constraints. The theorem yields a new, fully formal, and quantifiable definition of information locality, which leads to new predictions about word order and the structure of words across languages. I test these predictions in broad corpus studies of word order in over 50 languages, and in case studies of the order of morphemes within words in two languages.

Bio: Richard Futrell is an Assistant Professor of Language Science at the University of California, Irvine. His research applies information theory to better understand human language and how humans and machines can learn and process it.