The next talk in our 2025-2026 McGill Linguistics Colloquium Series will be given by Dr. Márton Sóskuthy (The University of British Columbia) next Friday, November 7th, at 3:30pm in Leacock 232. Details of the talk are given below.
Title: Sound change and lexical shifts in the emergence of Zipf’s Law of Abbreviation in English
Abstract: Zipf’s Law of Abbreviation is a famous statistical pattern whereby frequent words tend to be shorter than infrequent ones. Large-scale studies have found this pattern both in multi-billion word collections of texts from well-resourced languages (e.g. Piantadosi et al. 2011) and in smaller collections of texts from close to a thousand languages (Bentz et al. 2016). Yet there are still glaring gaps in our knowledge of the fine details and origins of this pattern. This talk attempts to fill these gaps through a careful investigation of the pattern in historical data and corpora from English, supplemented by statistical and computational modelling.
The first case study looks at changes between Old English (OE) and Modern English (ME) to see whether we find changes to the shapes of words in line with Zipf’s Law. In order to do this, a grapheme-to-phoneme (G2P) model is trained and used to generate expected ME pronunciations for 1,500 OE spellings. These are compared with their actual ME pronunciations in terms of their length. Infrequent words turn out to be somewhat longer on average than expected, and frequent words somewhat shorter. However, the correlation is relatively weak, and much of it is carried by items of very high frequency.
The second case study looks at how referring forms shift in response to changes in the frequency of the underlying concept in the COHA historical corpus (e.g. plane overtaking airplane as the frequency of the concept increases). This resource-intensive study uses LLMs to disambiguate words in context (e.g. plane ‘surface’ vs. plane ‘flying vehicle’) and a Monte Carlo resampling method to find non-trivial correlations between time series. A key finding is that the length of referring forms adapts to the frequency of the concept when that frequency increases, but does not show changes otherwise.
Finally, I present simulation results that show how Zipf’s Law of Abbreviation may emerge in a simple model of communication even in the absence of any explicit pressure for short forms to be prioritised in frequent contexts.
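For readers who would like a concrete feel for the pattern before the talk, here is a minimal sketch of how Zipf’s Law of Abbreviation can be checked on a small text: a rank (Spearman) correlation between word frequency and word length. This is an illustration only and is not taken from the talk. The function names are hypothetical, the sample sentence is invented, and length is counted in characters as a rough stand-in for the pronunciation length discussed in the abstract.

```python
from collections import Counter


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the whole run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Plain Pearson correlation; assumes both inputs have non-zero variance."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def abbreviation_correlation(tokens):
    """Spearman correlation between word frequency and word length.

    Assumption: length is measured in characters, not phonemes.
    """
    counts = Counter(tokens)
    words = list(counts)
    freqs = [counts[w] for w in words]
    lengths = [len(w) for w in words]
    return pearson(average_ranks(freqs), average_ranks(lengths))


# Invented toy sentence, purely for illustration.
tokens = "the cat saw the extraordinary dog and the cat ignored the dog".split()
rho = abbreviation_correlation(tokens)
print(f"frequency-length rank correlation: {rho:.2f}")
```

A negative value means that the more frequent words in the sample tend to be shorter, which is the direction the law predicts. A sample this small says nothing about English as a whole; the studies cited in the abstract rely on far larger corpora.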
