MCGILL UNIVERSITY – DEPARTMENT OF LINGUISTICS AND SCHOOL OF COMPUTER SCIENCE
There is enormous linguistic diversity within and across language families, yet all languages must be efficient for their speakers' needs and cognitively tractable to process. Using ideas and techniques from computer science, we can generate hypotheses about what efficient languages should look like; using large-scale multilingual data, computational modeling, and online behavioral experiments, we can test these hypotheses and thereby explain phenomena observed both across and within languages.

In this talk I focus on the lexicon and ask why languages have the words they do rather than some other set of words. First, consistent with predictions from Shannon's information theory, languages are optimized so that words conveying less information are (a) shorter and (b) easier to pronounce. For instance, word shortenings such as chimpanzee → chimp are more likely to occur when the context is predictive. Second, across 97 languages, phonotactically probable words also tend to have high token frequency. Third, extending these ideas about efficiency to syntax, I show that, across 37 languages, the syntactic distances between dependent words are minimized. I will conclude with a discussion of my work on experimental methods and my directions for future research.
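The information-theoretic prediction above can be made concrete with a toy sketch. This is not the talk's actual model, just an illustration on a hypothetical miniature corpus: a word's in-context information content is its surprisal, −log₂ p(word | context), here estimated from bigram counts. The efficiency claim is that words with low average surprisal should tend to be short (and shortenings like chimp should appear where surprisal is low).

```python
import math
from collections import Counter

# Hypothetical toy corpus for illustration only.
corpus = "the chimp ate the banana and the chimp slept".split()

# Bigram counts and counts of each word appearing as a context word.
bigrams = Counter(zip(corpus, corpus[1:]))
context_counts = Counter(corpus[:-1])

def surprisal(prev, word):
    """In-context information content: -log2 p(word | prev),
    estimated by maximum likelihood from the bigram counts."""
    return -math.log2(bigrams[(prev, word)] / context_counts[prev])

# "chimp" follows "the" in 2 of the 3 contexts where "the" occurs,
# so its surprisal after "the" is low: -log2(2/3) ≈ 0.58 bits.
print(surprisal("the", "chimp"))
```

On real data, these counts would come from large corpora or a language model, and surprisal would be averaged over all of a word's occurrences before being correlated with its length.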
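The syntactic claim can likewise be illustrated with a minimal sketch. In dependency grammar, each word (except the root) attaches to a head word, and the length of a dependency is the linear distance between head and dependent; the finding is that languages keep the total of these distances small. The tree encoding below is an assumption for illustration, not the talk's actual pipeline.

```python
def total_dependency_length(arcs):
    """Sum of linear head-dependent distances over all arcs.

    Each arc is a (dependent_index, head_index) pair of word
    positions; the root arc is excluded.
    """
    return sum(abs(head - dep) for dep, head in arcs)

# "The dog chased the cat", words indexed 0..4, under a standard parse:
# the->dog, dog->chased, the->cat, cat->chased (root "chased" omitted).
arcs = [(0, 1), (1, 2), (3, 4), (4, 2)]
print(total_dependency_length(arcs))  # 1 + 1 + 1 + 2 = 5
```

Comparing attested sentences against random reorderings of the same trees (as in cross-linguistic work on dependency length minimization) is one way to test whether real word orders keep this total shorter than chance.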