McGILL UNIVERSITY DEPARTMENT OF LINGUISTICS AND SCHOOL OF COMPUTER SCIENCE
Word embeddings obtained from neural networks trained on large text corpora have become popular representations of word meaning in computational linguistics. In this talk, we first examine the different types of semantic relations that can hold between two words of a language and ask whether these relations can be identified with the help of popular embedding models such as Word2Vec and GloVe. I propose different measures for quantifying the degree of paradigmatic similarity versus syntagmatic relatedness between two words. To evaluate these measures, we use two datasets obtained from experiments with human subjects: SimLex-999 (Hill et al., 2015), which contains word-similarity ratings collected under explicit instructions, and the word-relatedness production norms of Jouravlev and McRae (2016), likewise collected under explicit instructions.
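As a rough illustration of the kind of measure involved, cosine similarity between embedding vectors is the usual starting point for comparing two words. The sketch below uses tiny invented vectors rather than real Word2Vec or GloVe embeddings; the words and values are purely illustrative, not output from any trained model.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy 4-dimensional "embeddings" (invented values for illustration only).
vectors = {
    "coffee": [0.9, 0.1, 0.3, 0.0],
    "tea":    [0.8, 0.2, 0.4, 0.1],  # paradigmatically similar to "coffee"
    "cup":    [0.3, 0.9, 0.1, 0.2],  # syntagmatically related to "coffee"
}

print(f"coffee-tea: {cosine_similarity(vectors['coffee'], vectors['tea']):.3f}")
print(f"coffee-cup: {cosine_similarity(vectors['coffee'], vectors['cup']):.3f}")
```

With real embeddings, the interesting question is precisely whether a single cosine score conflates the two kinds of relation, which is why separate measures for similarity and relatedness are proposed.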
In the second part of the talk, we look into the question of modeling the meaning of discourse connectives. Similarities between a pair of such particles, e.g., “but” and “although”, cannot be computed directly from the words surrounding them. I argue, however, that discourse connectives can also be viewed from a distributional-semantics perspective if a suitable abstraction of context is employed. For example, subtle differences in the meaning of “but” and “although” can be revealed by studying their distributions in a corpus annotated with discourse relations. Finally, I outline some directions for future research based on our findings and on current developments in computational linguistics and natural language processing.
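One way to read the proposed abstraction of context: instead of counting neighboring words, count which discourse-relation labels each connective signals, and compare the resulting distributions. The sketch below assumes hypothetical counts over relation labels in the spirit of PDTB-style annotation; the labels and numbers are invented and do not come from any actual corpus study.

```python
import math

# Hypothetical counts of each connective across discourse-relation senses
# (PDTB-style labels; the numbers are invented for illustration).
counts = {
    "but":      {"Contrast": 700, "Concession": 250, "Conjunction": 50},
    "although": {"Contrast": 200, "Concession": 750, "Conjunction": 50},
}

def to_distribution(c):
    """Normalize raw counts into a probability distribution over relations."""
    total = sum(c.values())
    return {rel: n / total for rel, n in c.items()}

def cosine(p, q):
    """Cosine similarity between two sparse distributions."""
    keys = set(p) | set(q)
    dot = sum(p.get(k, 0.0) * q.get(k, 0.0) for k in keys)
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q)

p_but = to_distribution(counts["but"])
p_although = to_distribution(counts["although"])
print(f"but vs. although: {cosine(p_but, p_although):.3f}")
```

Under this abstraction, two connectives that mark the same relations in the same proportions would come out maximally similar, while a skew toward Contrast versus Concession, as in the toy counts above, pulls them apart.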