This week’s MCQLL meeting, on Thursday, March 25, 1:30-2:30pm, will feature a talk from Emily Goodwin. The talk abstract is below.

If you would like to attend the talk but have not yet signed up for the MCQLL meetings this semester, please send an email to mcqllmeetings@gmail.com.

Abstract: Recent work on neural natural language understanding models has focused on generalization that is compositional (the meanings of larger expressions are a function of the meanings of smaller expressions) and systematic (individual words mean the same thing when put in novel combinations). Datasets for compositional and systematic generalization often focus on testing classes of syntactic constructions (for example, testing only on strings of a certain length or longer, or on novel combinations of particular predicates). In contrast, the Compositional Freebase Questions (CFQ) training and test sets are automatically sampled. To measure the compositional challenge of a test set relative to its training set, the CFQ authors measure the divergence between the distributions of syntactic compounds in the test and training sets. Training and test splits with maximum compound divergence (MCD) are highly challenging for semantic parsers, but (unlike other datasets designed to test compositional generalization) the splits do not specifically hold out human-recognizable classes of syntactic constructions from the training set. In this talk I will present preliminary results of a syntactic analysis of the MCD splits released in the CFQ dataset, and explore whether model failures on MCD splits can be explained in terms of phenomena familiar to syntactic theory.