At this week’s MCQLL meeting, Ben LeBrun will be presenting “Incremental and Systematic Visually-Grounded Language Understanding using Modular Symbolic Representations”. Abstract below.
We will be meeting this Tuesday, October 17th, at 3:00 PM. Meetings are held both in person in room 117 of the McGill Linguistics Department and on Zoom.
Abstract: Humans relate language to the external world systematically and incrementally, inferring systematic mappings between visual and linguistic input on a word-by-word basis (Tanenhaus et al. 1995, Eberhard et al. 1995). In contrast, existing models of visually-grounded language understanding have no notion of incrementality and often fail to behave systematically (Conwell & Ullman 2022, Ruis et al. 2020). In this talk, I will show that incrementality and systematicity are made possible by a model in which visual and linguistic input are tied together via modular symbolic representations of linguistic meaning and 3D visual scenes. I will present models that can (a) generate 3D visual scenes from natural language more systematically than existing approaches and (b) incrementally ground natural language in 3D visual scenes as precisely as humans do.