The Role of Syntactic Planning in Compositional Image Captioning

Emanuele Bugliarello, Desmond Elliott

Track: Language Grounding to Vision, Robotics and Beyond (Long Paper)

Gather-3F: Apr 23 (13:00-15:00 UTC)


Abstract: Image captioning has focused on generalizing to images drawn from the same distribution as the training set, and not to the more challenging problem of generalizing to different distributions of images. Recently, Nikolaus et al. (2019) introduced a dataset to assess compositional generalization in image captioning, where models are evaluated on their ability to describe images with unseen adjective–noun and noun–verb compositions. In this work, we investigate different methods to improve compositional generalization by planning the syntactic structure of a caption. Our experiments show that jointly modeling tokens and syntactic tags enhances generalization in both RNN- and Transformer-based models, while also improving performance on standard metrics.
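One way the joint modeling of tokens and syntactic tags described in the abstract can be realized — a minimal hypothetical sketch, not the paper's exact implementation — is to interleave each syntactic tag with the token it labels, so a single decoder first "plans" the syntactic slot and then emits the word that fills it. The tag inventory (`<NP>`, `<VP>`) below is illustrative only.

```python
def interleave_tags_and_tokens(tokens, tags):
    """Build one target sequence that alternates a syntactic tag
    with the token it labels, so a single decoder models both."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must align one-to-one")
    seq = []
    for tag, tok in zip(tags, tokens):
        seq.append(tag)  # plan the syntactic slot first...
        seq.append(tok)  # ...then emit the word that fills it
    return seq

# Example: "a dog runs" with hypothetical chunk-style tags
tokens = ["a", "dog", "runs"]
tags = ["<NP>", "<NP>", "<VP>"]
print(interleave_tags_and_tokens(tokens, tags))
# → ['<NP>', 'a', '<NP>', 'dog', '<VP>', 'runs']
```

At training time such a sequence would simply replace the plain caption as the decoder target; at inference the generated tags can be stripped to recover the caption.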


Similar Papers

Removing Word-Level Spurious Alignment between Images and Pseudo-Captions in Unsupervised Image Captioning
Ukyo Honda, Yoshitaka Ushiku, Atsushi Hashimoto, Taro Watanabe, Yuji Matsumoto
Top-down Discourse Parsing via Sequence Labelling
Fajri Koto, Jey Han Lau, Timothy Baldwin
On the (In)Effectiveness of Images for Text Classification
Chunpeng Ma, Aili Shen, Hiyori Yoshikawa, Tomoya Iwakura, Daniel Beck, Timothy Baldwin
Few-Shot Semantic Parsing for New Predicates
Zhuang Li, Lizhen Qu, Shuo Huang, Gholamreza Haffari