On the (In)Effectiveness of Images for Text Classification

Chunpeng Ma, Aili Shen, Hiyori Yoshikawa, Tomoya Iwakura, Daniel Beck, Timothy Baldwin

Track: Language Grounding to Vision, Robotics and Beyond (Short Paper)

Gather-1E: Apr 21 (13:00-15:00 UTC)


Abstract: Images are core components of multi-modal learning in natural language processing (NLP), and results have varied substantially as to whether they improve NLP tasks or not. One confounding factor is that previous NLP research has generally focused on sophisticated tasks in varying settings, applied almost exclusively to English. We focus on text classification, in the context of assigning named entity classes to a given Wikipedia page, where images generally complement the text and the Wikipedia page can be in any of a number of languages. Our experiments across a range of languages show that images complement NLP models (including BERT) trained without external pre-training, but when combined with BERT models pre-trained on large-scale external data, images contribute nothing.
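The paper's exact architecture is not given on this page; the sketch below illustrates, under stated assumptions, the kind of text-image fusion the abstract describes: a multilingual BERT [CLS] embedding concatenated with a precomputed image feature vector and passed to a linear classifier over named entity classes. The model name, feature dimension, and class inventory are illustrative assumptions, not the authors' released code.

# A minimal sketch (not the authors' implementation) of a text-image fusion
# classifier for assigning named entity classes to Wikipedia pages.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

NUM_ENTITY_CLASSES = 4    # assumed inventory, e.g. PER / LOC / ORG / MISC
IMAGE_FEATURE_DIM = 2048  # assumed, e.g. pooled CNN features for page images

class TextImageClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        # Multilingual BERT, since the pages can be in several languages.
        self.bert = AutoModel.from_pretrained("bert-base-multilingual-cased")
        hidden = self.bert.config.hidden_size
        self.classifier = nn.Linear(hidden + IMAGE_FEATURE_DIM, NUM_ENTITY_CLASSES)

    def forward(self, input_ids, attention_mask, image_features):
        # Use the [CLS] token embedding as the text representation.
        text_repr = self.bert(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state[:, 0]
        # Early fusion: concatenate text and image representations.
        fused = torch.cat([text_repr, image_features], dim=-1)
        return self.classifier(fused)

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = TextImageClassifier()
inputs = tokenizer(["Example Wikipedia page text."], return_tensors="pt")
image_features = torch.zeros(1, IMAGE_FEATURE_DIM)  # stand-in for real features
logits = model(inputs["input_ids"], inputs["attention_mask"], image_features)

Ablating the image features in such a setup (e.g. zeroing them out) against a text-only baseline is the kind of comparison the abstract reports, with and without large-scale BERT pre-training.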


Similar Papers

A Neural Few-Shot Text Classification Reality Check
Thomas Dopierre, Christophe Gravier, Wilfried Logerais
Meta-Learning for Effective Multi-task and Multilingual Modelling
Ishan Tarunesh, Sushil Khyalia, Vishwajeet Kumar, Ganesh Ramakrishnan, Preethi Jyothi
Multilingual and cross-lingual document classification: A meta-learning approach
Niels van der Heijden, Helen Yannakoudakis, Pushkar Mishra, Ekaterina Shutova