On the (In)Effectiveness of Images for Text Classification
Chunpeng Ma, Aili Shen, Hiyori Yoshikawa, Tomoya Iwakura, Daniel Beck, Timothy Baldwin
Language Grounding to Vision, Robotics and Beyond (Short Paper)
Abstract:
Images are core components of multi-modal learning in natural language processing (NLP), and results have varied substantially as to whether images improve NLP tasks or not. One confounding effect has been that previous NLP research has generally focused on sophisticated tasks (in varying settings), generally applied to English only. We focus on text classification, in the context of assigning named entity classes to a given Wikipedia page, where images generally complement the text and the Wikipedia page can be in one of a number of different languages. Our experiments across a range of languages show that images complement NLP models (including BERT) trained without external pre-training, but when combined with BERT models pre-trained on large-scale external data, images contribute nothing.