Does She Wink or Does She Nod? A Challenging Benchmark for Evaluating Word Understanding of Language Models

Lutfi Kerem Senel, Hinrich Schütze

Semantics: Lexical Semantics | Short Paper

Gather-1D: Apr 21 (13:00-15:00 UTC)


Abstract: Recent progress in pretraining language models on large corpora has resulted in significant performance gains on many NLP tasks. These large models acquire linguistic knowledge during pretraining, which helps to improve performance on downstream tasks via fine-tuning. To assess what kind of knowledge is acquired, language models are commonly probed by querying them with "fill in the blank" style cloze questions. Existing probing datasets mainly focus on knowledge about relations between words and entities. We introduce WDLMPro (Word Definitions Language Model Probing) to evaluate word understanding directly using dictionary definitions of words. In our experiments, three popular pretrained language models struggle to match words and their definitions. This indicates that they understand many words poorly and that our new probing task is a difficult challenge that could help guide research on LMs in the future.
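The sketch below illustrates the general idea of cloze-style probing of word understanding that the abstract describes: given a dictionary-style definition, ask a masked language model which candidate word fits the blank. It is a minimal illustration only, not the paper's WDLMPro dataset or evaluation protocol; the model name, the example definition, and the candidate words are assumptions chosen for demonstration.

```python
# Illustrative sketch only: NOT the paper's WDLMPro setup or data.
# Idea: pose a cloze query of the form "A [MASK] is <definition>." and see
# whether the masked LM assigns the highest probability to the correct word.
# Assumes Hugging Face `transformers` is installed and that the candidate
# words are single tokens in the model's vocabulary.
from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-uncased")

definition = "a small domesticated carnivorous mammal that purrs and catches mice"
candidates = ["cat", "dog", "horse", "sparrow"]  # hypothetical candidates

# The fill-mask pipeline scores each target word for the masked slot;
# a model that "understands" these words should rank "cat" highest here.
results = unmasker(f"A [MASK] is {definition}.", targets=candidates)
for r in results:
    print(f"{r['token_str']:>8}  p={r['score']:.4f}")
```

A stronger word-definition matching setup would score every candidate word against every definition and check whether each word is ranked first for its own definition, but the single-query example above already conveys the probing principle.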

Similar Papers

Probing the Probing Paradigm: Does Probing Accuracy Entail Task Relevance?
Abhilasha Ravichander, Yonatan Belinkov, Eduard Hovy
Meta-Learning for Effective Multi-task and Multilingual Modelling
Ishan Tarunesh, Sushil Khyalia, Vishwajeet Kumar, Ganesh Ramakrishnan, Preethi Jyothi
Which is Better for Deep Learning: Python or MATLAB? Answering Comparative Questions in Natural Language
Viktoriia Chekalina, Alexander Bondarenko, Chris Biemann, Meriem Beloucif, Varvara Logacheva, Alexander Panchenko