Evaluating language models for the retrieval and categorization of lexical collocations

Luis Espinosa Anke, Joan Codina-Filba, Leo Wanner

Semantics: Lexical Semantics (Long Paper)

Zoom-4C: Apr 22 (08:00-09:00 UTC)
Gather-3F: Apr 23 (13:00-15:00 UTC)


Abstract: Lexical collocations are idiosyncratic combinations of two syntactically bound lexical items (e.g., “heavy rain” or “take a step”). Understanding their degree of compositionality and idiosyncrasy, as well as their underlying semantics, is crucial for language learners, lexicographers and downstream NLP applications. In this paper, we perform an exhaustive analysis of current language models for collocation understanding. We first construct a dataset of occurrences of lexical collocations in context, categorized into 17 representative semantic categories. Then, we perform two experiments: (1) unsupervised collocate retrieval using BERT, and (2) supervised collocation classification in context. We find that most models perform well in distinguishing light verb constructions, especially if the collocation’s first argument acts as subject, but often fail to distinguish, first, different syntactic structures within the same semantic category, and, second, fine-grained semantic categories which restrict the use of small sets of valid collocates for a given base.
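
The first experiment probes BERT's masked-language-model head to propose collocates for a given base. Below is a minimal sketch of this kind of unsupervised retrieval, assuming the HuggingFace transformers library; the model name, example sentence and base word are illustrative choices, not the paper's exact setup.

# Sketch: retrieve candidate collocates for the base "rain" by masking the
# collocate slot and ranking BERT's fill-in predictions by probability.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# The [MASK] token occupies the modifier position next to the base "rain".
predictions = fill_mask("The forecast warns of [MASK] rain tonight.", top_k=10)

for p in predictions:
    # Each prediction carries the proposed token and its softmax score.
    print(f"{p['token_str']:>12}  {p['score']:.4f}")

A candidate list retrieved this way can then be compared against gold collocates for the base (e.g., "heavy", "torrential") to assess retrieval quality.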