Unsupervised Word Polysemy Quantification with Multiresolution Grids of Contextual Embeddings

Christos Xypolopoulos, Antoine Tixier, Michalis Vazirgiannis

Semantics: Lexical Semantics Long paper Paper

Zoom-4C: Apr 22, Zoom-4C: Apr 22 (08:00-09:00 UTC) [Join Zoom Meeting]
Gather-3F: Apr 23, Gather-3F: Apr 23 (13:00-15:00 UTC) [Join Gather Meeting]

You can open the pre-recorded video in separate windows.

Abstract: The number of senses of a given word, or polysemy, is a very subjective notion, which varies widely across annotators and resources. We propose a novel method to estimate polysemy based on simple geometry in the contextual embedding space. Our approach is fully unsupervised and purely data-driven. Through rigorous experiments, we show that our rankings are well correlated, with strong statistical significance, with 6 different rankings derived from famous human-constructed resources such as WordNet, OntoNotes, Oxford, Wikipedia, etc., for 6 different standard metrics. We also visualize and analyze the correlation between the human rankings and make interesting observations. A valuable by-product of our method is the ability to sample, at no extra cost, sentences containing different senses of a given word. Finally, the fully unsupervised nature of our approach makes it applicable to any language. Code and data are publicly available https://github.com/ksipos/polysemy-assessment .
NOTE: Video may display a random order of authors. Correct author list is at the top of this page.

Connected Papers in EACL2021

Similar Papers