TMR: Evaluating NER Recall on Tough Mentions

Jingxuan Tu, Constantine Lignos

Student Research Workshop Long Paper

Zoom-5E: Apr 22 (12:00-13:00 UTC)
Gather-2F: Apr 22 (13:00-15:00 UTC)

Abstract: We propose the Tough Mentions Recall (TMR) metrics to supplement traditional named entity recognition (NER) evaluation by examining recall on specific subsets of "tough" mentions: unseen mentions, those whose tokens or token/type combination were not observed in training, and type-confusable mentions, token sequences that appear with multiple entity types in the test data. We demonstrate the usefulness of these metrics by evaluating English, Spanish, and Dutch corpora with five recent neural architectures. We identify subtle differences between the performance of BERT and Flair on two English NER corpora and a weak spot in the performance of current models in Spanish. We conclude that the TMR metrics enable differentiation between otherwise similar-scoring systems and identification of patterns in performance that would go unnoticed from overall precision, recall, and F1.
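The abstract's definitions can be made concrete with a short sketch. This is a simplified, hypothetical implementation (not the authors' released code): mentions are represented as (token-tuple, type) pairs, "unseen" is reduced to the token-based variant only, and a gold mention counts as recalled when the identical pair appears among the predictions — real NER scoring also anchors mentions to sentence positions.

```python
from collections import defaultdict

def tmr_recall(train_mentions, test_gold, predictions):
    """Sketch of Tough Mentions Recall on two 'tough' subsets:
    unseen mentions (token sequence never observed in training) and
    type-confusable mentions (token sequence occurring with more
    than one entity type in the test gold data)."""
    # Token sequences observed in the training mentions.
    train_tokens = {tokens for tokens, _ in train_mentions}

    # Group test gold types by token sequence to find confusable ones.
    types_by_tokens = defaultdict(set)
    for tokens, etype in test_gold:
        types_by_tokens[tokens].add(etype)
    confusable = {t for t, types in types_by_tokens.items() if len(types) > 1}

    predicted = set(predictions)

    def recall(subset):
        # Fraction of the gold subset exactly matched by a prediction.
        return sum(m in predicted for m in subset) / len(subset) if subset else None

    unseen = [m for m in test_gold if m[0] not in train_tokens]
    type_confusable = [m for m in test_gold if m[0] in confusable]
    return {
        "unseen_recall": recall(unseen),
        "type_confusable_recall": recall(type_confusable),
    }
```

For example, if "Paris" appears in the test data as both LOC and ORG, both of those gold mentions fall into the type-confusable subset, and recall is computed over that subset alone rather than over all mentions.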


Similar Papers

WiC-TSV: An Evaluation Benchmark for Target Sense Verification of Words in Context
Anna Breit, Artem Revenko, Kiamehr Rezaee, Mohammad Taher Pilehvar, Jose Camacho-Collados

CLiMP: A Benchmark for Chinese Language Model Evaluation
Beilei Xiang, Changbing Yang, Yu Li, Alex Warstadt, Katharina Kann

CHOLAN: A Modular Approach for Neural Entity Linking on Wikipedia and Wikidata
Manoj Prabhakar Kannan Ravi, Kuldeep Singh, Isaiah Onando Mulang', Saeedeh Shekarpour, Johannes Hoffart, Jens Lehmann