A reproduction of Apple's bi-directional LSTM models for language identification in short strings

Mads Toftrup; Søren Asger Sørensen; Manuel Ciosici; Ira Assent

Student Research Workshop, long paper

Gather-2F: Apr 22 (13:00-15:00 UTC)

Abstract: Language identification is the task of identifying a document's language. For applications such as automatic spell checker selection, language identification must work on very short strings, such as text message fragments. In this work, we reproduce a language identification architecture that Apple briefly sketched in a blog post. We confirm the bi-LSTM model's performance and find that it outperforms current open-source language identifiers. We further find that its language identification mistakes are due to confusion between related languages.

Connected Papers in EACL2021