Deciphering Undersegmented Ancient Scripts Using Phonetic Prior

Jiaming Luo, Frederik Hartmann, Enrico Santus, Yuan Cao, Regina Barzilay

TACL Track Paper

Zoom-4B: Apr 22 (08:00-09:00 UTC)
Gather-3E: Apr 23 (13:00-15:00 UTC)

Abstract: Most undeciphered lost languages exhibit two characteristics that pose significant decipherment challenges: (1) the scripts are not fully segmented into words; (2) the closest known language has not been determined. We propose a decipherment model that handles both of these challenges by building on rich linguistic constraints reflecting consistent patterns in historical sound change. We capture the natural phonological geometry by learning character embeddings based on the International Phonetic Alphabet (IPA). The resulting generative framework jointly models word segmentation and cognate alignment, informed by phonological constraints. We evaluate the model on both deciphered languages (Gothic, Ugaritic) and an undeciphered one (Iberian). The experiments show that incorporating phonetic geometry leads to clear and consistent gains. Additionally, we propose a measure of language closeness that correctly identifies related languages for Gothic and Ugaritic. For Iberian, the method does not show strong evidence supporting Basque as a related language, which is consistent with the position favored by current scholarship.
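
As a rough illustration of what "phonological geometry" can mean, the sketch below builds character vectors from IPA articulatory features so that sounds sharing features end up close together. This is not the paper's model: the abstract says the embeddings are learned, whereas here they are fixed one-hot feature vectors. The tiny consonant inventory, the three feature groups, and the names `FEATURE_VALUES`, `IPA_FEATURES`, `embed`, and `feature_distance` are all assumptions made for this example.

```python
# Toy illustration only: fixed feature vectors, not learned embeddings.
# The inventory and feature groups below are hypothetical simplifications.
FEATURE_VALUES = {
    "voicing": ["voiceless", "voiced"],
    "place": ["bilabial", "alveolar", "velar"],
    "manner": ["plosive", "nasal", "fricative"],
}

IPA_FEATURES = {
    "p": {"voicing": "voiceless", "place": "bilabial", "manner": "plosive"},
    "b": {"voicing": "voiced", "place": "bilabial", "manner": "plosive"},
    "t": {"voicing": "voiceless", "place": "alveolar", "manner": "plosive"},
    "d": {"voicing": "voiced", "place": "alveolar", "manner": "plosive"},
    "k": {"voicing": "voiceless", "place": "velar", "manner": "plosive"},
    "g": {"voicing": "voiced", "place": "velar", "manner": "plosive"},
    "m": {"voicing": "voiced", "place": "bilabial", "manner": "nasal"},
    "n": {"voicing": "voiced", "place": "alveolar", "manner": "nasal"},
    "s": {"voicing": "voiceless", "place": "alveolar", "manner": "fricative"},
}


def embed(symbol):
    """Concatenate one one-hot vector per feature group for an IPA symbol."""
    feats = IPA_FEATURES[symbol]
    vec = []
    for group, values in FEATURE_VALUES.items():
        vec.extend(1.0 if feats[group] == v else 0.0 for v in values)
    return vec


def feature_distance(a, b):
    """Count the feature groups in which two IPA symbols differ."""
    fa, fb = IPA_FEATURES[a], IPA_FEATURES[b]
    return sum(fa[group] != fb[group] for group in FEATURE_VALUES)


if __name__ == "__main__":
    print(feature_distance("p", "b"))  # 1: the two differ only in voicing
    print(feature_distance("p", "n"))  # 3: voicing, place, and manner all differ
```

Under this toy geometry, a correspondence such as /p/ to /b/ is cheaper than /p/ to /n/, which is the kind of preference a phonetically informed alignment model could exploit.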

Similar Papers

From characters to words: the turning point of BPE merges
Ximena Gutierrez-Vasques, Christian Bentz, Olga Sozinova, Tanja Samardzic
Interpretability for Morphological Inflection: from Character-level Predictions to Subword-level Rules
Tatyana Ruzsics, Olga Sozinova, Ximena Gutierrez-Vasques, Tanja Samardzic
Does Typological Blinding Impede Cross-Lingual Sharing?
Johannes Bjerva, Isabelle Augenstein
PPT: Parsimonious Parser Transfer for Unsupervised Cross-Lingual Adaptation
Kemal Kurniawan, Lea Frermann, Philip Schulz, Trevor Cohn