First Align, then Predict: Understanding the Cross-Lingual Ability of Multilingual BERT

Benjamin Muller; Yanai Elazar; Benoît Sagot; Djamé Seddah

Interpretability and Analysis of Models for NLP (short paper)

Gather-3C: Apr 23 (13:00-15:00 UTC)

Abstract: Multilingual pretrained language models have demonstrated remarkable zero-shot cross-lingual transfer capabilities. Such transfer emerges by fine-tuning on a task of interest in one language and evaluating on a distinct language that was not seen during fine-tuning. Despite promising results, we still lack a proper understanding of the source of this transfer. Using a novel layer ablation technique and analyses of the model's internal representations, we show that multilingual BERT, a popular multilingual language model, can be viewed as a stack of two sub-networks: a multilingual encoder followed by a task-specific, language-agnostic predictor. While the encoder is crucial for cross-lingual transfer and remains mostly unchanged during fine-tuning, the task predictor has little impact on the transfer and can be reinitialized during fine-tuning. We present extensive experiments with three distinct tasks, seventeen typologically diverse languages and multiple domains to support our hypothesis.
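
The layer ablation idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the model is a plain list of weight vectors standing in for transformer layers, and the helper names (`make_layers`, `reinitialize_layers`), the split point at layer 8, the layer width and the initialization scale are all assumptions chosen for illustration. The sketch only shows the mechanics of redrawing a contiguous block of upper layers at random while leaving the lower layers at their pretrained values.

```python
import random


def make_layers(num_layers, width, seed):
    """Build a toy 'pretrained' model: one weight vector per layer (hypothetical stand-in)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(width)] for _ in range(num_layers)]


def reinitialize_layers(layers, start, end, seed):
    """Return a copy of `layers` in which layers[start:end] are redrawn at random.

    Layers outside [start, end) keep their original values.
    """
    rng = random.Random(seed)
    out = [list(layer) for layer in layers]
    for i in range(start, end):
        out[i] = [rng.gauss(0.0, 0.02) for _ in out[i]]
    return out


# Assumed setup: 12 layers, with the upper four treated as the "predictor".
pretrained = make_layers(num_layers=12, width=4, seed=0)
ablated = reinitialize_layers(pretrained, start=8, end=12, seed=1)

# Indices of the layers whose weights differ from the pretrained ones.
changed = [i for i in range(len(pretrained)) if ablated[i] != pretrained[i]]
print(changed)
```

In a real experiment the ablated model would then be fine-tuned on the source language and evaluated on the target languages; comparing that score against the unablated model is what would indicate how much the reinitialized layers matter for transfer.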

EACL 2021