BERxiT: Early Exiting for BERT with Better Fine-Tuning and Extension to Regression

Ji Xin, Raphael Tang, Yaoliang Yu, Jimmy Lin

Green and Sustainable NLP (Long Paper)

Zoom-2D: Apr 21 (12:00-13:00 UTC)
Gather-2D: Apr 22 (13:00-15:00 UTC)


Abstract: The slow speed of BERT has motivated much research on accelerating its inference, and the early exiting idea has been proposed to trade off model quality against efficiency. This paper aims to address two weaknesses of previous work: (1) existing fine-tuning strategies for early exiting models fail to take full advantage of BERT; (2) methods for making exiting decisions are limited to classification tasks. We propose a more advanced fine-tuning strategy and a learning-to-exit module that extends early exiting to tasks other than classification. Experiments demonstrate improved early exiting for BERT, with better trade-offs obtained by the proposed fine-tuning strategy, successful application to regression tasks, and the possibility of combining it with other acceleration methods. Source code can be found at https://github.com/castorini/berxit.
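To make the idea concrete, below is a minimal sketch of early exiting with a learned exit decision, under assumptions: a stand-in `nn.TransformerEncoderLayer` stack replaces BERT, each layer gets a small prediction head, and a per-layer "learning-to-exit" linear layer produces a confidence score that works even when the head outputs a regression value. Names such as `ExitingEncoder` and `exit_threshold` are illustrative only; the authors' actual implementation is in the linked repository.

```python
# Sketch only: stand-in transformer instead of BERT; LTE head is a scalar
# confidence predictor, so exiting does not rely on classification entropy.
import torch
import torch.nn as nn


class ExitingEncoder(nn.Module):
    def __init__(self, hidden=256, layers=6, num_labels=1):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(hidden, nhead=4, batch_first=True)
            for _ in range(layers)
        )
        # One prediction head per layer (num_labels=1 covers regression).
        self.heads = nn.ModuleList(nn.Linear(hidden, num_labels) for _ in range(layers))
        # Learning-to-exit module: maps the first-token state to an exit score.
        self.lte = nn.ModuleList(nn.Linear(hidden, 1) for _ in range(layers))

    def forward(self, x, exit_threshold=0.9):
        for layer, head, lte in zip(self.layers, self.heads, self.lte):
            x = layer(x)
            cls = x[:, 0]                          # first token as sentence summary
            pred = head(cls)                       # intermediate prediction
            confidence = torch.sigmoid(lte(cls))   # learned exit score in (0, 1)
            if confidence.item() >= exit_threshold:  # assumes batch size 1 at inference
                return pred                        # exit early, skip remaining layers
        return pred                                # fall through to the last layer


model = ExitingEncoder().eval()
with torch.no_grad():
    out = model(torch.randn(1, 16, 256))           # (batch=1, seq_len=16, hidden=256)
print(out.shape)
```

Raising `exit_threshold` favors accuracy (more layers run); lowering it favors speed, which is the quality-efficiency trade-off the abstract refers to.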
