Globalizing BERT-based Transformer Architectures for Long Document Summarization

Quentin Grail, Julien Perez, Eric Gaussier

Generation and Summarization, Long Paper

Gather-3B: Apr 23 (13:00-15:00 UTC)


Abstract: Fine-tuning a large language model on downstream tasks has become a commonly adopted process in Natural Language Processing (NLP) (Wang et al., 2018). However, such a process, when combined with current transformer-based architectures (Vaswani et al., 2017), shows several limitations when the target task requires reasoning over long documents. In this work, we introduce a novel hierarchical propagation layer that spreads information between multiple transformer windows. We adopt a hierarchical approach in which the input is divided into multiple blocks that are processed independently by scaled dot-product attention and combined between successive layers. We validate the effectiveness of our approach on three extractive summarization corpora of long scientific papers and news articles. We compare our approach to standard and pre-trained language-model-based summarizers and report state-of-the-art results for long document summarization and comparable results for shorter document summarization.
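The abstract describes the architecture only at a high level: attention is computed within fixed-size blocks, and a propagation layer exchanges information across blocks between successive layers. The PyTorch sketch below is a minimal illustration of that general idea, not the authors' implementation; the class name, the use of mean pooling to summarize each block, and the residual way the propagated summaries are folded back into the blocks are all assumptions made for illustration.

```python
import torch
import torch.nn as nn

class BlockwisePropagationLayer(nn.Module):
    """Illustrative sketch: attention inside fixed-size blocks, plus a
    propagation step that lets block summaries exchange information.
    Names and design details are assumptions, not the paper's code."""

    def __init__(self, d_model: int, n_heads: int, block_size: int):
        super().__init__()
        self.block_size = block_size
        # Local attention applied independently to each block (window).
        self.local_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Cross-block attention over one summary vector per block.
        self.propagate_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model); assume seq_len % block_size == 0 for brevity.
        b, t, d = x.shape
        n_blocks = t // self.block_size
        blocks = x.view(b * n_blocks, self.block_size, d)

        # 1) Scaled dot-product attention inside each block, independently.
        local, _ = self.local_attn(blocks, blocks, blocks)

        # 2) One summary vector per block (mean pooling is an assumption).
        summaries = local.mean(dim=1).view(b, n_blocks, d)

        # 3) Propagation step: block summaries attend to each other globally.
        mixed, _ = self.propagate_attn(summaries, summaries, summaries)

        # 4) Broadcast the mixed summaries back into their blocks.
        mixed = mixed.view(b * n_blocks, 1, d)
        return (local + mixed).view(b, t, d)


if __name__ == "__main__":
    layer = BlockwisePropagationLayer(d_model=64, n_heads=4, block_size=16)
    tokens = torch.randn(2, 128, 64)   # 2 documents, 128 tokens each
    print(layer(tokens).shape)         # torch.Size([2, 128, 64])
```

Because each block attends only within itself, the per-layer attention cost grows linearly with the number of blocks rather than quadratically with document length; the cross-block step operates on one summary per block, which keeps it cheap even for long inputs.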


Similar Papers

Randomized Deep Structured Prediction for Discourse-Level Processing
Manuel Widmoser, Maria Pacheco, Jean Honorio, Dan Goldwasser
StructSum: Summarization via Structured Representations
Vidhisha Balachandran, Artidoro Pagnoni, Jay Yoon Lee, Dheeraj Rajagopal, Jaime Carbonell, Yulia Tsvetkov
Unsupervised Abstractive Summarization of Bengali Text Documents
Radia Rayan Chowdhury, Mir Tafseer Nayeem, Tahsin Tasnim Mim, Md. Saifur Rahman Chowdhury, Taufiqul Jannat