T3: End-to-End Speech Translation

Jan Niehues, Elizabeth Salesky, Marco Turchi and Matteo Negri

Live Session 1: Apr 19 (09:30-10:30 UTC) [Join Zoom Meeting]
Live Session 2: Apr 19 (15:00-16:00 UTC) [Join Zoom Meeting]
Abstract: Speech translation is the translation of speech in one language, typically into text in another, traditionally accomplished through a combination of automatic speech recognition and machine translation. Speech translation has attracted interest for many years, but the recent successful application of deep learning to both individual tasks has enabled new opportunities through joint modeling, in what we today call ‘end-to-end speech translation.’ In this tutorial, we will introduce the techniques used in cutting-edge research on speech translation. Starting from the traditional cascaded approach, we will give an overview of data sources and model architectures to achieve state-of-the-art performance with end-to-end speech translation for both high- and low-resource languages. In addition, we will discuss methods to evaluate and analyze the proposed solutions, as well as the challenges faced when applying speech translation models in real-world applications.

Time                      Event   Hosts
Apr 19 (09:30-10:30 UTC)  Part 1  Jan Niehues, Elizabeth Salesky, Marco Turchi and Matteo Negri
Apr 19 (15:00-16:00 UTC)  Part 2  Jan Niehues, Elizabeth Salesky, Marco Turchi and Matteo Negri
Information about the virtual format of this tutorial: This tutorial has a prerecorded talk on this page (see below) that you can watch anytime during the conference. It also has two live sessions that will be conducted on Zoom and livestreamed on this page. Additionally, it has a chat window that you can use to have discussions with the tutorial presenters and other attendees anytime during the conference.