PunKtuator: A Multilingual Punctuation Restoration System for Spoken and Written Text

Varnith Chordia

Demo Paper

Gather-2E: Apr 22 (13:00-15:00 UTC)

Abstract: Text transcripts without punctuation or sentence boundaries are hard to comprehend for both humans and machines. Punctuation marks play a vital role in conveying the meaning of a sentence, and incorrect use or placement of punctuation can often alter that meaning. This can impact downstream tasks such as language translation and understanding, pronoun resolution, and text summarization. An automated punctuation restoration (APR) system with minimal human intervention can improve comprehension of text and help users write better. In this paper we describe a multitask modeling approach as a system to restore punctuation in multiple high-resource languages -- Germanic (English and German) and Romance (French) -- and low-resource languages -- Indo-Aryan (Hindi) and Dravidian (Tamil) -- for both spoken and written text, without requiring extensive knowledge of the grammar or syntax of a given language. For German and the given Indic languages, this is the first work towards restoring punctuation and can serve as a baseline for future work.

Similar Papers

Disfluency Correction using Unsupervised and Semi-supervised Learning
Nikhil Saini, Drumil Trivedi, Shreya Khare, Tejas Dhamecha, Preethi Jyothi, Samarth Bharadwaj, Pushpak Bhattacharyya
Subword Pooling Makes a Difference
Judit Ács, Ákos Kádár, Andras Kornai
WikiMatrix: Mining 135M Parallel Sentences in 1620 Language Pairs from Wikipedia
Holger Schwenk, Vishrav Chaudhary, Shuo Sun, Hongyu Gong, Francisco Guzmán