Finite-state script normalization and processing utilities: The Nisaba Brahmic library

Cibu Johny, Lawrence Wolf-Sonkin, Alexander Gutkin, Brian Roark

Demo Paper

Gather-1F: Apr 21 (13:00-15:00 UTC)

Abstract: This paper presents an open-source library for efficient low-level processing of ten major South Asian Brahmic scripts. The library provides a flexible and extensible framework for supporting crucial operations on Brahmic scripts, such as NFC normalization, visual normalization, reversible transliteration, and validity checks, implemented in Python within a finite-state transducer formalism. We survey some common Brahmic script issues that may adversely affect the performance of downstream NLP tasks, and provide the rationale for the finite-state design and system implementation details.
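
To illustrate why NFC normalization matters for Brahmic text, here is a minimal sketch using only Python's standard `unicodedata` module. It is not the Nisaba API, which this page does not show, and it does not use finite-state transducers; the helper names `to_nfc` and `same_after_nfc` are hypothetical. It shows two code point sequences that render identically but compare unequal until normalized.

```python
import unicodedata


def to_nfc(text: str) -> str:
    """Return text in Unicode Normalization Form C (stdlib, not Nisaba)."""
    return unicodedata.normalize("NFC", text)


def same_after_nfc(a: str, b: str) -> bool:
    """Compare two strings after NFC normalization (hypothetical helper)."""
    return to_nfc(a) == to_nfc(b)


# Bengali KA followed by the two-part vowel sign (E + AA) ...
bengali_split = "\u0995\u09c7\u09be"
# ... versus KA followed by the single precomposed VOWEL SIGN O.
bengali_composed = "\u0995\u09cb"

# Devanagari LETTER QA is a composition exclusion, so NFC expands it
# to KA + NUKTA instead of keeping the single code point.
devanagari_qa = "\u0958"
devanagari_ka_nukta = "\u0915\u093c"

print(bengali_split == bengali_composed)                 # raw comparison
print(same_after_nfc(bengali_split, bengali_composed))   # normalized comparison
print(to_nfc(devanagari_qa) == devanagari_ka_nukta)
```

Without such a step, a downstream tokenizer or vocabulary lookup would treat the two visually identical Bengali strings as different words. NFC covers only canonical equivalences; the visual normalization and validity checks named in the abstract go beyond what this sketch does.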
