Finite-state script normalization and processing utilities: The Nisaba Brahmic library
Cibu Johny, Lawrence Wolf-Sonkin, Alexander Gutkin, Brian Roark
Demo Paper
Abstract:
This paper presents an open-source library for efficient low-level processing of ten major South Asian Brahmic scripts. The library provides a flexible and extensible framework for crucial operations on Brahmic scripts, such as Unicode NFC normalization, visual normalization, reversible transliteration, and validity checks, implemented in Python within a finite-state transducer formalism. We survey some common Brahmic script issues that may adversely affect the performance of downstream NLP tasks, and provide the rationale for the finite-state design along with system implementation details.
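As a rough illustration of why NFC normalization matters for Brahmic text, the sketch below uses only Python's standard `unicodedata` module. It is not the library's finite-state implementation, and the helper names `nfc` and `same_after_nfc` are hypothetical. It shows two encodings of the same visible text that differ at the code point level until normalized.

```python
import unicodedata


def nfc(text: str) -> str:
    """Return the Unicode NFC form of `text` (hypothetical helper)."""
    return unicodedata.normalize("NFC", text)


def same_after_nfc(a: str, b: str) -> bool:
    """True if two strings are canonically equivalent under NFC."""
    return nfc(a) == nfc(b)


# Devanagari: U+0958 (letter QA) is a composition exclusion, so NFC
# rewrites it as U+0915 (KA) followed by U+093C (nukta).
devanagari_qa = "\u0958"
devanagari_ka_nukta = "\u0915\u093C"

# Bengali: the two-part vowel sign U+09C7 + U+09BE composes under NFC
# into the single code point U+09CB (vowel sign O).
bengali_split = "\u0995\u09C7\u09BE"
bengali_composed = "\u0995\u09CB"

if __name__ == "__main__":
    for raw in (devanagari_qa, bengali_split):
        before = [f"U+{ord(ch):04X}" for ch in raw]
        after = [f"U+{ord(ch):04X}" for ch in nfc(raw)]
        print(before, "->", after)
```

Comparing raw strings without this step would treat each pair as different tokens, which is one way such encoding variation can affect downstream NLP tasks.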