Kyle Gorman from Google AI and CUNY will be visiting the Department the week of November 12th. He will give a talk on Monday, 15:30-17:00, in Room 117, 1085 Dr. Penfield (title and abstract will be sent out soon), and a tutorial on Pynini, a Python library he developed for weighted finite-state grammar compilation, on Wednesday, 12:00-15:00, in Ferrier room 230.
Grammar engineering in text-to-speech synthesis
Many speech and language applications, including speech recognition and speech synthesis, require mappings between “written” and “spoken” representations of language. Despite substantial progress in applied machine learning, it is still the case that real-world industrial text-to-speech (TTS) synthesis systems largely depend on language-specific hand-written rules for these conversions. These may require a great deal of development effort and linguistic sophistication, and as such represent substantial barriers for quality control and internationalization.
I first consider the case of number names, where the goal is to map written forms like 328 to three hundred twenty eight. I propose two computational models for learning this mapping. The first uses end-to-end recurrent neural networks. The second, inspired by prior literature on cross-linguistic variation in number naming, uses an induction strategy based on finite-state transducers. While both models achieve near-perfect performance, the latter model is trained using several orders of magnitude less data, making it particularly useful for low-resource languages. The latter model is being used at Google to produce number grammars for dozens of languages and locales.
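To make the number-name task concrete, here is a minimal hand-written sketch of the mapping for English, limited to 0 through 999. It is neither of the two learned models described in the abstract; it only illustrates the kind of language-specific rule set those models are meant to replace. The function name number_name and the word tables are assumptions made for this illustration.

```python
# Hand-written English number names for 0..999.
# Illustration only: not the RNN or the induced finite-state grammar.
ONES = [
    "", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = [
    "", "", "twenty", "thirty", "forty", "fifty",
    "sixty", "seventy", "eighty", "ninety",
]


def number_name(n: int) -> str:
    """Return the English name of an integer in the range 0..999."""
    if not 0 <= n < 1000:
        raise ValueError("only 0..999 is covered by this sketch")
    if n == 0:
        return "zero"
    words = []
    hundreds, rest = divmod(n, 100)
    if hundreds:
        words += [ONES[hundreds], "hundred"]
    if rest >= 20:
        tens, ones = divmod(rest, 10)
        words.append(TENS[tens])
        if ones:
            words.append(ONES[ones])
    elif rest:
        # 1..19 are irregular and are listed directly in ONES.
        words.append(ONES[rest])
    return " ".join(words)


print(number_name(328))  # three hundred twenty eight
```

Even this small range needs special cases for the teens and for zero, and every new language needs its own tables and ordering rules, which is the development cost the abstract refers to.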
I then consider the case of grapheme-to-phoneme conversion, where the task is to map written words onto their phonemic transcriptions. I describe a model in which the grammar engineering is performed by providing input and output vocabularies; in Spanish for instance, the input vocabulary includes digraphs like ll and rr, which denote single phonemes, and for Japanese kana, the output vocabulary includes entire syllables. This grammatical information, incorporated into a finite-state generative model, results in a significant improvement over a baseline system which lacks direct access to such information.
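As a rough illustration of what an input vocabulary with digraphs provides, the sketch below segments a Spanish word into vocabulary symbols, so that ll and rr are each treated as one unit. The greedy longest-match strategy, the function name segment, and the exact letter inventory are assumptions made for this example; the model in the abstract is a finite-state generative model, not this procedure.

```python
# Segment a word into input-vocabulary symbols, preferring digraphs.
# Assumption: greedy longest match stands in for the real model.
SINGLE_LETTERS = set("abcdefghijklmnñopqrstuvwxyzáéíóúü")
DIGRAPHS = {"ll", "rr"}  # the two digraphs named in the abstract
INPUT_VOCAB = SINGLE_LETTERS | DIGRAPHS


def segment(word, vocab=INPUT_VOCAB):
    """Split word into the longest vocabulary symbols, left to right."""
    max_len = max(len(symbol) for symbol in vocab)
    symbols = []
    i = 0
    while i < len(word):
        # Try the longest candidate first so "rr" beats "r" + "r".
        for size in range(min(max_len, len(word) - i), 0, -1):
            chunk = word[i:i + size]
            if chunk in vocab:
                symbols.append(chunk)
                i += size
                break
        else:
            raise ValueError(f"no vocabulary symbol matches at {word[i:]!r}")
    return symbols


print(segment("perro"))  # ['p', 'e', 'rr', 'o']
print(segment("calle"))  # ['c', 'a', 'll', 'e']
```

With the digraphs in the vocabulary, a downstream model sees one input symbol per phoneme in these words instead of having to learn that two adjacent letters pattern together.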
Pynini: Finite-state grammar development in Python
Finite-state transducers are abstract computational models of relations between sets of strings, widely used in speech and language technologies and studied as computational models of morphophonology. In this tutorial, I will introduce the finite-state transducer formalism and Pynini (Gorman 2016; http://pynini.opengrm.org), a Python library for compiling and processing finite-state grammars. In the first part of the tutorial, we will cover the finite-state formalism in detail. In the second part, we will install the Pynini library and survey its basic functionality. In the third, we will tackle case studies including Finnish vowel harmony rules and decoding ambiguous text messages. Participants are assumed to be familiar with the Python programming language, but I do not assume any experience with finite-state methods or natural language processing.
Note to participants: You are encouraged to bring a working laptop. We will reserve some time to install the necessary libraries so that you can follow along and participate in a few select exercises. This software has been tested on Linux, Mac OS X (with an up-to-date version of Xcode), and Windows 10 (with the Ubuntu flavor of Windows Subsystem for Linux). In case you wish to get a head start, installation instructions are available here: http://wellformedness.com/courses/PyniniTutorial/installation-instructions.html
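For a flavor of what "a relation between sets of strings" means, the plain-Python sketch below relates words to phone-keypad digit strings and inverts that relation against a small word list. It assumes that the text-message case study concerns keypad ambiguity, which the abstract does not state, and it deliberately uses no Pynini calls; the lexicon and the function names encode and decode are made up for the example.

```python
# A toy string relation: words <-> keypad digit sequences.
# Assumption: "ambiguous text messages" refers to keypad input.
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_DIGIT = {
    letter: digit for digit, letters in KEYPAD.items() for letter in letters
}


def encode(word):
    """Map a lowercase word to the digit string that types it."""
    return "".join(LETTER_TO_DIGIT[letter] for letter in word)


def decode(digits, lexicon):
    """Return every lexicon word whose encoding equals digits, sorted."""
    return sorted(word for word in lexicon if encode(word) == digits)


LEXICON = ["good", "home", "gone", "hood", "hello"]  # hypothetical word list

print(encode("hello"))          # 43556
print(decode("4663", LEXICON))  # ['gone', 'good', 'home', 'hood']
```

Encoding is a function, but decoding is one-to-many, so the mapping is naturally described as a relation. In a finite-state setting the same idea would be expressed by inverting the letter-to-digit transducer and composing it with a lexicon, rather than by filtering a list.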