Corpus tokenizer

CorpusTokenizer([tokenizer_model_path])

Top-level worker for generating pronunciations from a corpus and a Pynini tokenizer model

TokenizerValidator([utterances_to_tokenize])

Simple tokenizer

SimpleTokenizer(word_break_markers, ...[, ...])
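As an illustration of what a tokenizer driven by a set of word-break markers does, the sketch below splits text on any character drawn from a marker set. This is a minimal, self-contained example; the function name `simple_tokenize` and the default marker set are assumptions for illustration only, not the library's actual API or behavior.

```python
import re

def simple_tokenize(text, word_break_markers=" \t?!,."):
    """Split text into tokens on any word-break marker character.

    Hypothetical helper illustrating marker-based tokenization;
    the marker set here is an assumed default, not the library's.
    """
    # Build a character class matching one or more marker characters.
    pattern = "[" + re.escape(word_break_markers) + "]+"
    # Split on runs of markers and drop empty strings at the edges.
    return [tok for tok in re.split(pattern, text) if tok]

print(simple_tokenize("hello, world!"))  # ['hello', 'world']
```

Treating consecutive markers as a single break (the `+` in the pattern) avoids emitting empty tokens when punctuation and whitespace are adjacent.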