CorpusTokenizer

class montreal_forced_aligner.tokenization.tokenizer.CorpusTokenizer(tokenizer_model_path=None, **kwargs)

Bases: AcousticCorpusMixin, TopLevelMfaWorker, DictionaryMixin

Top-level worker for tokenizing the utterances of a corpus with a Pynini tokenizer model

export_files(output_directory)

Export the tokenized transcriptions to output_directory
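The reference does not describe the layout that export_files produces. As a rough illustration only, the sketch below takes a mapping shaped like the documented return value of tokenize_utterances (dict[str, list[str]]) and writes each utterance's tokens, space-joined, to a file named <key>.txt in the output directory. The helper name write_tokenized and the one-file-per-key layout are hypothetical assumptions, not MFA's actual export format.

```python
from pathlib import Path


def write_tokenized(tokenized: dict[str, list[str]], output_directory: str) -> list[Path]:
    """Hypothetical stand-in for an export step: one text file per utterance key.

    Assumption: keys are safe to use as file names. MFA's real export
    layout may differ; this only illustrates the data flow.
    """
    out_dir = Path(output_directory)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for key, tokens in sorted(tokenized.items()):
        path = out_dir / f"{key}.txt"
        # Join tokens with single spaces so word boundaries become explicit.
        path.write_text(" ".join(tokens) + "\n", encoding="utf-8")
        written.append(path)
    return written
```

Sorting by key keeps the output order deterministic regardless of dictionary insertion order.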

model_class

alias of TokenizerModel
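"Alias of TokenizerModel" means the class attribute model_class is bound to the TokenizerModel class itself. A common reason for such an attribute is to let shared base-class code load the right model type without hard-coding it. The sketch below shows that pattern under the assumption that MFA uses model_class in a similar way; ModelWorker, SketchTokenizer, and DummyTokenizerModel are all hypothetical stand-ins, not MFA classes.

```python
from pathlib import Path


class DummyTokenizerModel:
    """Hypothetical placeholder for a model class; not MFA's TokenizerModel."""

    def __init__(self, path: Path) -> None:
        self.path = path


class ModelWorker:
    """Hypothetical base class: subclasses declare which model type they load."""

    model_class = None  # subclasses override this with a concrete model class

    def __init__(self, model_path: str) -> None:
        self.model_path = Path(model_path)

    def load_model(self):
        if self.model_class is None:
            raise NotImplementedError("subclass must set model_class")
        # The base class never names a concrete model type; it defers to the attribute.
        return self.model_class(self.model_path)


class SketchTokenizer(ModelWorker):
    # Analogous to CorpusTokenizer.model_class being an alias of TokenizerModel.
    model_class = DummyTokenizerModel
```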

setup()

Set up the tokenizer

tokenize_utterances()

Tokenize utterances

Returns:

Mapping of utterance keys to their tokenized utterances

Return type:

dict[str, list[str]]
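The return type is a plain dictionary from utterance keys to token lists. As a minimal sketch of consuming that shape, the hypothetical helper to_lines below renders each entry as a key, a tab, and the space-joined tokens, sorted by key. The example mapping is made-up data with the documented shape, not real tokenizer output.

```python
def to_lines(tokenized: dict[str, list[str]]) -> list[str]:
    """Render a key -> tokens mapping as tab-separated lines, sorted by key."""
    return [f"{key}\t{' '.join(tokens)}" for key, tokens in sorted(tokenized.items())]


# Made-up example data with the same shape as the documented return type.
example = {
    "speaker1-utt2": ["สวัสดี", "ครับ"],
    "speaker1-utt1": ["今日", "は", "晴れ"],
}
for line in to_lines(example):
    print(line)
```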