Add probabilities to a dictionary (mfa train_dictionary)

MFA includes a utility command that trains pronunciation probabilities for a dictionary from the alignments of a corpus.

The implementation used here follows Kaldi’s get_prons.sh, dict_dir_add_pronprobs.sh, and lang/make_lexicon_fst_silprob.py.
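As a rough illustration of the pronunciation-probability step, the sketch below follows our reading of the max-normalized estimate in Chen et al. (2015): each pronunciation's count is smoothed and then divided by the largest smoothed count for the same word, so the most frequent pronunciation of every word gets probability 1.0. The function name, the (word, pronunciation) input format, and the smoothing constant of 1.0 are assumptions for illustration. This is not MFA's internal API, and MFA's actual implementation may differ in its details.

```python
from collections import Counter


def pronunciation_probabilities(
    lexicon: dict[str, list[str]],
    aligned: list[tuple[str, str]],
    smoothing: float = 1.0,
) -> dict[str, dict[str, float]]:
    """Estimate max-normalized pronunciation probabilities (hypothetical helper).

    lexicon:   word -> list of pronunciations (space-separated phones)
    aligned:   (word, pronunciation) pairs read off the alignments
    smoothing: constant added to every count (assumed value of 1.0)
    """
    # Start every dictionary pronunciation at zero so unseen ones still get a probability.
    counts = {word: Counter({pron: 0 for pron in prons}) for word, prons in lexicon.items()}
    for word, pron in aligned:
        # Ignore words and pronunciations that are not in the dictionary.
        if word in counts and pron in counts[word]:
            counts[word][pron] += 1

    probabilities = {}
    for word, pron_counts in counts.items():
        # Divide by the largest smoothed count, so the best pronunciation gets 1.0.
        best = max(pron_counts.values(), default=0) + smoothing
        probabilities[word] = {
            pron: (count + smoothing) / best for pron, count in pron_counts.items()
        }
    return probabilities


if __name__ == "__main__":
    # Hypothetical pronunciations and counts, for illustration only.
    lexicon = {"because": ["B IH K AH Z", "B IH K AO Z"]}
    aligned = [("because", "B IH K AH Z")] * 3 + [("because", "B IH K AO Z")]
    print(pronunciation_probabilities(lexicon, aligned))
    # → {'because': {'B IH K AH Z': 1.0, 'B IH K AO Z': 0.5}}
```

Max normalization (rather than dividing by the sum) means that adding a rarely used variant to a word does not lower the score of that word's dominant pronunciation.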

See also

Refer to the lexicon FST concept section for an introduction and overview of how MFA compiles pronunciation dictionaries into a WFST. The algorithm and calculations below are based on Chen et al. (2015).

Consider the following WFST with two pronunciations of “because” from the trained English US MFA dictionary.

FST for two pronunciations of “because” in the English US dictionary

In the above figure, there are two final states, with 0 corresponding to a word preceded by non-silence and 1 corresponding to a word preceded by silence. The costs associated with each transition are negative log-probabilities, so less likely paths cost more. State 0 refers to the beginning of speech, so in this case the paths to the silence and non-silence states have equal cost. The cost for ending on silence (-0.77) is lower than the cost for ending on non-silence (1.66), meaning that most utterances in the training data had trailing silence at the end of the recording.
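Because the costs are negative logs, a difference between two costs corresponds to a ratio of the underlying weights. The short snippet below applies this to the two final costs quoted above; the helper name cost_to_weight is hypothetical. Only the difference of the two costs enters the ratio, so the comparison holds whether or not the individual final weights are normalized probabilities.

```python
import math


def cost_to_weight(cost: float) -> float:
    """Convert an FST cost (a negative log value) back to a linear weight."""
    return math.exp(-cost)


# Final costs quoted for the "because" example above.
silence_final_cost = -0.77
nonsilence_final_cost = 1.66

# exp(-a) / exp(-b) == exp(b - a): subtracting costs is the same as dividing weights.
ratio = cost_to_weight(silence_final_cost) / cost_to_weight(nonsilence_final_cost)
print(f"Ending on silence is weighted about {ratio:.1f}x more than ending on non-silence")
```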

See also

See Probabilistic lexicons for more information on probabilities in lexicons.

Command reference

mfa train_dictionary

Calculate pronunciation probabilities for a dictionary based on alignment results in a corpus.

mfa train_dictionary [OPTIONS] CORPUS_DIRECTORY DICTIONARY_PATH
                     ACOUSTIC_MODEL_PATH OUTPUT_DIRECTORY
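If you drive MFA from a Python script, one option is to assemble the argument list yourself and hand it to subprocess.run. The sketch below uses only the positional arguments and options documented on this page (--num_jobs, --clean, --silence_probabilities); the helper name train_dictionary_command and the example paths are hypothetical.

```python
import shlex
from pathlib import Path


def train_dictionary_command(
    corpus_directory: Path,
    dictionary_path: Path,
    acoustic_model_path: Path,
    output_directory: Path,
    num_jobs: int = 3,
    clean: bool = False,
    silence_probabilities: bool = False,
) -> list[str]:
    """Build an argument list for `mfa train_dictionary` (hypothetical helper)."""
    args = [
        "mfa", "train_dictionary",
        # Positional arguments, in the order given in the usage line.
        str(corpus_directory), str(dictionary_path),
        str(acoustic_model_path), str(output_directory),
        "--num_jobs", str(num_jobs),
    ]
    if clean:
        args.append("--clean")  # remove files from previous runs
    if silence_probabilities:
        args.append("--silence_probabilities")  # also save silence information
    return args


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    cmd = train_dictionary_command(
        Path("my_corpus"), Path("my_dictionary.txt"),
        Path("my_acoustic_model.zip"), Path("trained_dictionary"),
        num_jobs=4, clean=True, silence_probabilities=True,
    )
    # Print the equivalent shell command; pass `cmd` to subprocess.run(cmd, check=True)
    # to actually execute MFA.
    print(shlex.join(cmd))
```

Building a list rather than a single string avoids shell-quoting problems when corpus or output paths contain spaces.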

Options

-c, --config_path <config_path>

Path to config file to use for training.

--silence_probabilities

Flag for saving silence information for pronunciations.

-s, --speaker_characters <speaker_characters>

Number of characters of file names to use for determining speaker, default is to use directory names.

-a, --audio_directory <audio_directory>

Audio directory root to use for finding audio files.

-p, --profile <profile>

Configuration profile to use, defaults to “global”.

-t, --temporary_directory <temporary_directory>

Set the default temporary directory, default is /home/docs/Documents/MFA

-j, --num_jobs <num_jobs>

Set the number of processes to use by default, defaults to 3

--clean, --no_clean

Remove files from previous runs, default is False

-v, --verbose, -nv, --no_verbose

Output debug messages, default is False

-q, --quiet, -nq, --no_quiet

Suppress all output messages (overrides verbose), default is False

--overwrite, --no_overwrite

Overwrite output files when they exist, default is False

--use_mp, --no_use_mp

Turn on/off multiprocessing. Multiprocessing is recommended, as it allows for faster execution.

--use_threading, --no_use_threading

Use the threading library rather than the multiprocessing library. Multiprocessing is recommended, as it allows for faster execution.

-d, --debug, -nd, --no_debug

Run extra steps for debugging issues, default is False

--use_postgres, --no_use_postgres

Use postgres instead of sqlite for extra functionality, default is False

--single_speaker

Single speaker mode creates multiprocessing splits based on utterances rather than speakers. This mode also disables speaker adaptation, equivalent to --uses_speaker_adaptation false.

--textgrid_cleanup, --cleanup_textgrids, --no_textgrid_cleanup, --no_cleanup_textgrids

Turn on/off post-processing of TextGrids that cleans up silences and recombines compound words and clitics.

-h, --help

Show this message and exit.

Arguments

CORPUS_DIRECTORY

Required argument

DICTIONARY_PATH

Required argument

ACOUSTIC_MODEL_PATH

Required argument

OUTPUT_DIRECTORY

Required argument