Generating a dictionary
We have trained several G2P models that are available for download (Pretrained G2P models).
Please note that G2P models trained prior to 2.0 cannot be used with MFA 2.0. If you would like to use these models, please use the 1.0.1 or 1.1 g2p utilities, or retrain a new G2P model following Training a new G2P model.
To construct a pronunciation dictionary from your .lab or .TextGrid files, simply input:
mfa g2p g2p_model_path input_path output_path
g2p_model_path can be either a fully specified path to a G2P model you’ve trained previously
or one that you’ve downloaded via the
mfa download g2p command (see Pretrained G2P models). The
input_path argument can be either a text file of words to generate transcriptions for, or a corpus directory.
A corpus directory is inspected for text transcripts, from which a word list is compiled and pronunciations are generated. The
output_path argument is the full path where the resulting pronunciation dictionary should be saved.
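The word-list step for a corpus directory can be pictured with a short Python sketch. This is not MFA's own implementation: the function names `compile_word_list` and `write_word_list`, the whitespace tokenization, the lowercasing, and the one-word-per-line output are all assumptions made for illustration, and the sketch only reads `.lab` files.

```python
from pathlib import Path


def compile_word_list(corpus_dir):
    """Collect the unique whitespace-separated words found in .lab files.

    Illustrative only: real text normalization is likely more involved.
    """
    words = set()
    # Search the corpus directory recursively for .lab transcripts.
    for lab_file in Path(corpus_dir).rglob("*.lab"):
        text = lab_file.read_text(encoding="utf8")
        # Assumption: lowercase the text and split on whitespace.
        words.update(text.lower().split())
    return sorted(words)


def write_word_list(words, output_file):
    """Write one word per line (an assumed layout for a plain word list)."""
    Path(output_file).write_text("\n".join(words) + "\n", encoding="utf8")
```

A list written this way is the kind of text file of words that input_path could point to, assuming one word per line is an accepted layout.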
To generate pronunciations that supplement your existing pronunciation
dictionary, run the validation utility (see Running the validation utility), and then use the path to the
oovs_found.txt file that it generates as the input_path.
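As a rough sketch of that workflow, the Python snippet below assembles the `mfa g2p g2p_model_path input_path output_path` call from a script. The helper names `build_g2p_command` and `run_if_available` are illustrative and not part of MFA. The command is only executed if the executable can be found on the `PATH`.

```python
import shutil
import subprocess


def build_g2p_command(g2p_model_path, input_path, output_path):
    """Return the argument list for `mfa g2p g2p_model_path input_path output_path`."""
    return ["mfa", "g2p", str(g2p_model_path), str(input_path), str(output_path)]


def run_if_available(command):
    """Run the command only when its executable can be found on the PATH."""
    if shutil.which(command[0]) is None:
        return False
    # check=True raises CalledProcessError if the command exits with an error.
    subprocess.run(command, check=True)
    return True
```

For example, `run_if_available(build_g2p_command("my_g2p_model.zip", "validation_output/oovs_found.txt", "oov_pronunciations.txt"))` would generate pronunciations for the out-of-vocabulary words. All three paths here are hypothetical placeholders; substitute your own model, the actual location of oovs_found.txt, and the desired output file.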
Pronunciation dictionaries can also be generated from the orthographies of the words themselves, rather than relying on a trained G2P model. This functionality should be reserved for languages with transparent orthographies, that is, those with a close to 1-to-1 grapheme-to-phoneme mapping. To use it, omit the g2p_model_path argument:
mfa g2p input_path output_path
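To make the idea concrete, here is a minimal Python sketch of what an orthography-based dictionary amounts to under the simplest possible assumption: every character of a word becomes one phone symbol. The function names and the tab-separated output layout are assumptions for illustration; MFA's actual handling may differ.

```python
def orthographic_pronunciation(word):
    """Treat each character of the word as its own phone symbol."""
    return " ".join(word)


def build_orthographic_dictionary(words):
    """Return lines of the form 'word<TAB>p h o n e s', sorted by word.

    Assumption: duplicates are collapsed and each word gets one pronunciation.
    """
    return [f"{w}\t{orthographic_pronunciation(w)}" for w in sorted(set(words))]
```

Under this sketch the word "gato" maps to "g a t o", which is only a sensible pronunciation when spelling and sound line up nearly one-to-one.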
Extra options (see G2P Configuration for full configuration details):
Temporary directory root to use for generating dictionary, default is
Number of jobs to use; defaults to 3. Set this higher if you have more processors available and would like to generate pronunciations faster
Forces removal of temporary files under ~/Documents/MFA or the specified temporary directory prior to generating the dictionary.
Path to a configuration yaml for G2P generation (see G2P generation configuration file for an example yaml file)
Number of pronunciation variants to generate per word; the default is 1
Flag for whether to generate pronunciations for words that are enclosed in brackets (i.e., […], (…), <…>)
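The bracketed-word option concerns tokens that are wrapped in brackets. A small Python sketch of how such tokens might be detected follows; the names `is_bracketed`, `filter_words`, and the `include_bracketed` parameter are hypothetical, and the bracket pairs are taken only from the three examples in the option description, not from MFA's implementation.

```python
# Bracket pairs taken from the examples in the option description.
BRACKET_PAIRS = {"[": "]", "(": ")", "<": ">"}


def is_bracketed(word):
    """Return True if the word starts and ends with a matching bracket pair."""
    if len(word) < 2:
        return False
    # .get returns None for a non-bracket first character, so this is False.
    return BRACKET_PAIRS.get(word[0]) == word[-1]


def filter_words(words, include_bracketed=False):
    """Drop bracketed words unless include_bracketed is set."""
    return [w for w in words if include_bracketed or not is_bracketed(w)]
```

With the default setting in this sketch, a list such as `["cat", "[noise]"]` is reduced to `["cat"]` before pronunciations are generated.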
See Example 2: Generate Mandarin dictionary for an example of how to use G2P functionality with a premade example.