Generate pronunciations for words (mfa g2p)#
We have trained several G2P models that are available for download (MFA G2P models).
Warning
Please note that G2P models trained prior to 2.0 cannot be used with MFA 2.0. If you would like to use these models, please use the 1.0.1 or 1.1 g2p utilities, or train a new G2P model by following Train a new G2P model (mfa train_g2p).
Note
To generate pronunciations that supplement your existing pronunciation dictionary, first run the validation utility (see Running the corpus validation utility), then use the path to the oovs_found.txt file that it generates as the input path.
Pronunciation dictionaries can also be generated from the orthographies of the words themselves, rather than relying on a trained G2P model. This functionality should be reserved for languages with transparent orthographies, close to 1-to-1 grapheme-to-phoneme mapping.
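As a rough illustration of what an orthography-based pronunciation looks like, the sketch below treats every character of a word as its own phone symbol. The function name, the lowercasing, and the space-separated output are assumptions made for illustration, not MFA's implementation. It also shows why the approach only suits transparent orthographies: digraphs and silent letters would be mapped incorrectly.

```python
def orthographic_pronunciation(word):
    """Treat each character of the lowercased word as one phone symbol.

    Hypothetical helper: this is only reasonable when the orthography is
    close to a 1-to-1 grapheme-to-phoneme mapping.
    """
    return " ".join(word.lower())


for word in ["casa", "Mesa"]:
    # Print a tab-separated word/pronunciation pair, one per line.
    print(word, orthographic_pronunciation(word), sep="\t")
```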
See Example 2: Generate Mandarin dictionary for an example of how to use G2P functionality with a premade example.
Note
As of version 2.0.6, users on Windows can run this command natively, without requiring Windows Subsystem for Linux; see Installation for more details.
Piping stdin/stdout#
If you specify the input path as - instead of a file path, the g2p command will read each line from stdin and generate a pronunciation for each word with minimal processing. Words will be lowercased and any graphemes that were not in the model’s training data will be removed.
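The minimal processing described above can be pictured with a small sketch. The function name and the grapheme set are hypothetical, and single characters stand in for graphemes; a real model's grapheme inventory comes from its training data and may differ.

```python
def normalize_word(word, known_graphemes):
    """Lowercase the word, then drop characters the model never saw."""
    lowered = word.lower()
    return "".join(ch for ch in lowered if ch in known_graphemes)


# Hypothetical grapheme inventory: plain lowercase ASCII letters only.
known = set("abcdefghijklmnopqrstuvwxyz")

print(normalize_word("Hello!", known))  # hello
print(normalize_word("R2-D2", known))   # rd
```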
If you specify the output path as - instead of a file path, the g2p command will send pronunciations to stdout rather than writing to a file.
Note
Using stdin will also bypass database setup, though the database server will still be started and stopped. If speed is a priority, be sure to run mfa configure --no_auto_server.
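One way to drive the stdin/stdout mode from a script is sketched below. It assumes mfa is on the PATH, uses a placeholder model path, and sends one word per line, which is a guess at a convenient input layout rather than a documented requirement. The returned text is left unparsed because the exact stdout format is not described here.

```python
import subprocess


def g2p_stdin_command(g2p_model_path):
    """Build the argument list with "-" for both INPUT_PATH and OUTPUT_PATH."""
    return ["mfa", "g2p", "-", str(g2p_model_path), "-"]


def g2p_words(words, g2p_model_path):
    """Send words to `mfa g2p` on stdin and return its raw stdout."""
    result = subprocess.run(
        g2p_stdin_command(g2p_model_path),
        input="\n".join(words) + "\n",  # assumption: one word per line
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if mfa exits with an error
    )
    return result.stdout
```

A call such as g2p_words(["hello", "world"], "my_g2p_model.zip") would then return whatever the command prints, where my_g2p_model.zip is a hypothetical model path.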
Per-utterance G2P#
The primary use case for G2P is generating new pronunciation dictionaries; however, there is limited support for generating pronunciations over an entire utterance. If the OUTPUT_PATH specified for mfa g2p is a directory (i.e., it has no period marking a file extension), then MFA will generate a pronunciation for each word, concatenate them, and save the resulting transcript in the output directory.
Warning
This method is largely not recommended. The output is only the top hypothesis for each word in isolation, because MFA does not have access to the necessary higher-order information, so homographs may often receive the wrong pronunciation (e.g., English present tense read [ɹ iː d] vs. English past tense read [ɹ ɛ d]). Use at your own risk.
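The homograph problem can be seen in a toy sketch of per-word, context-free lookup. The lookup table below is hypothetical data standing in for a G2P model's top hypotheses, and the space-joined output is an assumption about the transcript layout. Because each word is looked up in isolation, "read" receives the same pronunciation in both sentences.

```python
# Hypothetical top hypothesis per word; a real G2P model would supply these.
TOY_TOP_HYPOTHESIS = {
    "i": "aɪ",
    "read": "ɹ iː d",  # the past-tense [ɹ ɛ d] can never be selected
    "often": "ɔ f ə n",
    "yesterday": "j ɛ s t ɚ d eɪ",
}


def transcribe_utterance(utterance, top_hypothesis):
    """Look up each word on its own and concatenate the pronunciations."""
    words = utterance.lower().split()
    return " ".join(top_hypothesis[word] for word in words)


present = transcribe_utterance("I read often", TOY_TOP_HYPOTHESIS)
past = transcribe_utterance("I read yesterday", TOY_TOP_HYPOTHESIS)
print(present)
print(past)  # still contains the present-tense pronunciation of "read"
```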
Command reference#
mfa g2p#
Generate a pronunciation dictionary using a G2P model.
mfa g2p [OPTIONS] INPUT_PATH G2P_MODEL_PATH OUTPUT_PATH
Options
- -c, --config_path <config_path>#
Path to config file to use for G2P.
- -n, --num_pronunciations <num_pronunciations>#
Number of pronunciations to generate.
- --dictionary_path <dictionary_path>#
Path to existing pronunciation dictionary to use to find OOVs.
- --include_bracketed#
Include words enclosed by brackets, i.e. […], (…), <…>.
- -p, --profile <profile>#
Configuration profile to use, defaults to “global”
- -t, --temporary_directory <temporary_directory>#
Set the default temporary directory, default is /home/docs/Documents/MFA
- -j, --num_jobs <num_jobs>#
Set the number of processes to use by default, defaults to 3
- --clean, --no_clean#
Remove files from previous runs, default is False
- -v, --verbose, -nv, --no_verbose#
Output debug messages, default is False
- -q, --quiet, -nq, --no_quiet#
Suppress all output messages (overrides verbose), default is False
- --overwrite, --no_overwrite#
Overwrite output files when they exist, default is False
- --use_mp, --no_use_mp#
Turn on/off multiprocessing. Multiprocessing is recommended, as it allows for faster execution.
- --use_threading, --no_use_threading#
Use the threading library rather than the multiprocessing library. Multiprocessing is recommended, as it allows for faster execution.
- -d, --debug, -nd, --no_debug#
Run extra steps for debugging issues, default is False
- --use_postgres, --no_use_postgres#
Use postgres instead of sqlite for extra functionality, default is False
- --single_speaker#
Single speaker mode creates multiprocessing splits based on utterances rather than speakers. This mode also disables speaker adaptation, equivalent to --uses_speaker_adaptation false.
- --textgrid_cleanup, --cleanup_textgrids, --no_textgrid_cleanup, --no_cleanup_textgrids#
Turn on/off post-processing of TextGrids that cleans up silences and recombines compound words and clitics.
- -h, --help#
Show this message and exit.
Arguments
- INPUT_PATH#
Required argument
- G2P_MODEL_PATH#
Required argument
- OUTPUT_PATH#
Required argument
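For scripted use, the options and arguments above can be assembled into an argument list. The helper below is a minimal sketch: the function name, its keyword parameters, and the file names are hypothetical, and only a handful of the listed options are covered. It places options before the three positional arguments, following the usage line.

```python
import shlex


def build_g2p_command(input_path, g2p_model_path, output_path, *,
                      num_pronunciations=None, dictionary_path=None,
                      include_bracketed=False, num_jobs=None,
                      clean=False, overwrite=False):
    """Return an argument list for `mfa g2p` without running it."""
    command = ["mfa", "g2p"]
    if num_pronunciations is not None:
        command += ["--num_pronunciations", str(num_pronunciations)]
    if dictionary_path is not None:
        command += ["--dictionary_path", str(dictionary_path)]
    if include_bracketed:
        command.append("--include_bracketed")
    if num_jobs is not None:
        command += ["--num_jobs", str(num_jobs)]
    if clean:
        command.append("--clean")
    if overwrite:
        command.append("--overwrite")
    # Positional arguments in the documented order.
    command += [str(input_path), str(g2p_model_path), str(output_path)]
    return command


# Placeholder file names, chosen only for illustration.
command = build_g2p_command("words.txt", "my_g2p_model.zip", "generated.dict",
                            num_pronunciations=2, clean=True)
print(shlex.join(command))
# mfa g2p --num_pronunciations 2 --clean words.txt my_g2p_model.zip generated.dict
```

The resulting list can be passed to subprocess.run(command, check=True) when the command should actually be executed.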