Generate a new pronunciation dictionary (`mfa g2p`)

We have trained several G2P models that are available for download (MFA G2P models).

Warning

Please note that G2P models trained prior to version 2.0 cannot be used with MFA 2.0. If you would like to use these models, please use the 1.0.1 or 1.1 G2P utilities, or retrain a new G2P model following Train a new G2P model (mfa train_g2p).

Note

To generate pronunciations that supplement an existing pronunciation dictionary, first run the validation utility (see Running the corpus validation utility), and then use the path to the oovs_found.txt file it generates as the input path.
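If you want to inspect or deduplicate the OOV list before passing it to mfa g2p, a small Python sketch like the following can help. It assumes oovs_found.txt holds one word per line and ignores any extra tab-separated columns; the helper names load_oov_words and write_word_list are hypothetical and not part of MFA.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def load_oov_words(oov_path: Path) -> list[str]:
    """Return unique words from an OOV list, preserving first-seen order.

    Assumption: one word per line; extra tab-separated columns are ignored.
    """
    words: list[str] = []
    seen: set[str] = set()
    for raw in oov_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        word = line.split("\t")[0]  # keep only the first column
        if word not in seen:
            seen.add(word)
            words.append(word)
    return words


def write_word_list(words: list[str], out_path: Path) -> None:
    """Write one word per line so the file can be used as the input path."""
    out_path.write_text("".join(f"{w}\n" for w in words), encoding="utf-8")


if __name__ == "__main__":
    # Demo on a throwaway file with a duplicate, a blank line and a count column.
    with tempfile.TemporaryDirectory() as tmp:
        oov_file = Path(tmp) / "oovs_found.txt"
        oov_file.write_text("kala\nbaru\t3\n\nkala\n", encoding="utf-8")
        cleaned = Path(tmp) / "words_to_generate.txt"
        write_word_list(load_oov_words(oov_file), cleaned)
        print(cleaned.read_text(encoding="utf-8"), end="")
```

Passing oovs_found.txt directly works as described above; the cleanup step is only worth it if you want to edit or review the list first.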

Pronunciation dictionaries can also be generated directly from the orthography of the words themselves, rather than from a trained G2P model. This approach should be reserved for languages with transparent orthographies, that is, languages with a close to one-to-one mapping between graphemes and phonemes.
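As a rough illustration of the orthographic approach (not MFA's own implementation), the sketch below treats every letter as one phone, which is only reasonable under the one-to-one assumption described above. The function names and the tab-separated "word, then space-separated phones" output layout are assumptions for illustration; real orthographies usually still need extra rules, for example mapping a digraph to a single phone.

```python
from __future__ import annotations

import unicodedata


def orthographic_pronunciation(word: str) -> list[str]:
    """Map each letter to one phone (hypothetical strict 1-to-1 mapping)."""
    # NFC keeps precomposed letters such as "é" as a single character.
    normalized = unicodedata.normalize("NFC", word.lower())
    # Drop non-letters (hyphens, apostrophes), which have no phone of their own.
    return [ch for ch in normalized if ch.isalpha()]


def orthographic_dictionary(words: list[str]) -> dict[str, list[str]]:
    """Build a word -> phone list mapping for every input word."""
    return {w: orthographic_pronunciation(w) for w in words}


def format_dictionary(entries: dict[str, list[str]]) -> str:
    """Render entries as 'word<TAB>phone phone ...' lines, sorted by word."""
    return "".join(f"{w}\t{' '.join(p)}\n" for w, p in sorted(entries.items()))


if __name__ == "__main__":
    print(format_dictionary(orthographic_dictionary(["kala", "baru"])), end="")
```

The strict letter-per-phone rule is the simplest possible mapping; it is a starting point to hand-correct, not a replacement for a trained G2P model.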

See Example 2: Generate Mandarin dictionary for an example of how to use G2P functionality with a premade G2P model.

Note

As of version 2.0.6, users on Windows can run this command natively without Windows Subsystem for Linux; see Installation Guide for more details.

Command reference

mfa g2p

Generate a pronunciation dictionary using a G2P model.

mfa g2p [OPTIONS] INPUT_PATH G2P_MODEL_PATH OUTPUT_PATH

Options

-c, --config_path <config_path>: Path to config file to use for training.

--include_bracketed: Include words enclosed by brackets, i.e. […], (…), <…>.

-p, --profile <profile>: Configuration profile to use, defaults to “global”

-t, --temporary_directory <temporary_directory>: Set the default temporary directory, default is /home/docs/Documents/MFA

-j, --num_jobs <num_jobs>: Set the number of processes to use by default, defaults to 3

--clean, --no_clean: Remove files from previous runs, default is False

-v, --verbose, -nv, --no_verbose: Output debug messages, default is False

-q, --quiet, -nq, --no_quiet: Suppress all output messages (overrides verbose), default is False

--overwrite, --no_overwrite: Overwrite output files when they exist, default is False

--use_mp, --no_use_mp: Turn on/off multiprocessing. Multiprocessing is recommended and allows for faster execution.

-d, --debug, -nd, --no_debug: Run extra steps for debugging issues, default is False

--single_speaker: Single speaker mode creates multiprocessing splits based on utterances rather than speakers.

--textgrid_cleanup, --no_textgrid_cleanup: Turn on/off post-processing of TextGrids that cleans up silences and recombines compound words and clitics.

-h, --help: Show this message and exit.
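To make the effect of --include_bracketed concrete, here is a hypothetical Python sketch that filters a word list in the way the option's help text describes. It assumes a word counts as bracketed when it both starts and ends with one of the pairs […], (…), <…>; MFA's actual matching rules may differ, and is_bracketed and select_words are illustrative names only.

```python
from __future__ import annotations

# Bracket pairs named in the --include_bracketed help text: [...], (...), <...>.
BRACKET_PAIRS = (("[", "]"), ("(", ")"), ("<", ">"))


def is_bracketed(word: str) -> bool:
    """Assumption: bracketed means the word starts and ends with a matching pair."""
    return any(
        len(word) >= 2 and word.startswith(open_) and word.endswith(close)
        for open_, close in BRACKET_PAIRS
    )


def select_words(words: list[str], include_bracketed: bool = False) -> list[str]:
    """Mimic the default (skip bracketed words) unless include_bracketed is set."""
    if include_bracketed:
        return list(words)
    return [w for w in words if not is_bracketed(w)]


if __name__ == "__main__":
    tokens = ["hello", "[laughter]", "(um)", "<unk>", "world"]
    print(select_words(tokens))
    print(select_words(tokens, include_bracketed=True))
```

Bracketed tokens are typically annotations such as noise or hesitation markers rather than real words, which is why skipping them is a sensible default.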

Arguments

INPUT_PATH: Required argument

G2P_MODEL_PATH: Required argument

OUTPUT_PATH: Required argument
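If you drive MFA from a Python script, one option is to assemble the argument list yourself and pass it to subprocess.run. The sketch below only uses flags from the options list above; the helper build_g2p_command and the example file names are hypothetical, and the function builds the command without running it.

```python
from __future__ import annotations

import shlex


def build_g2p_command(
    input_path: str,
    g2p_model_path: str,
    output_path: str,
    num_jobs: int | None = None,
    overwrite: bool = False,
    clean: bool = False,
    include_bracketed: bool = False,
) -> list[str]:
    """Assemble an argv list for `mfa g2p` (hypothetical helper, not part of MFA)."""
    cmd = ["mfa", "g2p"]
    if num_jobs is not None:
        cmd += ["--num_jobs", str(num_jobs)]
    if overwrite:
        cmd.append("--overwrite")
    if clean:
        cmd.append("--clean")
    if include_bracketed:
        cmd.append("--include_bracketed")
    # Positional arguments come last, in the order shown in the synopsis.
    cmd += [input_path, g2p_model_path, output_path]
    return cmd


if __name__ == "__main__":
    cmd = build_g2p_command(
        "oovs_found.txt", "g2p_model.zip", "new_dictionary.txt", num_jobs=4
    )
    print(shlex.join(cmd))  # shell-safe rendering, for logging only
    # To actually run it: subprocess.run(cmd, check=True)
```

Passing a list rather than a single shell string avoids quoting problems when paths contain spaces.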

Configuration reference

API reference

Generating dictionaries
