Generate pronunciations for words (mfa g2p)#

We have trained several G2P models that are available for download (MFA G2P models).

Warning

Please note that G2P models trained prior to 2.0 cannot be used with MFA 2.0. If you would like to use these models, please use the 1.0.1 or 1.1 g2p utilities, or retrain a new G2P model following Train a new G2P model (mfa train_g2p).

Note

To generate pronunciations that supplement your existing pronunciation dictionary, run the validation utility (see Running the corpus validation utility), and then use the path to the oovs_found.txt file that it generates as the input path.
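As a rough illustration of what "supplementing" means here, the sketch below computes the out-of-vocabulary (OOV) words for a word list against a dictionary. It is not how the validation utility is implemented. It assumes dictionary lines of the form word, whitespace, phones, and the function names are hypothetical.

```python
from typing import Iterable, List, Set


def load_dictionary_words(lines: Iterable[str]) -> Set[str]:
    """Collect headwords from lines shaped like 'word<whitespace>phones...' (assumed format)."""
    words = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        words.add(line.split()[0])
    return words


def find_oovs(corpus_words: Iterable[str], dictionary_words: Set[str]) -> List[str]:
    """Return the sorted, de-duplicated words that the dictionary does not cover."""
    return sorted({w for w in corpus_words if w not in dictionary_words})


# Toy data, hand-written for illustration only.
dictionary_lines = ["cat\tk æ t", "dog\td ɔ ɡ"]
known = load_dictionary_words(dictionary_lines)
oovs = find_oovs(["cat", "bird", "dog", "bird", "fish"], known)
print("\n".join(oovs))
```

The resulting list is the kind of content that would be handed to mfa g2p as INPUT_PATH, with the generated pronunciations then added to the existing dictionary.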

Pronunciation dictionaries can also be generated from the orthographies of the words themselves, rather than relying on a trained G2P model. This functionality should be reserved for languages with transparent orthographies, that is, with a close to 1-to-1 grapheme-to-phoneme mapping.
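To make the idea of an orthography-based dictionary concrete, here is a minimal sketch that treats every character of a word as its own phone symbol. This is an assumption about what a 1-to-1 mapping looks like in the simplest case, not MFA's implementation; the function names are hypothetical, and multi-character graphemes are not handled.

```python
from typing import Dict, Iterable


def orthographic_pronunciation(word: str) -> str:
    """Treat each character of the lowercased word as one phone, separated by spaces."""
    return " ".join(word.lower())


def orthographic_dictionary(words: Iterable[str]) -> Dict[str, str]:
    """Map each unique word to its character-by-character 'pronunciation'."""
    return {w: orthographic_pronunciation(w) for w in sorted(set(words))}


# Sample words chosen only to show the output shape.
entries = orthographic_dictionary(["talo", "kala", "talo"])
for word, pronunciation in entries.items():
    print(f"{word}\t{pronunciation}")
```

For a language with an opaque orthography, such a mapping would produce symbols that do not correspond to the sounds actually spoken, which is why the approach is limited to transparent orthographies.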

See Example 2: Generate Mandarin dictionary for an example of how to use G2P functionality with a premade example.

Note

As of version 2.0.6, users on Windows can run this command natively without requiring Windows Subsystem for Linux. See Installation for more details.

Piping stdin/stdout#

If you specify the input path as - instead of a file path, the g2p command will read each line from stdin and run G2P on each word with minimal processing. Words will be lowercased and any graphemes that were not in the model’s training data will be removed.
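The minimal processing described above can be pictured with the following sketch. It only mirrors the two steps named in the text (lowercasing, then dropping unseen graphemes); the function name and the grapheme set are hypothetical, and MFA's actual code may differ.

```python
from typing import Set


def preprocess_word(word: str, model_graphemes: Set[str]) -> str:
    """Lowercase the word, then drop every character the model never saw in training."""
    lowered = word.lower()
    return "".join(ch for ch in lowered if ch in model_graphemes)


# Hypothetical grapheme inventory for a model trained on plain English spelling.
model_graphemes = set("abcdefghijklmnopqrstuvwxyz'")

for raw in ["Hello", "na\u00efve", "R2-D2"]:
    print(raw, "->", preprocess_word(raw, model_graphemes))
```

Note that dropping unseen graphemes can change a word substantially, so input containing digits or unusual characters is worth checking before piping it in.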

If you specify the output path as - instead of a file path, the g2p command will send pronunciations as stdout rather than writing to a file.

Note

Using stdin will also bypass database set up (though the database server will still be started and stopped, so be sure to run mfa configure --no_auto_server if speed is essential).
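For callers that want to use the stdin/stdout mode from a script, a sketch along the following lines may help. It assumes that mfa is on the PATH and that one word per line is an acceptable input layout; the helper names are hypothetical, and the model path is left to the caller.

```python
import subprocess
from typing import List


def pipe_command(g2p_model_path: str) -> List[str]:
    """Argument list for mfa g2p with '-' as both INPUT_PATH and OUTPUT_PATH."""
    return ["mfa", "g2p", "-", g2p_model_path, "-"]


def stdin_payload(words: List[str]) -> str:
    """Lay the words out one per line (an assumed layout for the piped input)."""
    return "".join(f"{word}\n" for word in words)


def g2p_via_pipe(words: List[str], g2p_model_path: str) -> str:
    """Run mfa g2p as a subprocess and return whatever it writes to stdout."""
    result = subprocess.run(
        pipe_command(g2p_model_path),
        input=stdin_payload(words),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if mfa exits with an error
    )
    return result.stdout
```

Calling g2p_via_pipe requires a working MFA installation and a G2P model, neither of which this sketch supplies. The format of the returned text is not parsed here because it is not specified above.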

Per-utterance G2P#

The primary use case for G2P is generating new pronunciation dictionaries, but there is limited support for generating pronunciations over an entire utterance. If the OUTPUT_PATH specified for mfa g2p is a directory (i.e., it contains no periods to mark a file extension), then MFA will generate a pronunciation for each word, concatenate them together, and save the resulting transcript in the output directory.
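The directory check can be read as a simple test on the last path component. The sketch below is one interpretation of that sentence, not MFA's actual logic; the function name is hypothetical, and the special case for - follows from the stdout behavior described earlier.

```python
from pathlib import PurePath


def treated_as_directory(output_path: str) -> bool:
    """Guess whether an OUTPUT_PATH would be read as a directory (no period in its last component)."""
    if output_path == "-":
        return False  # '-' means stdout, not a directory
    return "." not in PurePath(output_path).name


for candidate in ["g2p_output", "new_dictionary.txt", "-"]:
    print(candidate, treated_as_directory(candidate))
```

Under this reading, a directory whose name contains a period would be mistaken for a file, so plain directory names are the safer choice.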

Warning

This method is largely not recommended. The output is only the top hypothesis for each word in isolation, because MFA does not have access to the necessary higher-order information, so homographs may often have the wrong pronunciation (e.g., English present tense read [ɹ iː d] vs English past tense read [ɹ ɛ d]). Use at your own risk.
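The homograph problem can be reproduced with a toy lookup table standing in for a G2P model. Everything in this sketch is hypothetical: the table, the rough hand-written pronunciations, and the function name. It only shows why a per-word top hypothesis cannot distinguish the two readings of "read".

```python
from typing import Dict

# One top hypothesis per word, chosen without looking at any context.
TOP_HYPOTHESIS: Dict[str, str] = {
    "i": "aɪ",
    "read": "ɹ iː d",  # the past-tense reading [ɹ ɛ d] is never selected
    "it": "ɪ t",
    "yesterday": "j ɛ s t ɚ d eɪ",
}


def transcribe(utterance: str) -> str:
    """Look up each word in isolation and concatenate the pronunciations."""
    return " ".join(TOP_HYPOTHESIS[word] for word in utterance.lower().split())


present = transcribe("I read it")
past = transcribe("I read it yesterday")
print(present)
print(past)
```

Both utterances receive [ɹ iː d] for "read", even though the second one is past tense.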

Command reference#

mfa g2p#

Generate a pronunciation dictionary using a G2P model.

mfa g2p [OPTIONS] INPUT_PATH G2P_MODEL_PATH OUTPUT_PATH

Options

-c, --config_path <config_path>#

Path to config file to use for G2P.

-n, --num_pronunciations <num_pronunciations>#

Number of pronunciations to generate.

--dictionary_path <dictionary_path>#

Path to existing pronunciation dictionary to use to find OOVs.

--include_bracketed#

Include words enclosed by brackets, i.e. […], (…), <…>.

-p, --profile <profile>#

Configuration profile to use, defaults to “global”

-t, --temporary_directory <temporary_directory>#

Set the default temporary directory, default is /home/docs/Documents/MFA

-j, --num_jobs <num_jobs>#

Set the number of processes to use by default, defaults to 3

--clean, --no_clean#

Remove files from previous runs, default is False

-v, --verbose, -nv, --no_verbose#

Output debug messages, default is False

-q, --quiet, -nq, --no_quiet#

Suppress all output messages (overrides verbose), default is False

--overwrite, --no_overwrite#

Overwrite output files when they exist, default is False

--use_mp, --no_use_mp#

Turn on/off multiprocessing. Multiprocessing is recommended, as it will allow for faster execution.

--use_threading, --no_use_threading#

Use threading library rather than multiprocessing library. Multiprocessing is recommended, as it will allow for faster execution.

-d, --debug, -nd, --no_debug#

Run extra steps for debugging issues, default is False

--use_postgres, --no_use_postgres#

Use postgres instead of sqlite for extra functionality, default is False

--single_speaker#

Single speaker mode creates multiprocessing splits based on utterances rather than speakers. This mode also disables speaker adaptation, equivalent to --uses_speaker_adaptation false.

--textgrid_cleanup, --cleanup_textgrids, --no_textgrid_cleanup, --no_cleanup_textgrids#

Turn on/off post-processing of TextGrids that cleans up silences and recombines compound words and clitics.

-h, --help#

Show this message and exit.

Arguments

INPUT_PATH#

Required argument

G2P_MODEL_PATH#

Required argument

OUTPUT_PATH#

Required argument
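To show how the options and arguments above fit together, here is a small sketch that assembles an mfa g2p argument list in Python. The helper and its parameter names are hypothetical; only flags listed in this reference are used, and the three positional arguments follow the order given in the usage line.

```python
from typing import List, Optional


def build_g2p_command(
    input_path: str,
    g2p_model_path: str,
    output_path: str,
    num_pronunciations: Optional[int] = None,
    dictionary_path: Optional[str] = None,
    include_bracketed: bool = False,
    clean: bool = False,
    overwrite: bool = False,
) -> List[str]:
    """Assemble an argument list: options first, then the three required positionals."""
    command = ["mfa", "g2p"]
    if num_pronunciations is not None:
        command += ["--num_pronunciations", str(num_pronunciations)]
    if dictionary_path is not None:
        command += ["--dictionary_path", dictionary_path]
    if include_bracketed:
        command.append("--include_bracketed")
    if clean:
        command.append("--clean")
    if overwrite:
        command.append("--overwrite")
    command += [input_path, g2p_model_path, output_path]
    return command


# Placeholder paths, for illustration only.
print(" ".join(build_g2p_command("oovs_found.txt", "model.zip", "out.txt", num_pronunciations=2)))
```

The returned list can be passed to subprocess.run; the paths shown are placeholders, not files that ship with MFA.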

Configuration reference#

API reference#