Train a new language model (mfa train_lm)#

MFA provides a utility for training ARPA-format n-gram language models, and for merging a newly trained model with a pre-existing one.
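For readers unfamiliar with the ARPA format, the sketch below shows its general layout: a `\data\` header listing the n-gram count per order, one section per order, and a closing `\end\` marker. The sample text and the `read_ngram_counts` helper are hypothetical illustrations written for this page. They are not MFA output and not part of the MFA API.

```python
import re

# A tiny hand-written ARPA file, for illustration only (not produced by MFA).
# Each n-gram line is: log10 probability, the n-gram, optional backoff weight.
SAMPLE_ARPA = """\
\\data\\
ngram 1=4
ngram 2=3

\\1-grams:
-0.7782 <s> -0.3010
-0.7782 </s>
-0.4771 hello -0.3010
-0.4771 world -0.3010

\\2-grams:
-0.3010 <s> hello
-0.3010 hello world
-0.3010 world </s>

\\end\\
"""


def read_ngram_counts(arpa_text):
    """Return {order: count} parsed from the header of an ARPA file."""
    counts = {}
    in_header = False
    for line in arpa_text.splitlines():
        line = line.strip()
        if line == "\\data\\":
            in_header = True
            continue
        if in_header:
            match = re.fullmatch(r"ngram (\d+)=(\d+)", line)
            if match:
                counts[int(match.group(1))] = int(match.group(2))
            elif line.startswith("\\"):
                # The first section marker (e.g. \1-grams:) ends the header.
                break
    return counts


counts = read_ngram_counts(SAMPLE_ARPA)
print(counts)
```

Checking the header this way can be a quick sanity test that a file is plausibly ARPA-formatted before passing it to the command.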

Note

As of version 2.0.6, users on Windows can run this command natively without requiring Windows Subsystem for Linux. See Installation for more details.

Command reference#

mfa train_lm#

Train a language model from a corpus or convert an existing ARPA-format language model to an MFA language model.

mfa train_lm [OPTIONS] SOURCE_PATH OUTPUT_MODEL_PATH
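As a rough illustration of the synopsis above, the following sketch assembles an argument list for the command from Python. The `build_train_lm_command` helper and the two paths are hypothetical placeholders; only the flags it emits (`--dictionary_path`, `--num_jobs`, `--clean`, `--overwrite`) are taken from the options listed on this page. The sketch prints the command rather than running it.

```python
import shlex


def build_train_lm_command(source_path, output_model_path, dictionary_path=None,
                           num_jobs=None, clean=False, overwrite=False):
    """Assemble an argv list for `mfa train_lm` (hypothetical helper)."""
    command = ["mfa", "train_lm"]
    if dictionary_path is not None:
        command += ["--dictionary_path", str(dictionary_path)]
    if num_jobs is not None:
        command += ["--num_jobs", str(num_jobs)]
    if clean:
        command.append("--clean")
    if overwrite:
        command.append("--overwrite")
    # Positional arguments come last: SOURCE_PATH, then OUTPUT_MODEL_PATH.
    command += [str(source_path), str(output_model_path)]
    return command


# Placeholder paths; substitute your own corpus and output locations.
command = build_train_lm_command(
    "path/to/corpus", "path/to/output_model", num_jobs=4, clean=True
)
print(shlex.join(command))

# To actually run it (requires MFA to be installed and the paths to exist):
# import subprocess
# subprocess.run(command, check=True)
```

Passing the command as a list avoids shell quoting problems when paths contain spaces.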

Options

--dictionary_path <dictionary_path>#

Full path to pronunciation dictionary, or saved dictionary name.

-c, --config_path <config_path>#

Path to config file to use for training.

-p, --profile <profile>#

Configuration profile to use; defaults to “global”.

-t, --temporary_directory <temporary_directory>#

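Set the default temporary directory. The default shown here, /home/docs/Documents/MFA, reflects the machine that built this documentation; on your system it is typically Documents/MFA under your home directory.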

-j, --num_jobs <num_jobs>#

Set the number of processes to use by default; the default is 3.

--clean, --no_clean#

Remove files from previous runs, default is False

-v, --verbose, -nv, --no_verbose#

Output debug messages, default is False

-q, --quiet, -nq, --no_quiet#

Suppress all output messages (overrides verbose), default is False

--overwrite, --no_overwrite#

Overwrite output files when they exist, default is False

--use_mp, --no_use_mp#

Turn on/off multiprocessing. Multiprocessing is recommended, as it allows for faster execution.

--use_threading, --no_use_threading#

Use the threading library rather than the multiprocessing library. Multiprocessing is recommended, as it allows for faster execution.

-d, --debug, -nd, --no_debug#

Run extra steps for debugging issues, default is False

--use_postgres, --no_use_postgres#

Use postgres instead of sqlite for extra functionality, default is False

--single_speaker#

Single speaker mode creates multiprocessing splits based on utterances rather than speakers. This mode also disables speaker adaptation, equivalent to --uses_speaker_adaptation false.
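To see why splitting by utterance can matter for a single-speaker corpus, consider the toy sketch below. It is not MFA's actual job-splitting code; the `split_by_speaker` and `split_by_utterance` functions and the round-robin assignment are assumptions made purely for illustration.

```python
from collections import defaultdict


def split_by_speaker(utterances, num_jobs):
    """Assign whole speakers to jobs, so a speaker's utterances stay together."""
    by_speaker = defaultdict(list)
    for speaker, utterance in utterances:
        by_speaker[speaker].append(utterance)
    jobs = [[] for _ in range(num_jobs)]
    for index, speaker in enumerate(sorted(by_speaker)):
        jobs[index % num_jobs].extend(by_speaker[speaker])
    return jobs


def split_by_utterance(utterances, num_jobs):
    """Assign individual utterances to jobs, ignoring who spoke them."""
    jobs = [[] for _ in range(num_jobs)]
    for index, (_, utterance) in enumerate(utterances):
        jobs[index % num_jobs].append(utterance)
    return jobs


# Six utterances, all from one speaker.
utterances = [("spk1", f"u{i}") for i in range(1, 7)]

speaker_jobs = split_by_speaker(utterances, num_jobs=3)
utterance_jobs = split_by_utterance(utterances, num_jobs=3)

print(speaker_jobs)    # one job holds everything, two jobs are empty
print(utterance_jobs)  # work is spread evenly across the three jobs
```

With only one speaker, a speaker-based split leaves all but one job idle, whereas an utterance-based split keeps every job busy.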

--textgrid_cleanup, --cleanup_textgrids, --no_textgrid_cleanup, --no_cleanup_textgrids#

Turn on/off post-processing of TextGrids that cleans up silences and recombines compound words and clitics.

-h, --help#

Show this message and exit.

Arguments

SOURCE_PATH#

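Required argument. Per the command description above, either a corpus to train from or an existing ARPA-format language model to convert.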

OUTPUT_MODEL_PATH#

Required argument. Path where the resulting MFA language model is written.

Configuration reference#

API reference#