Align with an acoustic model (mfa align)#
This is the primary workflow of MFA, in which you use a pretrained acoustic model to align your dataset. A number of pretrained MFA acoustic models are available, but you can also adapt a pretrained model to your data (see Adapt acoustic model to new data (mfa adapt)) or train an acoustic model from scratch on your dataset (see Train a new acoustic model (mfa train)).
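As a sketch, a basic run might look like the following. The corpus and output paths are placeholders, and the model/dictionary name `english_us_arpa` is one example from the MFA pretrained model repository; substitute your own.

```shell
# Download a pretrained acoustic model and matching dictionary
# (names assume the standard MFA model repository)
mfa model download acoustic english_us_arpa
mfa model download dictionary english_us_arpa

# Align the corpus; aligned TextGrids are written to the output directory
mfa align ~/mfa_data/my_corpus english_us_arpa english_us_arpa ~/mfa_data/my_corpus_aligned
```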
See also
Evaluating alignments for details on how to evaluate alignments against a gold standard.
Fine-tuning alignments for implementation details on how alignments are fine tuned.
Phone model alignments for implementation details on using phone bigram models for generating alignments.
Analyzing alignment quality for details on the fields generated in the alignment_analysis.csv file in the output folder.
Command reference#
mfa align#
Align a corpus with a pronunciation dictionary and a pretrained acoustic model.
mfa align [OPTIONS] CORPUS_DIRECTORY DICTIONARY_PATH ACOUSTIC_MODEL_PATH
OUTPUT_DIRECTORY
Options
- -c, --config_path <config_path>#
Path to config file to use for aligning.
- -s, --speaker_characters <speaker_characters>#
Number of characters of file names to use for determining speaker, default is to use directory names.
- -a, --audio_directory <audio_directory>#
Audio directory root to use for finding audio files.
- --reference_directory <reference_directory>#
Directory containing gold standard alignments to evaluate against.
- --custom_mapping_path <custom_mapping_path>#
YAML file for mapping phones across phone sets in evaluations.
- --output_format <output_format>#
Format for aligned output files (default is long_textgrid).
- Options:
long_textgrid | short_textgrid | json | csv
- --include_original_text#
Flag to include original utterance text in the output.
- --fine_tune#
Flag for running extra fine tuning stage.
- --g2p_model_path <g2p_model_path>#
Path to G2P model to use for OOV items.
- -p, --profile <profile>#
Configuration profile to use, defaults to “global”
- -t, --temporary_directory <temporary_directory>#
Set the default temporary directory, default is /home/docs/Documents/MFA
- -j, --num_jobs <num_jobs>#
Set the number of processes to use by default, defaults to 3
- --clean, --no_clean#
Remove files from previous runs, default is False
- --final_clean, --no_final_clean#
Remove temporary files at the end of run, default is False
- -v, --verbose, -nv, --no_verbose#
Output debug messages, default is False
- -q, --quiet, -nq, --no_quiet#
Suppress all output messages (overrides verbose), default is False
- --overwrite, --no_overwrite#
Overwrite output files when they exist, default is False
- --use_mp, --no_use_mp#
Turn on/off multiprocessing. Multiprocessing is recommended, as it allows for faster execution.
- --use_threading, --no_use_threading#
Use the threading library rather than the multiprocessing library. Multiprocessing is recommended, as it allows for faster execution.
- -d, --debug, -nd, --no_debug#
Run extra steps for debugging issues, default is False
- --use_postgres, --no_use_postgres#
Use postgres instead of sqlite for extra functionality, default is False
- --single_speaker#
Single speaker mode creates multiprocessing splits based on utterances rather than speakers. This mode also disables speaker adaptation, equivalent to --uses_speaker_adaptation false.
- --textgrid_cleanup, --cleanup_textgrids, --no_textgrid_cleanup, --no_cleanup_textgrids#
Turn on/off post-processing of TextGrids that cleans up silences and recombines compound words and clitics.
- -h, --help#
Show this message and exit.
Arguments
- CORPUS_DIRECTORY#
Required argument
- DICTIONARY_PATH#
Required argument
- ACOUSTIC_MODEL_PATH#
Required argument
- OUTPUT_DIRECTORY#
Required argument
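When evaluating against a reference corpus whose phone set differs from the acoustic model's, the file passed to --custom_mapping_path maps phones between the two sets. A minimal sketch of such a YAML file follows; the phone labels and the one-to-many list form are illustrative assumptions, not drawn from a particular corpus.

```yaml
# Hypothetical mapping from reference phones to MFA phones.
# A reference phone may map to a single phone or, assuming the
# list form is supported, to several acceptable matches.
ah: AH0
er: [ER0, ER1]
aa: [AA0, AA1, AA2]
```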
Configuration reference#
By default, the acoustic model supplies parameters related to silence probability and speaker adaptation. These can be overridden on the command line: for example, --initial_silence_probability 0.0 ensures that no utterance starts with silence, and --uses_speaker_adaptation false skips the feature-space adaptation and second-pass alignment.
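The two overrides above can be sketched as a single invocation (paths and model names are placeholders):

```shell
# Override the acoustic model's defaults directly on the command line:
# no initial silence, and no speaker adaptation / second pass
mfa align --initial_silence_probability 0.0 --uses_speaker_adaptation false \
    ~/mfa_data/my_corpus english_us_arpa english_us_arpa ~/mfa_data/my_corpus_aligned
```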
See also
See Speaker adaptation for more details on how speaker adaptation works in Kaldi/MFA.
Align a single file (mfa align_one)#
This workflow is identical to Align with an acoustic model (mfa align), but rather than aligning a full dataset, it aligns only a single file. Because only a single file is used, many of the optimizations for larger datasets are skipped, resulting in faster alignment times, but features like speaker adaptation are not employed.
A number of pretrained MFA acoustic models are available, but you can also adapt a pretrained model to your data (see Adapt acoustic model to new data (mfa adapt)) or train an acoustic model from scratch on your dataset (see Train a new acoustic model (mfa train)).
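A sketch of a single-file run, assuming the placeholder file names below and the `english_us_arpa` pretrained model and dictionary:

```shell
# Align one recording/transcript pair; OUTPUT_PATH determines where
# the single aligned output file is written
mfa align_one ~/mfa_data/recording.wav ~/mfa_data/recording.txt \
    english_us_arpa english_us_arpa ~/mfa_data/recording.TextGrid
```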
Command reference#
mfa align_one#
Align a single file with a pronunciation dictionary and a pretrained acoustic model.
mfa align_one [OPTIONS] SOUND_FILE_PATH TEXT_FILE_PATH DICTIONARY_PATH
ACOUSTIC_MODEL_PATH OUTPUT_PATH
Options
- -c, --config_path <config_path>#
Path to config file to use for aligning.
- --output_format <output_format>#
Format for aligned output files (default is long_textgrid).
- Options:
long_textgrid | short_textgrid | json | csv
- --g2p_model_path <g2p_model_path>#
Path to G2P model to use for OOV items.
- -p, --profile <profile>#
Configuration profile to use, defaults to “global”
- -t, --temporary_directory <temporary_directory>#
Set the default temporary directory, default is /home/docs/Documents/MFA
- -j, --num_jobs <num_jobs>#
Set the number of processes to use by default, defaults to 3
- --clean, --no_clean#
Remove files from previous runs, default is False
- --final_clean, --no_final_clean#
Remove temporary files at the end of run, default is False
- -v, --verbose, -nv, --no_verbose#
Output debug messages, default is False
- -q, --quiet, -nq, --no_quiet#
Suppress all output messages (overrides verbose), default is False
- --overwrite, --no_overwrite#
Overwrite output files when they exist, default is False
- --use_mp, --no_use_mp#
Turn on/off multiprocessing. Multiprocessing is recommended, as it allows for faster execution.
- --use_threading, --no_use_threading#
Use the threading library rather than the multiprocessing library. Multiprocessing is recommended, as it allows for faster execution.
- -d, --debug, -nd, --no_debug#
Run extra steps for debugging issues, default is False
- --use_postgres, --no_use_postgres#
Use postgres instead of sqlite for extra functionality, default is False
- --single_speaker#
Single speaker mode creates multiprocessing splits based on utterances rather than speakers. This mode also disables speaker adaptation, equivalent to --uses_speaker_adaptation false.
- --textgrid_cleanup, --cleanup_textgrids, --no_textgrid_cleanup, --no_cleanup_textgrids#
Turn on/off post-processing of TextGrids that cleans up silences and recombines compound words and clitics.
- -h, --help#
Show this message and exit.
Arguments
- SOUND_FILE_PATH#
Required argument
- TEXT_FILE_PATH#
Required argument
- DICTIONARY_PATH#
Required argument
- ACOUSTIC_MODEL_PATH#
Required argument
- OUTPUT_PATH#
Required argument