Align with an acoustic model (mfa align)#
This is the primary workflow of MFA, in which you use a pretrained acoustic model to align your dataset. A number of pretrained MFA acoustic models are available, but you can also adapt a pretrained model to your data (see Adapt acoustic model to new data (mfa adapt)) or train an acoustic model from scratch on your dataset (see Train a new acoustic model (mfa train)).
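As a sketch, a basic run might look like the following. The corpus and output paths are placeholders, and the model/dictionary name `english_us_arpa` is one example from the MFA pretrained model repository; substitute your own.

```shell
# Download a pretrained acoustic model and matching dictionary
# (names assume the standard MFA model repository)
mfa model download acoustic english_us_arpa
mfa model download dictionary english_us_arpa

# Align the corpus; aligned TextGrids are written to the output directory
mfa align ~/mfa_data/my_corpus english_us_arpa english_us_arpa ~/mfa_data/my_corpus_aligned
```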
See also
Evaluating alignments for details on how to evaluate alignments against a gold standard.
Fine-tuning alignments for implementation details on how alignments are fine tuned.
Phone model alignments for implementation details on using phone bigram models for generating alignments.
Analyzing alignment quality for details on the fields generated in the alignment_analysis.csv file in the output folder.
Command reference#
mfa align#
Align a corpus with a pronunciation dictionary and a pretrained acoustic model.
mfa align [OPTIONS] CORPUS_DIRECTORY DICTIONARY_PATH ACOUSTIC_MODEL_PATH
OUTPUT_DIRECTORY
Options
- -c, --config_path <config_path>#
Path to config file to use for aligning.
- -s, --speaker_characters <speaker_characters>#
Number of characters of file names to use for determining speaker, default is to use directory names.
- -a, --audio_directory <audio_directory>#
Audio directory root to use for finding audio files.
- --reference_directory <reference_directory>#
Directory containing gold standard alignments to evaluate against.
- --custom_mapping_path <custom_mapping_path>#
YAML file for mapping phones across phone sets in evaluations.
- --output_format <output_format>#
Format for aligned output files (default is long_textgrid).
- Options:
long_textgrid | short_textgrid | json | csv
- --include_original_text#
Flag to include original utterance text in the output.
- --fine_tune#
Flag for running extra fine tuning stage.
- --g2p_model_path <g2p_model_path>#
Path to G2P model to use for OOV items.
- -p, --profile <profile>#
Configuration profile to use, defaults to “global”
- -t, --temporary_directory <temporary_directory>#
Set the default temporary directory, default is /home/docs/Documents/MFA
- -j, --num_jobs <num_jobs>#
Set the number of processes to use by default, defaults to 3
- --clean, --no_clean#
Remove files from previous runs, default is False
- --final_clean, --no_final_clean#
Remove temporary files at the end of run, default is False
- -v, --verbose, -nv, --no_verbose#
Output debug messages, default is False
- -q, --quiet, -nq, --no_quiet#
Suppress all output messages (overrides verbose), default is False
- --overwrite, --no_overwrite#
Overwrite output files when they exist, default is False
- --use_mp, --no_use_mp#
Turn on/off multiprocessing. Multiprocessing is recommended, as it allows for faster execution.
- --use_threading, --no_use_threading#
Use the threading library rather than the multiprocessing library. Multiprocessing is recommended, as it allows for faster execution.
- -d, --debug, -nd, --no_debug#
Run extra steps for debugging issues, default is False
- --use_postgres, --no_use_postgres#
Use postgres instead of sqlite for extra functionality, default is False
- --single_speaker#
Single speaker mode creates multiprocessing splits based on utterances rather than speakers. This mode also disables speaker adaptation, equivalent to --uses_speaker_adaptation false.
- --textgrid_cleanup, --cleanup_textgrids, --no_textgrid_cleanup, --no_cleanup_textgrids#
Turn on/off post-processing of TextGrids that cleans up silences and recombines compound words and clitics.
- -h, --help#
Show this message and exit.
Arguments
- CORPUS_DIRECTORY#
Required argument
- DICTIONARY_PATH#
Required argument
- ACOUSTIC_MODEL_PATH#
Required argument
- OUTPUT_DIRECTORY#
Required argument
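When evaluating against a reference corpus whose phone set differs from the acoustic model's, the file passed to --custom_mapping_path maps phones between the two sets. A minimal sketch of such a YAML file follows; the phone labels and the one-to-many list form are illustrative assumptions, not drawn from a particular corpus.

```yaml
# Hypothetical mapping from reference phones to MFA phones.
# A reference phone may map to a single phone or, assuming the
# list form is supported, to several acceptable matches.
ah: AH0
er: [ER0, ER1]
aa: [AA0, AA1, AA2]
```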
Configuration reference#
By default, the acoustic model supplies parameters related to silence probability and speaker adaptation. These can be overridden on the command line: for example, --initial_silence_probability 0.0 ensures that no utterance starts with silence, and --uses_speaker_adaptation false skips the feature-space adaptation and second-pass alignment.
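The two overrides above can be sketched as a single invocation (paths and model names are placeholders):

```shell
# Override the acoustic model's defaults directly on the command line:
# no initial silence, and no speaker adaptation / second pass
mfa align --initial_silence_probability 0.0 --uses_speaker_adaptation false \
    ~/mfa_data/my_corpus english_us_arpa english_us_arpa ~/mfa_data/my_corpus_aligned
```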
See also
See Speaker adaptation for more details on how speaker adaptation works in Kaldi/MFA.
Align a single file (mfa align_one)#
This workflow is identical to Align with an acoustic model (mfa align), but rather than aligning a full dataset, it aligns only a single file. Because only a single file is used, many of the optimizations for larger datasets are skipped, resulting in faster alignment times, but features like speaker adaptation are not employed.
A number of pretrained MFA acoustic models are available, but you can also adapt a pretrained model to your data (see Adapt acoustic model to new data (mfa adapt)) or train an acoustic model from scratch on your dataset (see Train a new acoustic model (mfa train)).
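A sketch of a single-file run, assuming the placeholder file names below and the `english_us_arpa` pretrained model and dictionary:

```shell
# Align one recording/transcript pair; OUTPUT_PATH determines where
# the single aligned output file is written
mfa align_one ~/mfa_data/recording.wav ~/mfa_data/recording.txt \
    english_us_arpa english_us_arpa ~/mfa_data/recording.TextGrid
```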
Command reference#
mfa align_one#
Align a single file with a pronunciation dictionary and a pretrained acoustic model.
mfa align_one [OPTIONS] SOUND_FILE_PATH TEXT_FILE_PATH DICTIONARY_PATH
ACOUSTIC_MODEL_PATH OUTPUT_PATH
Options
- -c, --config_path <config_path>#
Path to config file to use for aligning.
- --output_format <output_format>#
Format for aligned output files (default is long_textgrid).
- Options:
long_textgrid | short_textgrid | json | csv
- --g2p_model_path <g2p_model_path>#
Path to G2P model to use for OOV items.
- -p, --profile <profile>#
Configuration profile to use, defaults to “global”
- -t, --temporary_directory <temporary_directory>#
Set the default temporary directory, default is /home/docs/Documents/MFA
- -j, --num_jobs <num_jobs>#
Set the number of processes to use by default, defaults to 3
- --clean, --no_clean#
Remove files from previous runs, default is False
- --final_clean, --no_final_clean#
Remove temporary files at the end of run, default is False
- -v, --verbose, -nv, --no_verbose#
Output debug messages, default is False
- -q, --quiet, -nq, --no_quiet#
Suppress all output messages (overrides verbose), default is False
- --overwrite, --no_overwrite#
Overwrite output files when they exist, default is False
- --use_mp, --no_use_mp#
Turn on/off multiprocessing. Multiprocessing is recommended, as it allows for faster execution.
- --use_threading, --no_use_threading#
Use the threading library rather than the multiprocessing library. Multiprocessing is recommended, as it allows for faster execution.
- -d, --debug, -nd, --no_debug#
Run extra steps for debugging issues, default is False
- --use_postgres, --no_use_postgres#
Use postgres instead of sqlite for extra functionality, default is False
- --single_speaker#
Single speaker mode creates multiprocessing splits based on utterances rather than speakers. This mode also disables speaker adaptation, equivalent to --uses_speaker_adaptation false.
- --textgrid_cleanup, --cleanup_textgrids, --no_textgrid_cleanup, --no_cleanup_textgrids#
Turn on/off post-processing of TextGrids that cleans up silences and recombines compound words and clitics.
- -h, --help#
Show this message and exit.
Arguments
- SOUND_FILE_PATH#
Required argument
- TEXT_FILE_PATH#
Required argument
- DICTIONARY_PATH#
Required argument
- ACOUSTIC_MODEL_PATH#
Required argument
- OUTPUT_PATH#
Required argument