Segment transcribed files (mfa segment)
The Montreal Forced Aligner can use Voice Activity Detection (VAD) capabilities from SpeechBrain to generate segments from a longer sound file, while attempting to segment transcripts as well. If you do not have transcripts, see Segment untranscribed files (mfa segment_vad).
Note
On Windows, if you get an OSError/WinError 1314 during the run, follow these instructions to enable symbolic link creation permissions.
Command reference#
mfa segment#
Create segments based on SpeechBrain’s voice activity detection (VAD) model or a basic energy-based algorithm
mfa segment [OPTIONS] CORPUS_DIRECTORY DICTIONARY_PATH ACOUSTIC_MODEL_PATH OUTPUT_DIRECTORY
Options
- -c, --config_path <config_path>#
Path to config file to use for segmentation.
- --output_format <output_format>#
Format for aligned output files (default is long_textgrid).
- Options:
long_textgrid | short_textgrid | json | csv
- --speechbrain, --no_speechbrain#
Flag for using SpeechBrain’s pretrained VAD model
- --cuda, --no_cuda#
Flag for using CUDA for SpeechBrain’s model
- -p, --profile <profile>#
Configuration profile to use, defaults to “global”
- -t, --temporary_directory <temporary_directory>#
Set the default temporary directory, default is Documents/MFA under the user's home directory (rendered here as /home/docs/Documents/MFA)
- -j, --num_jobs <num_jobs>#
Set the number of processes to use by default, defaults to 3
- --clean, --no_clean#
Remove files from previous runs, default is False
- -v, --verbose, -nv, --no_verbose#
Output debug messages, default is False
- -q, --quiet, -nq, --no_quiet#
Suppress all output messages (overrides verbose), default is False
- --overwrite, --no_overwrite#
Overwrite output files when they exist, default is False
- --use_mp, --no_use_mp#
Turn on/off multiprocessing. Multiprocessing is recommended and will allow for faster execution.
- --use_threading, --no_use_threading#
Use threading library rather than multiprocessing library. Multiprocessing is recommended and will allow for faster execution.
- -d, --debug, -nd, --no_debug#
Run extra steps for debugging issues, default is False
- --use_postgres, --no_use_postgres#
Use postgres instead of sqlite for extra functionality, default is False
- --single_speaker#
Single speaker mode creates multiprocessing splits based on utterances rather than speakers. This mode also disables speaker adaptation, equivalent to --uses_speaker_adaptation false.
- --textgrid_cleanup, --cleanup_textgrids, --no_textgrid_cleanup, --no_cleanup_textgrids#
Turn on/off post-processing of TextGrids that cleans up silences and recombines compound words and clitics.
- -h, --help#
Show this message and exit.
Arguments
- CORPUS_DIRECTORY#
Required argument
- DICTIONARY_PATH#
Required argument
- ACOUSTIC_MODEL_PATH#
Required argument
- OUTPUT_DIRECTORY#
Required argument
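As a minimal sketch of how the options and arguments above fit together, the following builds an mfa segment invocation that writes JSON output, uses four processes, and removes files from previous runs. All four paths are hypothetical placeholders, not files shipped with MFA. The script only prints the command unless RUN=1 is set in the environment, so it can be tried without MFA installed.

```shell
#!/bin/sh
# Hypothetical placeholder paths; substitute your own corpus, dictionary,
# acoustic model, and output locations.
corpus_directory="/data/my_corpus"
dictionary_path="/data/my_dictionary.dict"
acoustic_model_path="/data/my_acoustic_model.zip"
output_directory="/data/segmented"

# Options come first, followed by the four required positional arguments
# in the order given in the usage line.
set -- mfa segment \
    --output_format json \
    --num_jobs 4 \
    --clean \
    "$corpus_directory" "$dictionary_path" "$acoustic_model_path" "$output_directory"

# Keep a printable copy of the command for logging or inspection.
segment_cmd="$*"

# Dry run by default; set RUN=1 in the environment to execute the command.
if [ "${RUN:-0}" = "1" ]; then
    "$@"
else
    echo "$segment_cmd"
fi
```

Building the argument list with set -- keeps each path as a single argument even if it contains spaces, which a plain command string would not.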
Configuration reference#
API reference#
Segment untranscribed files (mfa segment_vad)
The Montreal Forced Aligner can use Voice Activity Detection (VAD) capabilities from SpeechBrain or energy-based VAD to generate segments from a longer sound file. This command does not split transcripts, instead assigning a default label of “speech” to all identified speech segments. If you would like to preserve transcripts for each segment, see Segment transcribed files (mfa segment).
Note
On Windows, if you get an OSError/WinError 1314 during the run, follow these instructions to enable symbolic link creation permissions.
Command reference#
mfa segment_vad#
Create segments based on SpeechBrain’s voice activity detection (VAD) model or a basic energy-based algorithm
mfa segment_vad [OPTIONS] CORPUS_DIRECTORY OUTPUT_DIRECTORY
Options
- -c, --config_path <config_path>#
Path to config file to use for segmentation.
- --output_format <output_format>#
Format for aligned output files (default is long_textgrid).
- Options:
long_textgrid | short_textgrid | json | csv
- --speechbrain, --no_speechbrain#
Flag for using SpeechBrain’s pretrained VAD model
- --cuda, --no_cuda#
Flag for using CUDA for SpeechBrain’s model
- --segment_transcripts, --no_segment_transcripts#
Flag for turning transcript segmentation on or off
- -p, --profile <profile>#
Configuration profile to use, defaults to “global”
- -t, --temporary_directory <temporary_directory>#
Set the default temporary directory, default is Documents/MFA under the user's home directory (rendered here as /home/docs/Documents/MFA)
- -j, --num_jobs <num_jobs>#
Set the number of processes to use by default, defaults to 3
- --clean, --no_clean#
Remove files from previous runs, default is False
- -v, --verbose, -nv, --no_verbose#
Output debug messages, default is False
- -q, --quiet, -nq, --no_quiet#
Suppress all output messages (overrides verbose), default is False
- --overwrite, --no_overwrite#
Overwrite output files when they exist, default is False
- --use_mp, --no_use_mp#
Turn on/off multiprocessing. Multiprocessing is recommended and will allow for faster execution.
- --use_threading, --no_use_threading#
Use threading library rather than multiprocessing library. Multiprocessing is recommended and will allow for faster execution.
- -d, --debug, -nd, --no_debug#
Run extra steps for debugging issues, default is False
- --use_postgres, --no_use_postgres#
Use postgres instead of sqlite for extra functionality, default is False
- --single_speaker#
Single speaker mode creates multiprocessing splits based on utterances rather than speakers. This mode also disables speaker adaptation, equivalent to --uses_speaker_adaptation false.
- --textgrid_cleanup, --cleanup_textgrids, --no_textgrid_cleanup, --no_cleanup_textgrids#
Turn on/off post-processing of TextGrids that cleans up silences and recombines compound words and clitics.
- -h, --help#
Show this message and exit.
Arguments
- CORPUS_DIRECTORY#
Required argument
- OUTPUT_DIRECTORY#
Required argument
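The choice between the two VAD back ends described above can be sketched as a small wrapper around mfa segment_vad. This is an illustration under assumptions, not a recommended configuration: the paths are hypothetical placeholders, the use_speechbrain variable is introduced only for this example, and it assumes that --no_speechbrain selects the energy-based algorithm and that --cuda is only useful when a CUDA-capable GPU is available. The command is printed rather than executed unless RUN=1 is set.

```shell
#!/bin/sh
# Hypothetical placeholder paths; substitute your own.
corpus_directory="/data/my_recordings"
output_directory="/data/vad_segments"

# Illustrative switch (not an MFA setting): 1 uses SpeechBrain's pretrained
# VAD model on the GPU, 0 is assumed to fall back to energy-based VAD.
use_speechbrain=0

if [ "$use_speechbrain" = "1" ]; then
    vad_flags="--speechbrain --cuda"
else
    vad_flags="--no_speechbrain"
fi

# $vad_flags is left unquoted on purpose so it splits into separate flags.
set -- mfa segment_vad $vad_flags \
    --output_format short_textgrid \
    --overwrite \
    "$corpus_directory" "$output_directory"

# Keep a printable copy of the command for logging or inspection.
vad_cmd="$*"

# Dry run by default; set RUN=1 in the environment to execute the command.
if [ "${RUN:-0}" = "1" ]; then
    "$@"
else
    echo "$vad_cmd"
fi
```

Because mfa segment_vad takes no dictionary or acoustic model, only the corpus and output directories appear as positional arguments.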