Segment transcribed files (mfa segment)#

The Montreal Forced Aligner can use Voice Activity Detection (VAD) capabilities from SpeechBrain to generate segments from a longer sound file, splitting the transcript across the generated segments as well. If you do not have transcripts, see Segment untranscribed files (mfa segment_vad).

Note

On Windows, if you get an OSError/WinError 1314 during the run, follow these instructions to enable symbolic link creation permissions.

Command reference#

mfa segment#

Create segments based on SpeechBrain’s voice activity detection (VAD) model or a basic energy-based algorithm

mfa segment [OPTIONS] CORPUS_DIRECTORY DICTIONARY_PATH ACOUSTIC_MODEL_PATH
            OUTPUT_DIRECTORY
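
For reference, a hypothetical invocation might look like the following; the corpus path, output path, and the english_us_arpa dictionary and acoustic model are placeholders for your own data and downloaded models. The command is prefixed with echo so the assembled line prints without requiring MFA to be installed:

```shell
# Hypothetical mfa segment invocation; swap in your own corpus directory,
# dictionary, acoustic model, and output directory. The echo prefix just
# prints the command instead of running it.
echo mfa segment ~/mfa_data/my_corpus english_us_arpa english_us_arpa \
    ~/mfa_data/my_corpus_segmented --speechbrain --output_format long_textgrid
```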

Options

-c, --config_path <config_path>#

Path to config file to use for segmentation.

--output_format <output_format>#

Format for output files (default is long_textgrid).

Options:

long_textgrid | short_textgrid | json | csv

--speechbrain, --no_speechbrain#

Flag for using SpeechBrain’s pretrained VAD model

--cuda, --no_cuda#

Flag for using CUDA for SpeechBrain’s model

-p, --profile <profile>#

Configuration profile to use, defaults to “global”

-t, --temporary_directory <temporary_directory>#

Set the default temporary directory, default is ~/Documents/MFA

-j, --num_jobs <num_jobs>#

Set the number of processes to use by default, defaults to 3

--clean, --no_clean#

Remove files from previous runs, default is False

-v, --verbose, -nv, --no_verbose#

Output debug messages, default is False

-q, --quiet, -nq, --no_quiet#

Suppress all output messages (overrides verbose), default is False

--overwrite, --no_overwrite#

Overwrite output files when they exist, default is False

--use_mp, --no_use_mp#

Turn on/off multiprocessing. Multiprocessing is recommended and will allow for faster execution.

--use_threading, --no_use_threading#

Use the threading library rather than the multiprocessing library. Multiprocessing is recommended and will allow for faster execution.

-d, --debug, -nd, --no_debug#

Run extra steps for debugging issues, default is False

--use_postgres, --no_use_postgres#

Use postgres instead of sqlite for extra functionality, default is False

--single_speaker#

Single speaker mode creates multiprocessing splits based on utterances rather than speakers. This mode also disables speaker adaptation, equivalent to passing --uses_speaker_adaptation false.

--textgrid_cleanup, --cleanup_textgrids, --no_textgrid_cleanup, --no_cleanup_textgrids#

Turn on/off post-processing of TextGrids that cleans up silences and recombines compound words and clitics.

-h, --help#

Show this message and exit.

Arguments

CORPUS_DIRECTORY#

Required argument

DICTIONARY_PATH#

Required argument

ACOUSTIC_MODEL_PATH#

Required argument

OUTPUT_DIRECTORY#

Required argument

Configuration reference#

API reference#

Segment untranscribed files (mfa segment_vad)#

The Montreal Forced Aligner can use Voice Activity Detection (VAD) capabilities from SpeechBrain or energy-based VAD to generate segments from a longer sound file. This command does not split transcripts; instead, it assigns a default label of “speech” to all identified speech segments. If you would like to preserve transcripts for each segment, see Segment transcribed files (mfa segment).

Note

On Windows, if you get an OSError/WinError 1314 during the run, follow these instructions to enable symbolic link creation permissions.

Command reference#

mfa segment_vad#

Create segments based on SpeechBrain’s voice activity detection (VAD) model or a basic energy-based algorithm

mfa segment_vad [OPTIONS] CORPUS_DIRECTORY OUTPUT_DIRECTORY
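
For reference, a hypothetical invocation might look like the following; the corpus and output paths are placeholders for your own data. The command is prefixed with echo so the assembled line prints without requiring MFA to be installed:

```shell
# Hypothetical mfa segment_vad invocation; swap in your own corpus and
# output directories. --no_speechbrain falls back to the energy-based VAD.
# The echo prefix just prints the command instead of running it.
echo mfa segment_vad ~/mfa_data/my_corpus ~/mfa_data/my_corpus_segmented \
    --no_speechbrain --output_format csv
```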

Options

-c, --config_path <config_path>#

Path to config file to use for segmentation.

--output_format <output_format>#

Format for output files (default is long_textgrid).

Options:

long_textgrid | short_textgrid | json | csv

--speechbrain, --no_speechbrain#

Flag for using SpeechBrain’s pretrained VAD model

--cuda, --no_cuda#

Flag for using CUDA for SpeechBrain’s model

--segment_transcripts, --no_segment_transcripts#

Flag for splitting existing transcripts across the generated VAD segments

-p, --profile <profile>#

Configuration profile to use, defaults to “global”

-t, --temporary_directory <temporary_directory>#

Set the default temporary directory, default is ~/Documents/MFA

-j, --num_jobs <num_jobs>#

Set the number of processes to use by default, defaults to 3

--clean, --no_clean#

Remove files from previous runs, default is False

-v, --verbose, -nv, --no_verbose#

Output debug messages, default is False

-q, --quiet, -nq, --no_quiet#

Suppress all output messages (overrides verbose), default is False

--overwrite, --no_overwrite#

Overwrite output files when they exist, default is False

--use_mp, --no_use_mp#

Turn on/off multiprocessing. Multiprocessing is recommended and will allow for faster execution.

--use_threading, --no_use_threading#

Use the threading library rather than the multiprocessing library. Multiprocessing is recommended and will allow for faster execution.

-d, --debug, -nd, --no_debug#

Run extra steps for debugging issues, default is False

--use_postgres, --no_use_postgres#

Use postgres instead of sqlite for extra functionality, default is False

--single_speaker#

Single speaker mode creates multiprocessing splits based on utterances rather than speakers. This mode also disables speaker adaptation, equivalent to passing --uses_speaker_adaptation false.

--textgrid_cleanup, --cleanup_textgrids, --no_textgrid_cleanup, --no_cleanup_textgrids#

Turn on/off post-processing of TextGrids that cleans up silences and recombines compound words and clitics.

-h, --help#

Show this message and exit.

Arguments

CORPUS_DIRECTORY#

Required argument

OUTPUT_DIRECTORY#

Required argument

Configuration reference#

API reference#