Speaker diarization (mfa diarize_speakers)#

The Montreal Forced Aligner can use trained ivector models (see Train an ivector extractor (mfa train_ivector) for more information about training these models) to classify or cluster utterances by speaker.
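
For example, a typical clustering run might look like the following, where the corpus, model, and output paths are placeholders; the ivector extractor can be one trained with mfa train_ivector or a pretrained model downloaded via mfa model download:

mfa diarize_speakers /path/to/corpus /path/to/ivector_extractor.zip /path/to/output --cluster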

Following ivector extraction, MFA stores utterance and speaker ivectors in PLDA-transformed space. Storing the PLDA transformation ensures that it is performed only once, when ivectors are initially extracted, rather than repeated each time scoring occurs. The dimensionality of the PLDA-transformed ivectors is 50 by default, but this can be changed through the Global configuration command.

See also

The PLDA transformation and scoring generally follows Probabilistic Linear Discriminant Analysis (PLDA) Explained by Prachi Singh and the associated code.

A number of clustering algorithms from scikit-learn are available for use, along with the default hdbscan. Specifying --use_plda will use PLDA scoring, as opposed to Euclidean distance in PLDA-transformed space. PLDA scoring is likely to give better results, but has the drawback of requiring the full pairwise distance matrix to be computed for hdbscan, affinity, agglomerative, spectral, dbscan, and optics; see the examples below.
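
For instance, with placeholder paths, a run where the number of speakers is known might use kmeans (since kmeans requires a fixed number of clusters, the expected speaker count is supplied via -s/--expected_num_speakers), while a run with an unknown number of speakers might rely on the default hdbscan with the PLDA scoring described above:

mfa diarize_speakers /path/to/corpus /path/to/ivector_extractor.zip /path/to/output --cluster --cluster_type kmeans --expected_num_speakers 4

mfa diarize_speakers /path/to/corpus /path/to/ivector_extractor.zip /path/to/output --cluster --use_plda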

Warning

Some experimentation with clustering is likely necessary, and in general it should be run in a supervised manner. Different recording conditions and noise in particular utterances can affect the ivectors. Please see the speaker diarization functionality of Anchor Annotator for a way to run MFA’s diarization in a supervised manner.

Also, note that much of the speaker diarization functionality in MFA is implemented primarily for Anchor, as speaker diarization is not as constrained a problem as forced alignment. As such, please consider speaker diarization from the command line alpha functionality; there are likely to be issues.

Command reference#

mfa diarize_speakers#

Use an ivector extractor to cluster utterances into speakers

If you would like to use SpeechBrain’s speaker recognition model, specify speechbrain as the IVECTOR_EXTRACTOR_PATH. When using SpeechBrain’s speaker recognition model, the --cuda flag is available to perform computations on the GPU, and the --num_jobs parameter will be used as the batch size for any parallelized computation.
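
A hypothetical invocation using SpeechBrain’s model on the GPU (corpus and output paths are placeholders) would be:

mfa diarize_speakers /path/to/corpus speechbrain /path/to/output --cluster --cuda --num_jobs 16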

mfa diarize_speakers [OPTIONS] CORPUS_DIRECTORY IVECTOR_EXTRACTOR_PATH
                     OUTPUT_DIRECTORY

Options

-c, --config_path <config_path>#

Path to config file to use for diarization.

-s, --expected_num_speakers <expected_num_speakers>#

Number of speakers if known.

--output_format <output_format>#

Format for output files (default is long_textgrid).

Options:

long_textgrid | short_textgrid | json | csv

--classify, --cluster#

Specify whether to classify speakers into pretrained IDs or cluster speakers without a classification model, default is cluster

--cluster_type <cluster_type>#

Type of clustering algorithm to use

Options:

mfa | affinity | agglomerative | spectral | dbscan | hdbscan | optics | kmeans | meanshift

--cuda, --no_cuda#

Flag for using CUDA for SpeechBrain’s model

--use_pca, --no_use_pca#

Flag for using PCA representations of ivectors

--evaluate, --validate#

Flag for whether to evaluate clustering/classification against existing speakers.

-p, --profile <profile>#

Configuration profile to use, defaults to “global”

-t, --temporary_directory <temporary_directory>#

Set the default temporary directory, default is ~/Documents/MFA

-j, --num_jobs <num_jobs>#

Set the number of processes to use by default, defaults to 3

--clean, --no_clean#

Remove files from previous runs, default is False

-v, --verbose, -nv, --no_verbose#

Output debug messages, default is False

-q, --quiet, -nq, --no_quiet#

Suppress all output messages (overrides verbose), default is False

--overwrite, --no_overwrite#

Overwrite output files when they exist, default is False

--use_mp, --no_use_mp#

Turn on/off multiprocessing. Multiprocessing is recommended and will allow for faster executions.

--use_threading, --no_use_threading#

Use the threading library rather than the multiprocessing library. Multiprocessing is recommended and will allow for faster executions.

-d, --debug, -nd, --no_debug#

Run extra steps for debugging issues, default is False

--use_postgres, --no_use_postgres#

Use postgres instead of sqlite for extra functionality, default is False

--single_speaker#

Single speaker mode creates multiprocessing splits based on utterances rather than speakers. This mode also disables speaker adaptation, equivalent to passing --uses_speaker_adaptation false.

--textgrid_cleanup, --cleanup_textgrids, --no_textgrid_cleanup, --no_cleanup_textgrids#

Turn on/off post-processing of TextGrids that cleans up silences and recombines compound words and clitics.

-h, --help#

Show this message and exit.

Arguments

CORPUS_DIRECTORY#

Required argument

IVECTOR_EXTRACTOR_PATH#

Required argument

OUTPUT_DIRECTORY#

Required argument

Configuration reference#

API reference#