The Montreal Forced Aligner can use trained ivector models (see Training an ivector extractor for details) to classify or cluster utterances according to speaker.
Steps to classify speakers:
- Ensure that the steps in Installation have been completed and that you are in the same Conda/virtual environment that MFA was installed in.
- Run the following command, substituting the arguments with your own paths:
mfa classify_speakers corpus_directory ivector_extractor_path output_directory
If the input uses TextGrids, the output TextGrids will have utterances sorted into one tier per identified speaker. At the moment, there is no way to retrain the classifier based on new data.
If the input corpus directory does not have TextGrids associated with it, the speaker classifier will output one directory per identified speaker, each containing a text file listing the utterances classified as that speaker.
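The plain-text output described above can be consumed with a short script. The sketch below works against an assumed layout — the directory names, the `utterances.txt` file name, and the utterance IDs are all invented for illustration; check the actual contents of your output directory:

```python
import tempfile
from pathlib import Path

# Build a stand-in for the classifier's output: one directory per identified
# speaker, each holding a text file that lists that speaker's utterances.
# (Names here are assumptions for the example, not MFA's guaranteed layout.)
root = Path(tempfile.mkdtemp())
fake_output = {
    "speaker_0": ["utt_001", "utt_002", "utt_005"],
    "speaker_1": ["utt_003", "utt_004"],
}
for speaker, utterances in fake_output.items():
    spk_dir = root / speaker
    spk_dir.mkdir()
    (spk_dir / "utterances.txt").write_text("\n".join(utterances) + "\n")

# Collect the utterance list for each speaker directory
by_speaker = {
    d.name: (d / "utterances.txt").read_text().split()
    for d in sorted(root.iterdir())
    if d.is_dir()
}
for speaker, utterances in sorted(by_speaker.items()):
    print(f"{speaker}: {len(utterances)} utterances")
```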
Options:
- Temporary directory: temporary directory root to use for aligning; default is
- Number of jobs: number of jobs to use; defaults to 3. Set this higher if you have more processors available and would like to process faster.
- Number of speakers: number of speakers to return. If --cluster is present, this specifies the number of clusters. Otherwise, MFA will sort speakers according to the first-pass classification, take the top X speakers, and reclassify the utterances using only those speakers.
- --cluster: MFA will perform clustering of utterance ivectors into the number of speakers specified by the option above.
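The difference between the two modes can be sketched with toy vectors. This is a minimal illustration, not MFA's implementation: the 2-D "ivectors", the speaker reference vectors, and the k-means pass below are all invented for the example (real ivectors are high-dimensional, and MFA's clustering algorithm differs).

```python
import math

def cosine(a, b):
    # Cosine similarity between two vectors
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den

# Toy 2-D "ivectors" for six utterances
utts = [(1.0, 0.1), (0.9, 0.2), (0.1, 1.0), (0.2, 0.9), (1.0, 0.0), (0.0, 1.0)]

# Classification: score each utterance against a fixed set of speaker models
# (here, toy reference vectors) and keep the best-scoring speaker.
speakers = {"A": (1.0, 0.0), "B": (0.0, 1.0)}
labels = [max(speakers, key=lambda s: cosine(u, speakers[s])) for u in utts]

# Clustering: no speaker models are given; instead, group the utterances
# into k clusters (a bare-bones k-means pass for illustration only).
def kmeans(points, k, iters=10):
    cents = list(points[:k])  # naive initialization from the first k points
    groups = []
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda i: sum((a - b) ** 2 for a, b in zip(p, cents[i])))
            groups[j].append(p)
        cents = [tuple(sum(c) / len(g) for c in zip(*g)) if g else cents[i]
                 for i, g in enumerate(groups)]
    return groups

groups = kmeans(utts, 2)
print(labels)                    # speaker label per utterance (classification)
print([len(g) for g in groups])  # cluster sizes (clustering)
```

Classification needs existing speaker models to score against; clustering only needs a target number of groups, which is why the speaker-count option is reinterpreted as a cluster count when --cluster is present.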