SpeakerDiarizer#

class montreal_forced_aligner.diarization.speaker_diarizer.SpeakerDiarizer(ivector_extractor_path='speechbrain', expected_num_speakers=0, cluster=True, evaluation_mode=False, cuda=False, use_pca=True, metric='cosine', cluster_type='hdbscan', manifold_algorithm='tsne', distance_threshold=None, score_threshold=None, min_cluster_size=60, max_iterations=10, linkage='average', **kwargs)[source]#

Bases: IvectorCorpusMixin, TopLevelMfaWorker, FileExporterMixin

Class for performing speaker classification and clustering; it is not currently very functional, but is planned to be expanded in the future

Parameters:
  • ivector_extractor_path (str) – Path to ivector extractor model, or “speechbrain”

  • expected_num_speakers (int, optional) – Number of speakers in the corpus, if known

  • cluster (bool) – Flag for whether speakers should be clustered instead of classified

  • evaluation_mode (bool) – Flag for evaluating against existing speaker labels

  • cuda (bool) – Flag for using CUDA for speechbrain models

  • metric (str or DistanceMetric) – One of “cosine”, “plda”, or “euclidean”

  • cluster_type (str or ClusterType) – Clustering algorithm

  • distance_threshold (float) – Distance threshold to use for clustering
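
A minimal usage sketch; keywords outside the signature above, such as corpus_directory, are assumed to be forwarded through **kwargs to the corpus mixin and may differ in your version:

    from montreal_forced_aligner.diarization.speaker_diarizer import SpeakerDiarizer

    # Cluster a corpus using SpeechBrain embeddings; corpus_directory is an
    # assumed keyword handled by the corpus mixin.
    diarizer = SpeakerDiarizer(
        corpus_directory="/path/to/corpus",
        ivector_extractor_path="speechbrain",
        cluster=True,                # cluster rather than classify
        cluster_type="hdbscan",
        metric="cosine",
        expected_num_speakers=0,     # 0 = number of speakers unknown
        cuda=False,
    )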

calculate_eer()[source]#

Calculate Equal Error Rate (EER) and threshold for the diarization metric using the ground truth data.

Returns:

  • float – EER

  • float – Threshold of EER
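
A hedged sketch, assuming the corpus carries reference speaker labels and the diarizer was constructed with evaluation_mode=True:

    # Compute the equal error rate and its operating threshold against the
    # ground-truth speaker labels (assumes setup() has already been run).
    eer, threshold = diarizer.calculate_eer()
    print(f"EER: {eer:.3f} at threshold {threshold:.3f}")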

classify_speakers()[source]#

Classify speakers based on an ivector or speechbrain model

cluster_utterances()[source]#

Cluster utterances with an ivector or speechbrain model
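
A sketch of a typical clustering run under the assumptions above: set up the corpus, assign cluster-based speaker labels, then write out the relabeled files.

    # End-to-end clustering pass using only methods documented on this page.
    diarizer.setup()
    diarizer.cluster_utterances()
    diarizer.export_files("/path/to/output")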

cluster_utterances_mfa()[source]#

Cluster utterances with an ivector or speechbrain model

compute_speaker_embeddings()[source]#

Generate per-speaker embeddings as the mean over their utterances
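
For illustration only (not the internal implementation, which operates on the corpus database), mean-pooling utterance embeddings into a single speaker embedding looks like:

    import numpy as np

    # Toy example: average 12 utterance embeddings (192-dim) into one speaker vector.
    utterance_embeddings = np.random.rand(12, 192)
    speaker_embedding = utterance_embeddings.mean(axis=0)
    assert speaker_embedding.shape == (192,)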

evaluate_classification()[source]#

Evaluate and output classification accuracy

evaluate_clustering()[source]#

Compute clustering metric scores and output clustering evaluation results

export_files(output_directory)[source]#

Export files with their new speaker labels

Parameters:

output_directory (str) – Output directory to save files

load_embeddings()[source]#

Load embeddings from a speechbrain model

property num_utts_path#

Path to archive containing the number of utterances per training speaker

classmethod parse_parameters(config_path=None, args=None, unknown_args=None)[source]#

Parse parameters for speaker classification from a config path or command-line arguments

Parameters:
  • config_path (Path) – Config path

  • args (dict[str, Any]) – Parsed arguments

  • unknown_args (list[str]) – Optional list of arguments that were not parsed

Returns:

Configuration parameters

Return type:

dict[str, Any]
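
A hedged sketch of loading options from a YAML config and constructing a diarizer from them (corpus_directory is again assumed to be a corpus-mixin keyword):

    from pathlib import Path

    # Parse configuration into a plain dict of keyword arguments.
    params = SpeakerDiarizer.parse_parameters(config_path=Path("diarize_config.yaml"))
    diarizer = SpeakerDiarizer(corpus_directory="/path/to/corpus", **params)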

plda_classification_arguments()[source]#

Generate Job arguments for PldaClassificationFunction

Returns:

Arguments for processing

Return type:

list[PldaClassificationArguments]

refresh_speaker_vectors()[source]#

Refresh speaker vectors following clustering or classification

setup()[source]#

Sets up the corpus and speaker classifier

Raises:

KaldiProcessingError – If there were any errors in running Kaldi binaries

property speaker_ivector_path#

Path to archive containing training speaker ivectors