SpeakerDiarizer#

class montreal_forced_aligner.diarization.speaker_diarizer.SpeakerDiarizer(ivector_extractor_path='speechbrain', expected_num_speakers=0, cluster=True, evaluation_mode=False, cuda=False, use_pca=True, metric='cosine', cluster_type='hdbscan', manifold_algorithm='tsne', distance_threshold=None, score_threshold=None, min_cluster_size=60, max_iterations=10, linkage='average', **kwargs)[source]#

Bases: IvectorCorpusMixin, TopLevelMfaWorker, FileExporterMixin

Class for performing speaker classification and clustering; it is not currently very functional, but is planned to be expanded in the future

Parameters:
  • ivector_extractor_path (str) – Path to ivector extractor model, or “speechbrain”

  • expected_num_speakers (int, optional) – Number of speakers in the corpus, if known

  • cluster (bool) – Flag for whether speakers should be clustered instead of classified

  • evaluation_mode (bool) – Flag for evaluating against existing speaker labels

  • cuda (bool) – Flag for using CUDA for speechbrain models

  • metric (str or DistanceMetric) – One of “cosine”, “plda”, or “euclidean”

  • cluster_type (str or ClusterType) – Clustering algorithm

  • distance_threshold (float) – Distance threshold to use for clustering
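
A minimal usage sketch; keywords outside the signature above, such as corpus_directory, are assumed to be forwarded through **kwargs to the corpus mixin and may differ in your version:

    from montreal_forced_aligner.diarization.speaker_diarizer import SpeakerDiarizer

    # Cluster a corpus using SpeechBrain embeddings; corpus_directory is an
    # assumed keyword handled by the corpus mixin.
    diarizer = SpeakerDiarizer(
        corpus_directory="/path/to/corpus",
        ivector_extractor_path="speechbrain",
        cluster=True,                # cluster rather than classify
        cluster_type="hdbscan",
        metric="cosine",
        expected_num_speakers=0,     # 0 = number of speakers unknown
        cuda=False,
    )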

calculate_eer()[source]#

Calculate Equal Error Rate (EER) and threshold for the diarization metric using the ground truth data.

Returns:

  • float – EER

  • float – Threshold of EER
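
A hedged sketch, assuming the corpus carries reference speaker labels and the diarizer was constructed with evaluation_mode=True:

    # Compute the equal error rate and its operating threshold against the
    # ground-truth speaker labels (assumes setup() has already been run).
    eer, threshold = diarizer.calculate_eer()
    print(f"EER: {eer:.3f} at threshold {threshold:.3f}")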

classify_speakers()[source]#

Classify speakers based on an ivector or speechbrain model

cluster_utterances()[source]#

Cluster utterances with an ivector or speechbrain model
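
A sketch of a typical clustering run under the assumptions above: set up the corpus, assign cluster-based speaker labels, then write out the relabeled files.

    # End-to-end clustering pass using only methods documented on this page.
    diarizer.setup()
    diarizer.cluster_utterances()
    diarizer.export_files("/path/to/output")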

cluster_utterances_mfa()[source]#

Cluster utterances with an ivector or speechbrain model

compute_speaker_embeddings()[source]#

Generate per-speaker embeddings as the mean over their utterances
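
For illustration only (not the internal implementation, which operates on the corpus database), mean-pooling utterance embeddings into a single speaker embedding looks like:

    import numpy as np

    # Toy example: average 12 utterance embeddings (192-dim) into one speaker vector.
    utterance_embeddings = np.random.rand(12, 192)
    speaker_embedding = utterance_embeddings.mean(axis=0)
    assert speaker_embedding.shape == (192,)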

evaluate_classification()[source]#

Evaluate and output classification accuracy

evaluate_clustering()[source]#

Compute clustering metric scores and output clustering evaluation results

export_files(output_directory)[source]#

Export files with their new speaker labels

Parameters:

output_directory (str) – Output directory to save files

load_embeddings()[source]#

Load embeddings from a speechbrain model

property num_utts_path#

Path to archive containing the number of utterances per training speaker

classmethod parse_parameters(config_path=None, args=None, unknown_args=None)[source]#

Parse parameters for speaker classification from a config path or command-line arguments

Parameters:
  • config_path (Path) – Config path

  • args (dict[str, Any]) – Parsed arguments

  • unknown_args (list[str]) – Optional list of arguments that were not parsed

Returns:

Configuration parameters

Return type:

dict[str, Any]
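
A hedged sketch of loading options from a YAML config and constructing a diarizer from them (corpus_directory is again assumed to be a corpus-mixin keyword):

    from pathlib import Path

    # Parse configuration into a plain dict of keyword arguments.
    params = SpeakerDiarizer.parse_parameters(config_path=Path("diarize_config.yaml"))
    diarizer = SpeakerDiarizer(corpus_directory="/path/to/corpus", **params)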

plda_classification_arguments()[source]#

Generate Job arguments for PldaClassificationFunction

Returns:

Arguments for processing

Return type:

list[PldaClassificationArguments]

refresh_speaker_vectors()[source]#

Refresh speaker vectors following clustering or classification

setup()[source]#

Sets up the corpus and speaker classifier

Raises:

KaldiProcessingError – If there were any errors in running Kaldi binaries

property speaker_ivector_path#

Path to archive containing training speaker ivectors