SpeakerDiarizer
- class montreal_forced_aligner.diarization.speaker_diarizer.SpeakerDiarizer(ivector_extractor_path='speechbrain', expected_num_speakers=0, cluster=True, evaluation_mode=False, cuda=False, use_pca=True, metric='cosine', cluster_type='hdbscan', manifold_algorithm='tsne', distance_threshold=None, score_threshold=None, min_cluster_size=60, max_iterations=10, linkage='average', **kwargs)
Bases: IvectorCorpusMixin, TopLevelMfaWorker, FileExporterMixin
Class for performing speaker classification. It is not yet fully functional, but is planned to be expanded in the future.
- Parameters:
ivector_extractor_path (str) – Path to ivector extractor model, or “speechbrain”
expected_num_speakers (int, optional) – Number of speakers in the corpus, if known
cluster (bool) – Flag for whether speakers should be clustered instead of classified
evaluation_mode (bool) – Flag for evaluating against existing speaker labels
cuda (bool) – Flag for using CUDA for speechbrain models
metric (str or DistanceMetric) – One of “cosine”, “plda”, or “euclidean”
cluster_type (str or ClusterType) – Clustering algorithm
relative_distance_threshold (float) – Threshold to use when clustering based on distance
- calculate_eer()
Calculate Equal Error Rate (EER) and threshold for the diarization metric using the ground truth data.
- Returns:
float – EER
float – Threshold of EER
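For readers unfamiliar with the metric, the following is a minimal sketch of how an equal error rate and its threshold can be computed from two lists of similarity scores. It is not the implementation used by calculate_eer(); the function name equal_error_rate and its arguments are hypothetical, and it assumes that higher scores mean the two utterances are more likely to come from the same speaker.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Return (eer, threshold) for similarity scores.

    Hypothetical helper for illustration only, not part of MFA.
    target_scores: scores for same-speaker pairs.
    nontarget_scores: scores for different-speaker pairs.
    Assumes higher scores indicate a more likely same-speaker match.
    """
    if not target_scores or not nontarget_scores:
        raise ValueError("Both score lists must be non-empty")

    # Every observed score is a candidate decision threshold
    thresholds = sorted(set(target_scores) | set(nontarget_scores))

    best_gap = None
    eer, eer_threshold = 1.0, thresholds[0]
    for t in thresholds:
        # False rejection rate: same-speaker pairs scored below the threshold
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance rate: different-speaker pairs scored at or above it
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        # Keep the threshold where the two error rates are closest
        if best_gap is None or gap < best_gap:
            best_gap = gap
            eer = (far + frr) / 2
            eer_threshold = t
    return eer, eer_threshold


eer, threshold = equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.6, 0.3, 0.2, 0.1])
```

The sketch scans only the observed scores as thresholds and reports the midpoint of the two error rates where they are closest. Interpolating between thresholds would give a smoother estimate on small data sets.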
- compute_speaker_embeddings()
Generate per-speaker embeddings as the mean over their utterances
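The averaging step described here can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the code behind compute_speaker_embeddings(): the function name mean_speaker_embeddings and the (speaker, vector) input format are hypothetical, and embeddings are represented as plain lists of floats.

```python
def mean_speaker_embeddings(utterance_embeddings):
    """Average utterance embeddings per speaker.

    Hypothetical helper for illustration only, not part of MFA.
    utterance_embeddings: iterable of (speaker_id, vector) pairs,
    where each vector is a list of floats of the same length.
    Returns a dict mapping speaker_id to the mean vector.
    """
    sums = {}
    counts = {}
    for speaker, vector in utterance_embeddings:
        if speaker not in sums:
            # First utterance for this speaker: start a running sum
            sums[speaker] = [0.0] * len(vector)
            counts[speaker] = 0
        elif len(vector) != len(sums[speaker]):
            raise ValueError(f"Embedding size mismatch for speaker {speaker!r}")
        for i, value in enumerate(vector):
            sums[speaker][i] += value
        counts[speaker] += 1

    # Divide each running sum by the number of utterances for that speaker
    return {
        speaker: [total / counts[speaker] for total in totals]
        for speaker, totals in sums.items()
    }


embeddings = mean_speaker_embeddings(
    [("a", [1.0, 3.0]), ("a", [3.0, 5.0]), ("b", [2.0, 2.0])]
)
```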
- evaluate_clustering()
Compute clustering metric scores and output clustering evaluation results
- export_files(output_directory)
Export files with their new speaker labels
- Parameters:
output_directory (str) – Output directory to save files
- property num_utts_path
Path to archive containing the number of utterances per training speaker
- classmethod parse_parameters(config_path=None, args=None, unknown_args=None)
Parse parameters for speaker classification from a config path or command-line arguments
- plda_classification_arguments()
Generate Job arguments for PldaClassificationFunction
- Returns:
Arguments for processing
- Return type:
- setup()
Sets up the corpus and speaker classifier
- Raises:
KaldiProcessingError – If there were any errors in running Kaldi binaries
- property speaker_ivector_path
Path to archive containing training speaker ivectors