VadSegmenter#

class montreal_forced_aligner.vad.segmenter.VadSegmenter(**kwargs)[source]#

Bases: VadConfigMixin, AcousticCorpusMixin, FileExporterMixin, SpeechbrainSegmenterMixin, TopLevelMfaWorker

Class for performing VAD-based segmentation into utterances; parameters are passed to speechbrain.pretrained.interfaces.VAD.get_speech_segments (see the sketch after the parameter list below)

Parameters:
  • segment_padding (float) – Size of padding on both ends of a segment

  • large_chunk_size (float) – Size (in seconds) of the large chunks that are read sequentially from the input audio file.

  • small_chunk_size (float) – Size (in seconds) of the small chunks extracted from the large ones. The audio signal is processed in parallel within the small chunks. Note that large_chunk_size/small_chunk_size must be an integer.

  • overlap_small_chunk (bool) – If True, creates overlapping small chunks (with 50% overlap). The probabilities of the overlapping chunks are combined using Hamming windows.

  • apply_energy_VAD (bool) – If True, an energy-based VAD is applied to the detected speech segments. The neural network VAD often creates longer segments and tends to merge segments that are close together. Energy-based post-processing can be useful for obtaining a finer-grained voice activity detection. The energy thresholds are managed by en_activation_th and en_deactivation_th (see below).

  • double_check (bool) – If True, double checks (using the neural VAD) that the candidate speech segments actually contain speech. A threshold on the mean posterior probabilities provided by the neural network is applied based on the speech_th parameter (see below).

  • activation_th (float) – Threshold on the neural posteriors above which a speech segment is started.

  • deactivation_th (float) – Threshold on the neural posteriors below which a speech segment is ended.

  • en_activation_th (float) – A new speech segment is started if the energy rises above this threshold. This is active only if apply_energy_VAD is True.

  • en_deactivation_th (float) – The segment is considered ended when the energy falls to or below this threshold. This is active only if apply_energy_VAD is True.

  • speech_th (float) – Threshold on the mean posterior probability within the candidate speech segment. Below that threshold, the segment is re-assigned to a non-speech region. This is active only if double_check is True.

  • close_th (float) – If the gap between the end of one segment and the start of the next is smaller than close_th (in seconds), the two segments are merged.

  • len_th (float) – If the length of a segment is smaller than len_th (in seconds), the segment is removed.
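
Since these options are forwarded to SpeechBrain, the following is a minimal sketch of calling speechbrain.pretrained.interfaces.VAD.get_speech_segments directly to show what each one controls. The model source, save directory, and audio path are placeholders, and segment_padding is omitted because it does not appear in the SpeechBrain signature and is assumed to be applied on the MFA side:

    # Minimal sketch of the underlying SpeechBrain call (placeholder paths)
    from speechbrain.pretrained import VAD

    vad = VAD.from_hparams(
        source="speechbrain/vad-crdnn-libriparty",
        savedir="pretrained_models/vad-crdnn-libriparty",
    )
    boundaries = vad.get_speech_segments(
        "example.wav",
        large_chunk_size=30,       # seconds read sequentially from the file
        small_chunk_size=10,       # must divide large_chunk_size evenly
        overlap_small_chunk=True,  # 50% overlap, combined with Hamming windows
        apply_energy_VAD=True,     # refine neural segments with energy-based VAD
        double_check=True,         # re-verify candidate segments with the neural VAD
        activation_th=0.5,
        deactivation_th=0.25,
        en_activation_th=0.5,
        en_deactivation_th=0.0,
        speech_th=0.5,
        close_th=0.25,             # merge segments whose gap is below this (seconds)
        len_th=0.25,               # drop segments shorter than this (seconds)
    )
    vad.save_boundaries(boundaries, save_path="example_boundaries.txt")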

export_files(output_directory, output_format=None)[source]#

Export the results of segmentation as TextGrids

Parameters:
  • output_directory (str) – Directory to save segmentation TextGrids

  • output_format (str, optional) – Format to force output files into
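
For illustration, a minimal sketch of exporting after segmentation has run; the corpus path is a placeholder (assuming the corpus_directory argument from AcousticCorpusMixin) and the "long_textgrid" format string is an assumption:

    from pathlib import Path
    from montreal_forced_aligner.vad.segmenter import VadSegmenter

    # Placeholder corpus path; the format string is an assumed value
    segmenter = VadSegmenter(corpus_directory=Path("~/corpus").expanduser())
    segmenter.segment()
    segmenter.export_files("vad_output", output_format="long_textgrid")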

classmethod parse_parameters(config_path=None, args=None, unknown_args=None)[source]#

Parse parameters for segmentation from a config path or command-line arguments

Parameters:
  • config_path (Path) – Config path

  • args (dict[str, Any]) – Parsed arguments

  • unknown_args (list[str]) – Optional list of arguments that were not parsed

Returns:

Configuration parameters

Return type:

dict[str, Any]
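
A sketch of parsing a config file and passing the result back into the constructor; the YAML filename and corpus path are placeholders:

    from pathlib import Path
    from montreal_forced_aligner.vad.segmenter import VadSegmenter

    # Hypothetical config containing keys such as apply_energy_VAD or close_th
    params = VadSegmenter.parse_parameters(config_path=Path("segment_config.yaml"))

    # The returned dict[str, Any] can be unpacked into the constructor
    segmenter = VadSegmenter(corpus_directory=Path("~/corpus").expanduser(), **params)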

segment()[source]#

Performs VAD and segmentation into utterances

Raises:

KaldiProcessingError – If there were any errors in running Kaldi binaries
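
A sketch of handling the documented failure mode; the import path for KaldiProcessingError is assumed to be montreal_forced_aligner.exceptions, and the corpus path is a placeholder:

    from pathlib import Path
    from montreal_forced_aligner.exceptions import KaldiProcessingError
    from montreal_forced_aligner.vad.segmenter import VadSegmenter

    segmenter = VadSegmenter(corpus_directory=Path("~/corpus").expanduser())
    try:
        segmenter.segment()  # run VAD and split files into utterances
    except KaldiProcessingError as exc:
        # Errors from Kaldi binaries surface here; the log files referenced by
        # the exception typically contain the underlying message
        print(f"Segmentation failed: {exc}")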

segment_vad_arguments()[source]#

Generate Job arguments for SegmentVadFunction

Returns:

Arguments for processing

Return type:

list[SegmentVadArguments]
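
Continuing with a segmenter instance like the ones sketched above, the per-job arguments can be inspected; the exact fields of SegmentVadArguments are not documented here, so the loop only prints each object:

    # One SegmentVadArguments object is generated per Job; field names are
    # whatever the dataclass defines and are not assumed here
    for job_args in segmenter.segment_vad_arguments():
        print(type(job_args).__name__, job_args)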

segment_vad_mfa()[source]#

Run segmentation based on MFA's Kaldi-based VAD.

See also

SegmentVadFunction

Multiprocessing helper function for each job

segment_vad_arguments

Job method for generating arguments for helper function

segment_vad_speechbrain()[source]#

Run segmentation based on the pretrained SpeechBrain VAD model.

See also

SegmentVadFunction

Multiprocessing helper function for each job

segment_vad_arguments

Job method for generating arguments for helper function
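
A sketch of how one might dispatch between the two backends; whether SpeechbrainSegmenterMixin exposes a boolean speechbrain attribute is an assumption made purely for illustration:

    # Hypothetical dispatch between the two VAD backends; the `speechbrain`
    # flag is an assumed attribute, not documented on this page
    def run_vad(segmenter) -> None:
        if getattr(segmenter, "speechbrain", False):
            segmenter.segment_vad_speechbrain()  # neural VAD via SpeechBrain
        else:
            segmenter.segment_vad_mfa()          # Kaldi-based VAD via MFA jobs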

setup()[source]#

Set up segmentation
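
A sketch of calling setup() explicitly before segmenting; whether segment() also invokes setup() internally is not stated on this page, so calling it first is a conservative assumption:

    from pathlib import Path
    from montreal_forced_aligner.vad.segmenter import VadSegmenter

    segmenter = VadSegmenter(corpus_directory=Path("~/corpus").expanduser())
    segmenter.setup()    # prepare the corpus for segmentation (assumed behavior)
    segmenter.segment()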