segment_utterance_transcript

montreal_forced_aligner.vad.multiprocessing.segment_utterance_transcript(acoustic_model, utterance, lexicon_compiler, vad_model, segmentation_options, cmvn=None, fmllr_trans=None, mfcc_options=None, vad_options=None, g2p_model=None, interjection_words=None, acoustic_scale=0.1, beam=16.0, lattice_beam=10.0, max_active=7000, min_active=200, prune_interval=25, beam_delta=0.5, hash_ratio=2.0, prune_scale=0.1, boost_silence=1.0)

Split an utterance and its transcript into multiple transcribed utterances

Parameters:
  • acoustic_model (AcousticModel) – Acoustic model to use in splitting transcriptions

  • utterance (Utterance) – Utterance to split

  • lexicon_compiler (LexiconCompiler) – Lexicon compiler

  • vad_model (VAD or None) – VAD model from SpeechBrain; if None, Kaldi’s energy-based VAD is used instead

  • segmentation_options (dict[str, Any]) – Segmentation options

  • cmvn (DoubleMatrix, optional) – CMVN stats to apply

  • fmllr_trans (FloatMatrix, optional) – fMLLR transformation matrix for speaker adaptation

  • mfcc_options (dict[str, Any], optional) – MFCC options for energy-based VAD

  • vad_options (dict[str, Any], optional) – Options for energy-based VAD

  • acoustic_scale (float, optional) – Defaults to 0.1

  • beam (float, optional) – Defaults to 16

  • lattice_beam (float, optional) – Defaults to 10

  • max_active (int, optional) – Defaults to 7000

  • min_active (int, optional) – Defaults to 200

  • prune_interval (int, optional) – Defaults to 25

  • beam_delta (float, optional) – Defaults to 0.5

  • hash_ratio (float, optional) – Defaults to 2.0

  • prune_scale (float, optional) – Defaults to 0.1

  • boost_silence (float, optional) – Defaults to 1.0

Returns:

Split utterances

Return type:

list[Utterance]
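
The decoding parameters can all be left at their defaults, so a caller usually only overrides a few of them. The following is a minimal sketch of one way to do that, not part of the library: `DECODE_DEFAULTS` and `decode_kwargs` are hypothetical names introduced here for illustration, and the default values are copied from the signature above. The block does not import Montreal Forced Aligner; the final comment only shows where the resulting keyword arguments would be passed.

```python
from typing import Any

# Decoding defaults copied from the documented signature.
# DECODE_DEFAULTS is a hypothetical name, not part of the library.
DECODE_DEFAULTS: dict[str, Any] = {
    "acoustic_scale": 0.1,
    "beam": 16.0,
    "lattice_beam": 10.0,
    "max_active": 7000,
    "min_active": 200,
    "prune_interval": 25,
    "beam_delta": 0.5,
    "hash_ratio": 2.0,
    "prune_scale": 0.1,
    "boost_silence": 1.0,
}


def decode_kwargs(**overrides: Any) -> dict[str, Any]:
    """Hypothetical helper: merge overrides into the documented defaults.

    Raises TypeError for names that are not decoding parameters, so a
    misspelled option fails here rather than at the call site.
    """
    unknown = sorted(set(overrides) - set(DECODE_DEFAULTS))
    if unknown:
        raise TypeError(f"unknown decoding option(s): {', '.join(unknown)}")
    return {**DECODE_DEFAULTS, **overrides}


if __name__ == "__main__":
    # Widen both beams and keep every other default.
    kwargs = decode_kwargs(beam=20.0, lattice_beam=12.0)
    print(kwargs)

    # Assuming the five required objects are already loaded (not shown),
    # the call would then take the form:
    # segment_utterance_transcript(acoustic_model, utterance, lexicon_compiler,
    #                              vad_model, segmentation_options, **kwargs)
```

Keeping the defaults in one mapping makes it easy to log exactly which decoding settings produced a given set of split utterances.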