segment_utterance_transcript
- montreal_forced_aligner.vad.multiprocessing.segment_utterance_transcript(acoustic_model, utterance, lexicon_compiler, vad_model, segmentation_options, cmvn=None, fmllr_trans=None, mfcc_options=None, vad_options=None, g2p_model=None, interjection_words=None, acoustic_scale=0.1, beam=16.0, lattice_beam=10.0, max_active=7000, min_active=200, prune_interval=25, beam_delta=0.5, hash_ratio=2.0, prune_scale=0.1, boost_silence=1.0)
Split an utterance and its transcript into multiple transcribed utterances
- Parameters:
acoustic_model (AcousticModel) – Acoustic model to use in splitting transcriptions
utterance (Utterance) – Utterance to split
lexicon_compiler (LexiconCompiler) – Lexicon compiler
vad_model (VAD or None) – VAD model from SpeechBrain; if None, Kaldi’s energy-based VAD is used
segmentation_options (dict[str, Any]) – Segmentation options
cmvn (DoubleMatrix, optional) – CMVN stats to apply
fmllr_trans (FloatMatrix, optional) – fMLLR transformation matrix for speaker adaptation
mfcc_options (dict[str, Any], optional) – MFCC options for energy-based VAD
vad_options (dict[str, Any], optional) – Options for energy-based VAD
acoustic_scale (float, optional) – Defaults to 0.1
beam (float, optional) – Defaults to 16
lattice_beam (float, optional) – Defaults to 10
max_active (int, optional) – Defaults to 7000
min_active (int, optional) – Defaults to 200
prune_interval (int, optional) – Defaults to 25
beam_delta (float, optional) – Defaults to 0.5
hash_ratio (float, optional) – Defaults to 2.0
prune_scale (float, optional) – Defaults to 0.1
boost_silence (float, optional) – Defaults to 1.0
- Returns:
Split utterances
- Return type:
list[Utterance]
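
The decoding keyword arguments above can be grouped and overridden together before the call. The following is a minimal sketch, not part of the library: `DecodeOptions` is a hypothetical helper whose field names and defaults are copied from the signature above, and the sanity checks in `__post_init__` are assumptions about sensible values rather than rules enforced by Montreal Forced Aligner.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DecodeOptions:
    """Hypothetical container for the decoding keyword arguments.

    Defaults mirror the signature of segment_utterance_transcript.
    """

    acoustic_scale: float = 0.1
    beam: float = 16.0
    lattice_beam: float = 10.0
    max_active: int = 7000
    min_active: int = 200
    prune_interval: int = 25
    beam_delta: float = 0.5
    hash_ratio: float = 2.0
    prune_scale: float = 0.1
    boost_silence: float = 1.0

    def __post_init__(self):
        # Assumed sanity checks; the library itself may not enforce these.
        if self.beam <= 0 or self.lattice_beam <= 0:
            raise ValueError("beam and lattice_beam must be positive")
        if self.min_active > self.max_active:
            raise ValueError("min_active must not exceed max_active")


# Widen the search beams while leaving every other default untouched.
wider = DecodeOptions(beam=20.0, lattice_beam=12.0)
kwargs = asdict(wider)

# The resulting dict could then be unpacked into the call, for example:
# segment_utterance_transcript(
#     acoustic_model, utterance, lexicon_compiler, vad_model,
#     segmentation_options, **kwargs,
# )
```

Keeping the options in one frozen object makes it easier to reuse the same decoding configuration across many utterances and to log exactly which values were used.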