Utterance
- class montreal_forced_aligner.db.Utterance(**kwargs)
Bases: Base
Database class for storing information about utterances
- Parameters:
id (int) – Primary key
begin (float) – Beginning timestamp of the utterance
end (float) – Ending timestamp of the utterance, -1 if there is no audio file
duration (float) – Duration of the utterance
channel (int) – Channel of the utterance in the audio file
num_frames (int) – Number of feature frames extracted
text (str) – Input text for the utterance
oovs (str) – Space-delimited list of items that were not found in the speaker’s pronunciation dictionary
normalized_text (str) – Normalized text for the utterance, after removing case and punctuation, and splitting up compounds and clitics if the whole word is not found in the speaker’s pronunciation dictionary
features (str) – File index for generated features
in_subset (bool) – Flag for whether to use this utterance in the current training subset
ignored (bool) – Flag for whether the utterance is ignored because it lacks features
alignment_log_likelihood (float) – Log likelihood for the alignment of the utterance, taking both speech and silence phones into consideration
speech_log_likelihood (float) – Log likelihood for the alignment of the utterance, taking only the speech phones into consideration
duration_deviation (float) – Average absolute z-score of the speech phone durations
phone_error_rate (float) – Phone error rate for alignment evaluation
alignment_score (float) – Alignment score from alignment evaluation
word_error_rate (float) – Word error rate for transcription evaluation
character_error_rate (float) – Character error rate for transcription evaluation
file (File) – File object that the utterance is from
speaker (Speaker) – Speaker object of the utterance
phone_intervals (list[PhoneInterval]) – Reference phone intervals
word_intervals (list[WordInterval]) – Aligned word intervals
job (Job) – Job that processes the utterance
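The duration_deviation field is described as the average absolute z-score of speech phone durations. The sketch below shows that calculation under stated assumptions: the per-phone mean and standard deviation table (PHONE_STATS) and the helper name duration_deviation are hypothetical, and MFA may gather its statistics or skip phones differently.

```python
from statistics import mean

# Hypothetical per-phone duration statistics as (mean, std) in seconds.
# In practice these would be estimated from the aligned corpus.
PHONE_STATS = {
    "AA": (0.10, 0.02),
    "T": (0.05, 0.01),
}


def duration_deviation(phones, stats):
    """Average absolute z-score of speech phone durations (illustrative sketch).

    phones: list of (label, duration_in_seconds) for speech phones only.
    stats: mapping from phone label to (mean, std) of its duration.
    """
    z_scores = []
    for label, duration in phones:
        mu, sigma = stats[label]
        if sigma == 0:
            continue  # a phone with no variance has an undefined z-score
        z_scores.append(abs(duration - mu) / sigma)
    # Return 0.0 for an utterance with no scorable speech phones (an assumption).
    return mean(z_scores) if z_scores else 0.0


# "AA" is 2 standard deviations long, "T" is exactly average: mean |z| is about 1.0.
print(duration_deviation([("AA", 0.14), ("T", 0.05)], PHONE_STATS))
```

Taking the absolute value before averaging keeps phones that are too short and phones that are too long from cancelling each other out.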
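The word_error_rate and character_error_rate fields come from transcription evaluation. A common definition is the Levenshtein edit distance between reference and hypothesis divided by the reference length. The sketch below follows that textbook definition; the function names are hypothetical, and whether MFA normalizes text or strips spaces before computing CER is not stated here, so this version compares raw characters.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitution, insertion, deletion)."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix to each hyp prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion of r
                curr[j - 1] + 1,         # insertion of h
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            ))
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """Edit distance over whitespace-separated words, divided by reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / max(len(ref_words), 1)


def character_error_rate(reference, hypothesis):
    """Edit distance over raw characters, divided by reference character count."""
    return edit_distance(reference, hypothesis) / max(len(reference), 1)


print(word_error_rate("the cat sat", "the cat sat down"))  # one insertion over 3 words
print(character_error_rate("cat", "cut"))                  # one substitution over 3 chars
```

Both rates share one distance function. Only the unit of comparison changes: word lists for WER, strings (sequences of characters) for CER.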
- property file_name
Name of the utterance’s file
- classmethod from_data(data, file, speaker, frame_shift=None)
Generate an utterance object from UtteranceData
- Parameters:
data (UtteranceData) – Data for the utterance
file (File) – File database object for the utterance
speaker (Speaker) – Speaker database object for the utterance
frame_shift (int, optional) – Frame shift in ms to use for calculating the number of frames in the utterance
- Returns:
Utterance object
- Return type:
Utterance
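The frame_shift parameter converts an utterance's duration into a frame count. The sketch below assumes the simplest relationship, one frame per frame_shift milliseconds, rounded down. The helper count_frames is hypothetical, and real feature extraction (for example Kaldi with its window length and snip-edges settings) can yield a slightly different count.

```python
from typing import Optional


def count_frames(begin: float, end: float, frame_shift: Optional[int]) -> Optional[int]:
    """Approximate number of feature frames in [begin, end) for a frame shift in ms.

    In this sketch, no frame shift means no frame count (None).
    """
    if frame_shift is None:
        return None
    # Round to whole milliseconds first so float error (e.g. 1299.999...) is absorbed.
    duration_ms = round((end - begin) * 1000)
    return duration_ms // frame_shift


print(count_frames(1.2, 2.5, 10))  # 1.3 s at a 10 ms shift
```

Rounding to integer milliseconds before the floor division avoids losing a frame when a duration such as 2.5 - 1.2 is stored as 1.2999999999999998.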
- property segment
Construct a Segment object for the utterance that can be used in multiprocessing
- Returns:
Segment for the utterance
- Return type:
Segment
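The segment property exists so that worker processes receive a plain object rather than a database-bound ORM instance. The sketch below illustrates that idea. SegmentSketch is a hypothetical stand-in, not MFA's Segment class, and its fields (file_path, begin, end, channel) are assumptions. Python's multiprocessing pickles arguments to send them to workers, so the example round-trips the object through pickle explicitly.

```python
import pickle
from dataclasses import dataclass


@dataclass(frozen=True)
class SegmentSketch:
    """Hypothetical stand-in for a Segment: plain, picklable utterance boundaries."""
    file_path: str
    begin: float
    end: float
    channel: int = 0

    @property
    def duration(self) -> float:
        return self.end - self.begin


def worker(segment: SegmentSketch) -> float:
    # A worker only needs the plain fields, not a database session.
    return round(segment.duration, 3)


seg = SegmentSketch("speaker1/utt1.wav", begin=1.25, end=3.5, channel=0)
payload = pickle.dumps(seg)      # roughly what multiprocessing does to send arguments
restored = pickle.loads(payload)
print(restored == seg, worker(restored))
```

A frozen dataclass keeps the segment immutable once built, which suits data shared across several worker processes.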
- property speaker_name
Name of the utterance’s speaker