Utterance

class montreal_forced_aligner.db.Utterance(**kwargs)

Bases: Base

Database class for storing information about utterances

Parameters:

id (int) – Primary key
begin (float) – Beginning timestamp of the utterance
end (float) – Ending timestamp of the utterance, or -1 if there is no audio file
duration (float) – Duration of the utterance
channel (int) – Channel of the utterance in the audio file
num_frames (int) – Number of feature frames extracted
text (str) – Input text for the utterance
oovs (str) – Space-delimited list of words that were not found in the speaker’s pronunciation dictionary
normalized_text (str) – Normalized text for the utterance: lowercased, with punctuation removed, and with compounds and clitics split apart when the whole word is not found in the speaker’s pronunciation dictionary
features (str) – File index for generated features
in_subset (bool) – Flag for whether to use this utterance in the current training subset
ignored (bool) – Flag for whether the utterance is ignored due to lacking features
alignment_log_likelihood (float) – Log likelihood for the alignment of the utterance, taking both speech and silence phones into consideration
speech_log_likelihood (float) – Log likelihood for the alignment of the utterance, taking only the speech phones into consideration
duration_deviation (float) – Average absolute z-score of speech phone durations
phone_error_rate (float) – Phone error rate for alignment evaluation
alignment_score (float) – Alignment score from alignment evaluation
word_error_rate (float) – Word error rate for transcription evaluation
character_error_rate (float) – Character error rate for transcription evaluation
file_id (int) – Foreign key to File
speaker_id (int) – Foreign key to Speaker
file (File) – File object that the utterance is from
speaker (Speaker) – Speaker object of the utterance
phone_intervals (list[PhoneInterval]) – Reference phone intervals
word_intervals (list[WordInterval]) – Aligned word intervals
job_id (int) – Foreign key to Job
job (Job) – Job that processes the utterance
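Several of the fields above are summary statistics. The sketch below shows one plausible way to compute two of them in plain Python: duration_deviation as the mean absolute z-score of speech phone durations, and word_error_rate as word-level edit distance divided by the number of reference words. The helper names, the per-phone statistics table (phone_stats), and the empty-reference convention are hypothetical; MFA's own implementation may filter phones or normalize differently.

```python
from statistics import mean


def duration_deviation(phones, phone_stats):
    """Mean absolute z-score of speech phone durations (hypothetical helper).

    phones: list of (phone_label, duration) pairs for speech phones only.
    phone_stats: dict mapping phone_label -> (mean_duration, std_duration),
    assumed to be estimated over the whole corpus, in the same units.
    """
    z_scores = []
    for label, duration in phones:
        mu, sigma = phone_stats[label]
        if sigma > 0:  # skip phones whose durations never vary
            z_scores.append(abs(duration - mu) / sigma)
    return mean(z_scores) if z_scores else 0.0


def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance over reference length (hypothetical helper)."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] is the edit distance between the reference words processed so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            curr[j] = min(prev[j] + 1, curr[j - 1] + 1, substitution)
        prev = curr
    if not ref:
        # Assumed convention: empty reference scores 0.0 if hypothesis is empty, else 1.0
        return float(len(hyp) > 0)
    return prev[-1] / len(ref)


# Durations in milliseconds: (mean, standard deviation) per phone
stats = {"AA": (100.0, 20.0), "B": (50.0, 10.0)}
print(duration_deviation([("AA", 120.0), ("B", 40.0)], stats))  # 1.0
print(word_error_rate("the cat sat down", "the cat sat"))  # 0.25
```

The same edit-distance routine applied to phone sequences instead of words gives a phone error rate in the usual sense, though the exact scoring used for phone_error_rate and alignment_score is not specified here.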

property file_name: Name of the utterance’s file

classmethod from_data(data, file, speaker, frame_shift=None)

Generate an utterance object from UtteranceData

Parameters:

data (UtteranceData) – Data for the utterance
file (File) – File database object for the utterance
speaker (Speaker) – Speaker database object for the utterance
frame_shift (int, optional) – Frame shift in ms to use for calculating the number of frames in the utterance

Returns:

Utterance object

Return type:

Utterance
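The frame_shift argument lets from_data derive num_frames from the utterance's time span. A minimal sketch of that arithmetic, assuming the count is simply the duration divided by the frame shift and rounded down; the helper name is hypothetical, and MFA (or Kaldi-style feature extraction, where window length and edge handling also matter) may count frames slightly differently.

```python
def estimate_num_frames(begin: float, end: float, frame_shift_ms: int) -> int:
    """Approximate number of feature frames for an utterance (hypothetical helper).

    begin, end: utterance timestamps in seconds.
    frame_shift_ms: frame shift in milliseconds, e.g. 10.
    """
    if frame_shift_ms <= 0:
        raise ValueError("frame_shift_ms must be positive")
    if end < begin:
        # Utterance.end is -1 when there is no audio file, so no frames can be derived
        raise ValueError("end must not precede begin")
    # Round the duration to whole milliseconds first to avoid float artifacts
    duration_ms = round((end - begin) * 1000)
    return duration_ms // frame_shift_ms


print(estimate_num_frames(1.5, 3.75, 10))  # 225
```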

property segment

Construct a Segment object for the utterance that can be used in multiprocessing

Returns: Segment for the utterance
Return type: Segment
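A segment identifies a time span within one channel of an audio file. The sketch below uses a hypothetical stand-in dataclass, SegmentSketch (not the real Segment class, whose constructor is not documented on this page), to show how such a span maps to sample indices; the field names and the 16 kHz sample rate are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SegmentSketch:
    """Hypothetical stand-in for a Segment: a time span in one channel of an audio file."""

    file_path: str
    begin: float  # seconds
    end: float  # seconds
    channel: int = 0

    def sample_range(self, sample_rate: int) -> tuple:
        """Return the [start, stop) sample indices covered by this segment."""
        start = round(self.begin * sample_rate)
        stop = round(self.end * sample_rate)
        return start, stop


segment = SegmentSketch("speaker1/utt1.wav", begin=0.5, end=2.0, channel=0)
print(segment.sample_range(16000))  # (8000, 32000)
```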

property speaker_name: Name of the utterance’s speaker

to_data()

Construct an UtteranceData object that can be used in multiprocessing

Returns: Data for the utterance
Return type: UtteranceData
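Plain data objects suit multiprocessing because arguments are sent to worker processes by pickling them, and a detached plain object pickles cleanly, whereas ORM objects bound to a database session are awkward to hand to other processes. A minimal sketch with a hypothetical UtteranceDataSketch dataclass (its fields are a subset of the Utterance attributes listed above, chosen for illustration; the real UtteranceData may hold different fields) that checks the pickle round trip a worker would rely on:

```python
import pickle
from dataclasses import dataclass


@dataclass
class UtteranceDataSketch:
    """Hypothetical plain-data mirror of a few Utterance attributes."""

    speaker_name: str
    file_name: str
    begin: float
    end: float
    channel: int = 0
    text: str = ""


def count_words(data: UtteranceDataSketch) -> int:
    """Toy worker task: operates on plain data only, with no database access."""
    return len(data.text.split())


data = UtteranceDataSketch("speaker1", "utt1", 0.0, 1.2, text="hello world")
# multiprocessing pickles arguments before sending them to workers
restored = pickle.loads(pickle.dumps(data))
print(restored == data, count_words(restored))  # True 2
```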

to_kalpy()

Construct a Kalpy Utterance object from this utterance

Returns: Kalpy utterance
Return type: Utterance