Utterance

class montreal_forced_aligner.db.Utterance(**kwargs)

Bases: Base

Database class for storing information about utterances

Parameters:
  • id (int) – Primary key

  • begin (float) – Beginning timestamp of the utterance

  • end (float) – Ending timestamp of the utterance, -1 if there is no audio file

  • duration (float) – Duration of the utterance

  • channel (int) – Channel of the utterance in the audio file

  • num_frames (int) – Number of feature frames extracted

  • text (str) – Input text for the utterance

  • oovs (str) – Space-delimited list of items that were not found in the speaker’s pronunciation dictionary

  • normalized_text (str) – Normalized text for the utterance, after removing case and punctuation, and splitting up compounds and clitics if the whole word is not found in the speaker’s pronunciation dictionary

  • features (str) – File index for generated features

  • in_subset (bool) – Flag for whether to use this utterance in the current training subset

  • ignored (bool) – Flag for whether the utterance is ignored because it lacks features

  • alignment_log_likelihood (float) – Log likelihood for the alignment of the utterance, taking both speech and silence phones into consideration

  • speech_log_likelihood (float) – Log likelihood for the alignment of the utterance, taking only the speech phones into consideration

  • duration_deviation (float) – Mean absolute z-score of the utterance’s speech phone durations

  • phone_error_rate (float) – Phone error rate for alignment evaluation

  • alignment_score (float) – Alignment score from alignment evaluation

  • word_error_rate (float) – Word error rate for transcription evaluation

  • character_error_rate (float) – Character error rate for transcription evaluation

  • file_id (int) – Foreign key to File

  • speaker_id (int) – Foreign key to Speaker

  • file (File) – File object that the utterance is from

  • speaker (Speaker) – Speaker object of the utterance

  • phone_intervals (list[PhoneInterval]) – Reference phone intervals

  • word_intervals (list[WordInterval]) – Aligned word intervals

  • job_id (int) – Foreign key to Job

  • job (Job) – Job that processes the utterance
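
The duration_deviation parameter above is described only as a mean of absolute z-scores, so a small worked sketch may help. It assumes each phone's duration is compared against a per-phone mean and standard deviation; the helper name `mean_abs_z_score`, the statistics table, and the example durations are hypothetical and not taken from MFA.

```python
from statistics import mean


def mean_abs_z_score(durations, phone_stats):
    """Average the absolute z-scores of a list of phone durations.

    durations: list of (phone_label, duration_in_seconds) pairs
    phone_stats: dict mapping phone_label -> (mean_duration, std_duration)
    Both structures are illustrative assumptions, not MFA data types.
    """
    z_scores = []
    for label, duration in durations:
        mu, sigma = phone_stats[label]
        # z-score: how many standard deviations this duration is from the mean
        z_scores.append(abs((duration - mu) / sigma))
    return mean(z_scores)


# Hypothetical per-phone statistics and observed durations
phone_stats = {"AA": (0.10, 0.02), "T": (0.05, 0.01)}
intervals = [("AA", 0.14), ("T", 0.04)]

# AA is about 2 standard deviations long, T about 1 short, so the mean is close to 1.5
deviation = mean_abs_z_score(intervals, phone_stats)
```

Under this reading, a larger value suggests that the aligned phone durations are unusual for the utterance, which can make it a candidate for manual review.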

property aligned_phone_intervals

Phone intervals from montreal_forced_aligner.data.WorkflowType.alignment

property aligned_word_intervals

Word intervals from montreal_forced_aligner.data.WorkflowType.alignment

property file_name

Name of the utterance’s file

classmethod from_data(data, file, speaker, frame_shift=None)

Generate an utterance object from UtteranceData

Parameters:
  • data (UtteranceData) – Data for the utterance

  • file (File) – File database object for the utterance

  • speaker (Speaker) – Speaker database object for the utterance

  • frame_shift (int, optional) – Frame shift in ms to use for calculating the number of frames in the utterance

Returns:

Utterance object

Return type:

Utterance
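
As a rough illustration of how frame_shift could translate an utterance's time span into a frame count, the sketch below divides the duration in milliseconds by the frame shift. This is an assumption about the general technique; the exact calculation inside from_data (rounding, window length) may differ, and `estimate_num_frames` is a hypothetical helper.

```python
from typing import Optional


def estimate_num_frames(begin: float, end: float,
                        frame_shift: Optional[int] = None) -> Optional[int]:
    """Estimate the number of feature frames in an utterance.

    begin, end: timestamps in seconds
    frame_shift: frame shift in milliseconds; None means "do not compute"
    """
    if frame_shift is None:
        # Mirrors the optional parameter: no frame shift, no frame count
        return None
    duration_ms = (end - begin) * 1000
    # One frame per frame_shift milliseconds, truncated to a whole number
    return int(duration_ms / frame_shift)


# A 2.5 second utterance with a 10 ms frame shift
frames = estimate_num_frames(0.0, 2.5, frame_shift=10)
```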

property per_speaker_transcribed_phone_intervals

Phone intervals from montreal_forced_aligner.data.WorkflowType.per_speaker_transcription

property per_speaker_transcribed_word_intervals

Word intervals from montreal_forced_aligner.data.WorkflowType.per_speaker_transcription

phone_intervals_for_workflow(workflow_id)

Extract phone intervals for a given CorpusWorkflow

Parameters:

workflow_id (int) – Integer ID for CorpusWorkflow

Returns:

List of phone intervals

Return type:

list[CtmInterval]
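
Conceptually, selecting intervals for a workflow amounts to filtering the utterance's stored intervals by workflow ID and returning them in time order. The sketch below shows that idea with plain Python objects; `IntervalRow` and `intervals_for_workflow` are hypothetical stand-ins, not MFA's database classes or its actual query.

```python
from dataclasses import dataclass


@dataclass
class IntervalRow:
    """Hypothetical stand-in for one stored phone interval."""
    label: str
    begin: float
    end: float
    workflow_id: int


def intervals_for_workflow(rows, workflow_id):
    """Keep only the intervals produced by one workflow, sorted by start time."""
    selected = [row for row in rows if row.workflow_id == workflow_id]
    return sorted(selected, key=lambda row: row.begin)


# Intervals from two hypothetical workflows, stored in arbitrary order
rows = [
    IntervalRow("t", 0.30, 0.38, 1),
    IntervalRow("k", 0.00, 0.12, 1),
    IntervalRow("g", 0.00, 0.10, 2),
    IntervalRow("ae", 0.12, 0.30, 1),
]

labels = [row.label for row in intervals_for_workflow(rows, 1)]
```

The workflow-specific properties on this class (for example the aligned or transcribed intervals) can be read as the same kind of selection, keyed by workflow type rather than by an explicit ID.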

property phone_transcribed_phone_intervals

Phone intervals from montreal_forced_aligner.data.WorkflowType.phone_transcription

property reference_phone_intervals

Phone intervals from montreal_forced_aligner.data.WorkflowType.reference

property speaker_name

Name of the utterance’s speaker

to_data()

Construct an UtteranceData object that can be used in multiprocessing

Returns:

Data for the utterance

Return type:

UtteranceData
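
The reason for a separate data object is that a database-bound row carries relationships (file, speaker) that are awkward to hand to worker processes, whereas a flat record of plain values is not. The sketch below illustrates that flattening step only; `PlainUtterance`, `Record`, and their fields are hypothetical and do not reproduce MFA's UtteranceData.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PlainUtterance:
    """Hypothetical flat record holding only plain values."""
    speaker_name: str
    file_name: str
    begin: float
    end: float
    text: str


class Record:
    """Hypothetical stand-in for a database-bound utterance row."""

    def __init__(self, speaker, file, begin, end, text):
        self.speaker = speaker  # related object, here a dict with a "name" key
        self.file = file        # related object, here a dict with a "name" key
        self.begin = begin
        self.end = end
        self.text = text

    def to_data(self):
        # Copy names out of the related objects so the result has no references back
        return PlainUtterance(
            speaker_name=self.speaker["name"],
            file_name=self.file["name"],
            begin=self.begin,
            end=self.end,
            text=self.text,
        )


record = Record({"name": "speaker_one"}, {"name": "recording_one"}, 0.0, 1.5, "hello world")
payload = asdict(record.to_data())
```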

to_kalpy()

Construct a kalpy utterance object for the utterance

Returns:

Kalpy representation of the utterance

Return type:

kalpy.utterance.Utterance

property transcribed_phone_intervals

Phone intervals from montreal_forced_aligner.data.WorkflowType.transcription

property transcribed_word_intervals

Word intervals from montreal_forced_aligner.data.WorkflowType.transcription

word_intervals_for_workflow(workflow_id)

Extract word intervals for a given CorpusWorkflow

Parameters:

workflow_id (int) – Integer ID for CorpusWorkflow

Returns:

List of word intervals

Return type:

list[CtmInterval]