Dictionary

class montreal_forced_aligner.dictionary.Dictionary(input_path, output_directory, oov_code='<unk>', position_dependent_phones=True, num_sil_states=5, num_nonsil_states=3, shared_silence_phones=True, sil_prob=0.5, word_set=None, debug=False, logger=None, punctuation=None, clitic_markers=None, compound_markers=None, multilingual_ipa=False, strip_diacritics=None, digraphs=None)

Class containing information about a pronunciation dictionary

Parameters:
input_path : str

Path to an input pronunciation dictionary

output_directory : str

Path to a directory to store files for Kaldi

oov_code : str, optional

Label assigned to words that are not in the dictionary, defaults to '<unk>'

position_dependent_phones : bool, optional

Specifies whether phones should be represented as dependent on their position in the word (beginning, middle or end), defaults to True

num_sil_states : int, optional

Number of states to use for silence phones, defaults to 5

num_nonsil_states : int, optional

Number of states to use for non-silence phones, defaults to 3

shared_silence_phones : bool, optional

Specifies whether to share states across all silence phones, defaults to True

pronunciation probabilities : bool, optional

Specifies whether to model different pronunciation probabilities or to treat each entry as a separate word, defaults to True

sil_prob : float, optional

Probability of optional silences following words, defaults to 0.5
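To illustrate what position_dependent_phones refers to, the sketch below tags each phone of a pronunciation with its position in the word. It is not the library's implementation: the function name add_word_positions is hypothetical, and the suffixes _B, _I, _E and _S (for a single-phone word) are assumed from Kaldi's usual word-position convention rather than taken from this page.

```python
from typing import List


def add_word_positions(phones: List[str]) -> List[str]:
    """Tag each phone with its position in the word (assumed Kaldi-style suffixes)."""
    if len(phones) == 1:
        # A word with a single phone gets the "singleton" suffix
        return [phones[0] + "_S"]
    tagged = []
    for i, phone in enumerate(phones):
        if i == 0:
            suffix = "_B"  # beginning of the word
        elif i == len(phones) - 1:
            suffix = "_E"  # end of the word
        else:
            suffix = "_I"  # inside the word
        tagged.append(phone + suffix)
    return tagged


print(add_word_positions(["K", "AE1", "T"]))  # ['K_B', 'AE1_I', 'T_E']
```

With position_dependent_phones=False, the phones would presumably be used without such suffixes.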

Attributes

actual_words
disambig_path
has_multiple
oov_int The integer id for out-of-vocabulary items
optional_silence_csl Phone id of the optional silence phone
phones The set of all phones (silence and non-silence)
phones_dir Directory to store information Kaldi needs about phones
positional_nonsil_phones List of non-silence phones with positions
positional_sil_phones List of silence phones with positions
positions
reversed_phone_mapping A mapping of integer ids to phones
reversed_word_mapping A mapping of integer ids to words
silence_csl A colon-separated list (as a string) of silence phone ids
topo_sil_template
topo_template
topo_transition_template
words_symbol_path

Methods

add_disambiguation()
check_word(item)
cleanup() Clean up temporary files in the output directory
create_utterance_fst(text, frequent_words)
exlude_for_alignment(w)
export_lexicon(path[, disambig, probability])
generate_mappings()
log_info()
save_oovs_found(directory) Save all out-of-vocabulary items to a file in the specified directory
set_word_set(word_set)
split_clitics(item)
to_int(item) Convert a given word into its integer id
write([disambig]) Write the files necessary for Kaldi

cleanup()

Clean up temporary files in the output directory

oov_int

The integer id for out-of-vocabulary items

optional_silence_csl

Phone id of the optional silence phone

phones

The set of all phones (silence and non-silence)

phones_dir

Directory to store information Kaldi needs about phones

positional_nonsil_phones

List of non-silence phones with positions

positional_sil_phones

List of silence phones with positions

reversed_phone_mapping

A mapping of integer ids to phones

reversed_word_mapping

A mapping of integer ids to words
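A reversed mapping of this kind can be pictured as a plain dictionary inversion. The following is a minimal sketch with made-up words and ids; the names words_mapping and reverse_mapping are hypothetical, and the real ids are assigned by the Dictionary object itself.

```python
# Hypothetical forward mapping from words to integer ids
words_mapping = {"<eps>": 0, "<unk>": 1, "cat": 2, "dog": 3}


def reverse_mapping(mapping):
    """Invert a symbol -> id mapping into an id -> symbol mapping."""
    return {int_id: symbol for symbol, int_id in mapping.items()}


reversed_word_mapping = reverse_mapping(words_mapping)
print(reversed_word_mapping[2])  # cat
```

The same inversion would apply to phones for reversed_phone_mapping, assuming ids are unique.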

save_oovs_found(directory)

Save all out-of-vocabulary items to a file in the specified directory

Parameters:
directory : str

Path to the directory in which to save oovs_found.txt
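As a rough picture of what saving out-of-vocabulary items involves, the sketch below writes a set of words to oovs_found.txt in a given directory. The helper save_oovs is hypothetical, and the one-word-per-line, sorted layout is an assumption; the page only names the output file.

```python
import os
import tempfile


def save_oovs(oovs, directory):
    """Write each OOV word on its own line to oovs_found.txt (assumed layout)."""
    path = os.path.join(directory, "oovs_found.txt")
    with open(path, "w", encoding="utf8") as f:
        for word in sorted(oovs):
            f.write(word + "\n")
    return path


# Demonstrate with a temporary directory and made-up OOV words
with tempfile.TemporaryDirectory() as tmp:
    out_path = save_oovs({"zyzzyva", "blorf"}, tmp)
    with open(out_path, encoding="utf8") as f:
        saved = f.read().splitlines()

print(saved)  # ['blorf', 'zyzzyva']
```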

silence_csl

A colon-separated list (as a string) of silence phone ids
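For a concrete picture of the colon-separated format, here is a minimal sketch that joins integer phone ids into such a string. The helper to_csl and the ids are made up for illustration.

```python
def to_csl(phone_ids):
    """Join integer phone ids into a colon-separated string."""
    return ":".join(str(i) for i in phone_ids)


silence_ids = [1, 2, 3]  # hypothetical silence phone ids
silence_csl = to_csl(silence_ids)
print(silence_csl)  # 1:2:3
```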

to_int(item)

Convert a given word into its integer id
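The lookup can be sketched as a dictionary access. This page does not say how to_int treats words missing from the dictionary, so the fallback to the id of the OOV label below is an assumption, and word_to_int and words_mapping are hypothetical names.

```python
OOV_CODE = "<unk>"  # default oov_code from the class signature

# Hypothetical word -> id mapping; real ids come from the Dictionary object
words_mapping = {"<eps>": 0, OOV_CODE: 1, "hello": 2, "world": 3}


def word_to_int(word, mapping, oov_code=OOV_CODE):
    """Return the word's integer id, or the OOV id if the word is unknown (assumed)."""
    return mapping.get(word, mapping[oov_code])


print(word_to_int("hello", words_mapping))   # 2
print(word_to_int("zyzzyva", words_mapping))  # 1
```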

write(disambig=False)

Write the files necessary for Kaldi