Dictionary
- class aligner.dictionary.Dictionary(input_path, output_directory, oov_code='<unk>', position_dependent_phones=True, num_sil_states=5, num_nonsil_states=3, shared_silence_phones=True, sil_prob=0.5, word_set=None, debug=False)
Class containing information about a pronunciation dictionary
- Parameters:
- input_path : str
Path to an input pronunciation dictionary
- output_directory : str
Path to a directory to store files for Kaldi
- oov_code : str, optional
What to label words not in the dictionary, defaults to '<unk>'
- position_dependent_phones : bool, optional
Specifies whether phones should be represented as dependent on their position in the word (beginning, middle or end), defaults to True
- num_sil_states : int, optional
Number of states to use for silence phones, defaults to 5
- num_nonsil_states : int, optional
Number of states to use for non-silence phones, defaults to 3
- shared_silence_phones : bool, optional
Specifies whether to share states across all silence phones, defaults to True
- pronunciation probabilities : bool, optional
Specifies whether to model different pronunciation probabilities or to treat each entry as a separate word, defaults to True
- sil_prob : float, optional
Probability of optional silences following words, defaults to 0.5
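To make the effect of position_dependent_phones more concrete, the following is a minimal sketch of position tagging, not the library's own code. The helper name add_positions and the suffixes _B, _I, _E and _S are assumptions borrowed from Kaldi's usual convention; this page does not confirm which labels the class actually writes.

```python
def add_positions(phones):
    """Tag each phone of one word with a position suffix.

    Assumed suffixes (Kaldi's usual convention, not confirmed by this page):
    _B = beginning, _I = internal, _E = end, _S = single-phone word.
    """
    if len(phones) == 1:
        return [phones[0] + "_S"]
    tagged = []
    for i, phone in enumerate(phones):
        if i == 0:
            suffix = "_B"
        elif i == len(phones) - 1:
            suffix = "_E"
        else:
            suffix = "_I"
        tagged.append(phone + suffix)
    return tagged


print(add_positions(["K", "AE", "T"]))  # ['K_B', 'AE_I', 'T_E']
print(add_positions(["AH"]))            # ['AH_S']
```

With position_dependent_phones=False, the equivalent step would presumably leave the phone labels untouched.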
Attributes
clitic_markers
oov_int
The integer id for out of vocabulary items
optional_silence_csl
Phone id of the optional silence phone
phones
The set of all phones (silence and non-silence)
phones_dir
Directory to store information Kaldi needs about phones
positional_nonsil_phones
List of non-silence phones with positions
positional_sil_phones
List of silence phones with positions
positions
reversed_phone_mapping
A mapping of integer ids to phones
reversed_word_mapping
A mapping of integer ids to words
silence_csl
A colon-separated list (as a string) of silence phone ids
topo_sil_template
topo_template
topo_transition_template
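The mapping attributes and the colon-separated lists above can be illustrated with a small standalone sketch. Everything here is hypothetical: the helper names build_mappings and to_csl, the phone labels, and the choice to start ids at 1 (on the assumption that 0 is reserved, as is common in Kaldi symbol tables) are not taken from this page.

```python
def build_mappings(symbols, start=1):
    """Assign consecutive integer ids to symbols and build the reverse lookup.

    Starting at 1 is an assumption (id 0 is often reserved for epsilon).
    """
    mapping = {sym: i for i, sym in enumerate(symbols, start=start)}
    reversed_mapping = {i: sym for sym, i in mapping.items()}
    return mapping, reversed_mapping


def to_csl(symbols, mapping):
    """Join the integer ids of the given symbols into a colon-separated string."""
    return ":".join(str(mapping[s]) for s in symbols)


# Hypothetical phone inventory, for illustration only
sil_phones = ["sil", "sp"]
nonsil_phones = ["AE", "K", "T"]

phone_mapping, reversed_phone_mapping = build_mappings(sil_phones + nonsil_phones)
silence_csl = to_csl(sil_phones, phone_mapping)

print(silence_csl)                # 1:2
print(reversed_phone_mapping[3])  # AE
```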
Methods
add_disambiguation()
cleanup()
Clean up temporary files in the output directory
create_utterance_fst(text, frequent_words)
export_lexicon(path[, disambig, probability])
generate_mappings()
save_oovs_found(directory)
Save all out of vocabulary items to a file in the specified directory
separate_clitics(item)
Separates words with apostrophes or hyphens if the subparts are in the lexicon.
to_int(item)
Convert a given word into its integer id
write()
Write the files necessary for Kaldi
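The behaviour described for separate_clitics and to_int can be sketched in plain Python as follows. This is an illustration under assumptions, not the class's implementation: the functions below are standalone stand-ins that take the word mapping as an argument, the sample vocabulary is invented, and the real method may keep the apostrophe or hyphen attached to one of the parts rather than dropping it.

```python
def separate_clitics(item, words, markers=("'", "-")):
    """Split item at apostrophes or hyphens if every part is a known word.

    Returns [item] unchanged when the word is already known or when any
    part is missing from the vocabulary.
    """
    if item in words:
        return [item]
    parts, current = [], ""
    for ch in item:
        if ch in markers:
            if current:
                parts.append(current)
            current = ""
        else:
            current += ch
    if current:
        parts.append(current)
    if len(parts) > 1 and all(p in words for p in parts):
        return parts
    return [item]


def to_int(item, word_mapping, oov_code="<unk>"):
    """Look up a word's integer id, falling back to the id of oov_code."""
    return word_mapping.get(item, word_mapping[oov_code])


# Hypothetical vocabulary, for illustration only
word_mapping = {"<unk>": 1, "well": 2, "known": 3}

tokens = separate_clitics("well-known", word_mapping)
print(tokens)                                     # ['well', 'known']
print([to_int(t, word_mapping) for t in tokens])  # [2, 3]
print(to_int("xyzzy", word_mapping))              # 1
```

Falling back to the id of oov_code mirrors the role of the oov_int attribute, which holds the integer id for out of vocabulary items.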