Dictionary

class aligner.dictionary.Dictionary(input_path, output_directory, oov_code='<unk>', position_dependent_phones=True, num_sil_states=5, num_nonsil_states=3, shared_silence_phones=True, sil_prob=0.5, word_set=None, debug=False)

Class containing information about a pronunciation dictionary

Parameters:

input_path (str): Path to an input pronunciation dictionary
output_directory (str): Path to a directory to store files for Kaldi
oov_code (str, optional): What to label words not in the dictionary, defaults to '<unk>'
position_dependent_phones (bool, optional): Specifies whether phones should be represented as dependent on their position in the word (beginning, middle or end), defaults to True
num_sil_states (int, optional): Number of states to use for silence phones, defaults to 5
num_nonsil_states (int, optional): Number of states to use for non-silence phones, defaults to 3
shared_silence_phones (bool, optional): Specifies whether to share states across all silence phones, defaults to True
pronunciation probabilities (bool, optional): Specifies whether to model different pronunciation probabilities or to treat each entry as a separate word, defaults to True
sil_prob (float, optional): Probability of optional silences following words, defaults to 0.5
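
To illustrate what position_dependent_phones means in practice, the following is a minimal sketch of tagging the phones of one pronunciation by their position in the word. It assumes the Kaldi-style suffixes `_B` (beginning), `_I` (middle), `_E` (end) and `_S` (single-phone word); the helper `add_positions` is hypothetical and is not part of this class.

```python
def add_positions(phones):
    """Tag each phone of one pronunciation with a word-position suffix.

    Assumption: Kaldi-style suffixes (_B, _I, _E, _S). This helper is
    illustrative only and is not a method of Dictionary.
    """
    if not phones:
        return []
    if len(phones) == 1:
        # A one-phone word is both the beginning and the end.
        return [phones[0] + "_S"]
    tagged = [phones[0] + "_B"]
    tagged.extend(p + "_I" for p in phones[1:-1])
    tagged.append(phones[-1] + "_E")
    return tagged


print(add_positions(["k", "ae", "t"]))
print(add_positions(["ah"]))
```

With position-dependent phones enabled, the same base phone is modelled separately at each position, which enlarges the phone inventory roughly fourfold under this assumed scheme.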

Attributes

`clitic_markers`
`oov_int`	The integer id for out of vocabulary items
`optional_silence_csl`	Phone id of the optional silence phone
`phones`	The set of all phones (silence and non-silence)
`phones_dir`	Directory to store information Kaldi needs about phones
`positional_nonsil_phones`	List of non-silence phones with positions
`positional_sil_phones`	List of silence phones with positions
`positions`
`reversed_phone_mapping`	A mapping of integer ids to phones
`reversed_word_mapping`	A mapping of integer ids to words
`silence_csl`	A colon-separated list (as a string) of silence phone ids
`topo_sil_template`
`topo_template`
`topo_transition_template`
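
The mapping and `_csl` attributes above can be pictured with a small sketch. The phone table and silence set below are made-up illustration data, and the variable names only mirror the attribute names; how the class actually builds them is not shown in this reference.

```python
# Hypothetical phone-to-id table and silence set, for illustration only.
phone_mapping = {"<eps>": 0, "sil": 1, "sp": 2, "aa": 3, "k": 4}
silence_phones = ["sil", "sp"]

# Inverse lookup: integer id -> phone, as described for reversed_phone_mapping.
reversed_phone_mapping = {i: p for p, i in phone_mapping.items()}

# Colon-separated list (as a string) of silence phone ids, as described
# for silence_csl.
silence_csl = ":".join(str(phone_mapping[p]) for p in sorted(silence_phones))

print(reversed_phone_mapping[3])
print(silence_csl)
```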

Methods

`add_disambiguation`()
`cleanup`()	Clean up temporary files in the output directory
`create_utterance_fst`(text, frequent_words)
`export_lexicon`(path[, disambig, probability])
`generate_mappings`()
`save_oovs_found`(directory)	Save all out of vocabulary items to a file in the specified directory
`separate_clitics`(item)	Separates words with apostrophes or hyphens if the subparts are in the lexicon.
`to_int`(item)	Convert a given word into its integer id
`write`()	Write the files necessary for Kaldi
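
As a rough picture of what `to_int` and `separate_clitics` describe, here is a simplified, self-contained sketch. It assumes that unknown words fall back to the id of the OOV code, and that a word is split on apostrophes or hyphens only when every resulting part is in the lexicon. The standalone functions, their extra arguments and their return types are hypothetical; the real methods take only `item` and may treat the markers differently.

```python
import re


def to_int(word, word_mapping, oov_code="<unk>"):
    """Return the integer id of a word, or the OOV id if it is unknown.

    Assumption: word_mapping is a plain dict that contains oov_code.
    """
    return word_mapping.get(word, word_mapping[oov_code])


def separate_clitics(item, lexicon, markers="'-"):
    """Split item on clitic markers if all subparts are in the lexicon.

    Assumption: markers are dropped from the output; otherwise the
    item is returned unchanged as a one-element list.
    """
    if item in lexicon:
        return [item]
    pattern = "[" + re.escape(markers) + "]"
    parts = [p for p in re.split(pattern, item) if p]
    if len(parts) > 1 and all(p in lexicon for p in parts):
        return parts
    return [item]


word_mapping = {"<unk>": 1, "well": 2, "known": 3}
words = separate_clitics("well-known", set(word_mapping))
print([to_int(w, word_mapping) for w in words])
```

Splitting before the integer lookup means a compound whose parts are known is not collapsed into a single out-of-vocabulary item.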