Dictionary

class aligner.dictionary.Dictionary(input_path, output_directory, oov_code='<unk>', position_dependent_phones=True, num_sil_states=5, num_nonsil_states=3, shared_silence_phones=True, sil_prob=0.5, word_set=None, debug=False)

Class containing information about a pronunciation dictionary

Parameters:
input_path : str

Path to an input pronunciation dictionary

output_directory : str

Path to a directory to store files for Kaldi

oov_code : str, optional

What to label words not in the dictionary, defaults to '<unk>'

position_dependent_phones : bool, optional

Specifies whether phones should be represented as dependent on their position in the word (beginning, middle or end), defaults to True

num_sil_states : int, optional

Number of states to use for silence phones, defaults to 5

num_nonsil_states : int, optional

Number of states to use for non-silence phones, defaults to 3

shared_silence_phones : bool, optional

Specify whether to share states across all silence phones, defaults to True

pronunciation probabilities : bool, optional

Specifies whether to model different pronunciation probabilities or to treat each entry as a separate word, defaults to True

sil_prob : float, optional

Probability of optional silences following words, defaults to 0.5
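
To make the position_dependent_phones option concrete, here is a minimal sketch of tagging the phones of one pronunciation by their position in the word. It is an illustration only, not the library's implementation. The function name add_positions is hypothetical, and the suffixes _B, _I, _E and _S (begin, internal, end, singleton) are assumed from the usual Kaldi convention rather than taken from this page.

```python
def add_positions(phones):
    """Tag each phone of one pronunciation with a word-position suffix.

    Assumed Kaldi-style suffixes: _B (begin), _I (internal),
    _E (end), _S (single-phone word).
    """
    if len(phones) == 1:
        return [phones[0] + "_S"]
    tagged = []
    last = len(phones) - 1
    for i, phone in enumerate(phones):
        if i == 0:
            suffix = "_B"
        elif i == last:
            suffix = "_E"
        else:
            suffix = "_I"
        tagged.append(phone + suffix)
    return tagged


three_phones = add_positions(["K", "AE", "T"])
one_phone = add_positions(["AH"])
print(three_phones)
print(one_phone)
```

With position-dependent phones enabled, the same base phone is modelled separately at each position, which is why the phone inventory grows when the option is on.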

Attributes

clitic_markers

oov_int

The integer id for out of vocabulary items

optional_silence_csl

Phone id of the optional silence phone

phones

The set of all phones (silence and non-silence)

phones_dir

Directory to store information Kaldi needs about phones

positional_nonsil_phones

List of non-silence phones with positions

positional_sil_phones

List of silence phones with positions

positions

reversed_phone_mapping

A mapping of integer ids to phones

reversed_word_mapping

A mapping of integer ids to words

silence_csl

A colon-separated list (as a string) of silence phone ids

topo_sil_template

topo_template

topo_transition_template
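
The mapping and csl attributes above can be pictured with a small sketch. It assumes that symbols are numbered from 1 (Kaldi reserves id 0 for the epsilon symbol) and uses a made-up phone inventory. The helper names build_mappings and to_csl are hypothetical and are not part of this class.

```python
def build_mappings(symbols, start=1):
    """Return (symbol -> id, id -> symbol) dictionaries.

    Numbering from 1 is an assumption; id 0 is left free for epsilon.
    """
    mapping = {sym: i for i, sym in enumerate(symbols, start=start)}
    reversed_mapping = {i: sym for sym, i in mapping.items()}
    return mapping, reversed_mapping


def to_csl(symbols, mapping):
    """Join the integer ids of the given symbols with colons."""
    return ":".join(str(mapping[s]) for s in symbols)


# Illustrative inventory: two silence phones followed by three others.
phones = ["sil", "sp", "AE", "K", "T"]
phone_mapping, reversed_phone_mapping = build_mappings(phones)
silence_csl = to_csl(["sil", "sp"], phone_mapping)

print(silence_csl)
print(reversed_phone_mapping[3])
```

The reversed mappings are what allow integer output from Kaldi to be turned back into readable phones and words.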

Methods

add_disambiguation()

cleanup()

Clean up temporary files in the output directory

create_utterance_fst(text, frequent_words)

export_lexicon(path[, disambig, probability])

generate_mappings()

save_oovs_found(directory)

Save all out of vocabulary items to a file in the specified directory

separate_clitics(item)

Separates words with apostrophes or hyphens if the subparts are in the lexicon.

to_int(item)

Convert a given word into its integer id

write()

Write the files necessary for Kaldi
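
As a rough picture of how separate_clitics and to_int might work together, the sketch below splits a word on apostrophes or hyphens only when every subpart is in the lexicon, then converts each token to an integer id. These are standalone functions written for illustration, not the methods of this class: their signatures differ from the ones listed above, and the fallback to the id of oov_code for unknown words is an assumption.

```python
import re


def separate_clitics(word, lexicon):
    """Split on apostrophes/hyphens if all subparts are in the lexicon."""
    if word in lexicon:
        return [word]
    parts = [p for p in re.split(r"['-]", word) if p]
    if len(parts) > 1 and all(p in lexicon for p in parts):
        return parts
    return [word]


def to_int(word, word_mapping, oov_code="<unk>"):
    """Look up a word id, falling back to the OOV id (assumed behaviour)."""
    return word_mapping.get(word, word_mapping[oov_code])


# Illustrative word mapping; the ids are made up.
word_mapping = {"<unk>": 1, "well": 2, "known": 3}

tokens = separate_clitics("well-known", word_mapping)
ids = [to_int(t, word_mapping) for t in tokens]
unknown = separate_clitics("o'clock", word_mapping)

print(tokens, ids)
print(unknown, to_int(unknown[0], word_mapping))
```

A word whose subparts are not all in the lexicon is left whole and maps to the out-of-vocabulary id.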