G2P Configuration

Global options

Parameter | Default value | Notes
punctuation | 、。।,@<>”(),.:;¿?¡!\&%#*~【】,…‥「」『』〝〟″⟨⟩♪・‹›«»~′$+= | Characters to treat as punctuation and strip from around words
clitic_markers | ‘’ | Characters to treat as clitic markers; all of them are collapsed to the first character in the string
compound_markers | - | Characters to treat as markers in compound words (unlike clitic markers, these do not need to be preserved)
num_pronunciations | 1 | Number of pronunciations to generate
use_mp | True | Whether to use multiprocessing
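To make the notes on punctuation, clitic markers, and compound markers more concrete, here is a minimal Python sketch of how they might act on a single word. It is a hypothetical illustration, not the aligner's actual implementation: `normalize_word` is an invented helper, `PUNCTUATION` holds only a subset of the default characters, and treating compound markers as split points is an assumption based on the note that they need not be preserved.

```python
# Hypothetical sketch of the per-word effect of the global options above.
# This is NOT the aligner's real code; it only illustrates the table's notes.

# Assumed subset of the default punctuation characters (the real default is longer).
PUNCTUATION = "、。,.:;?!¿¡()<>«»"
CLITIC_MARKERS = "‘’"   # default: every marker collapses to the first one, "‘"
COMPOUND_MARKERS = "-"  # default compound marker


def normalize_word(word: str) -> list[str]:
    """Strip punctuation, unify clitic markers, and split compounds."""
    # 1. Remove punctuation from the start and end of the word only.
    word = word.strip(PUNCTUATION)
    # 2. Collapse every clitic marker to the first character in the string.
    for marker in CLITIC_MARKERS[1:]:
        word = word.replace(marker, CLITIC_MARKERS[0])
    # 3. Compound markers need not be preserved; assume they split the word.
    parts = [word]
    for marker in COMPOUND_MARKERS:
        parts = [piece for part in parts for piece in part.split(marker)]
    # Drop empty pieces, e.g. when the word was nothing but punctuation.
    return [part for part in parts if part]


print(normalize_word("«can’t»"))      # ['can‘t']
print(normalize_word("well-known,"))  # ['well', 'known']
```

Note the difference between the two marker types: the clitic marker survives (normalized to "‘") so the word stays intact, while the compound marker disappears and leaves two separate parts.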

Train G2P Configuration

In addition to the parameters above, the following parameters are used as part of training a G2P model.

Parameter | Default value | Notes
order | 7 | N-gram order of the G2P model
random_starts | 25 | Number of random starts for aligning orthography to phones
seed | 1917 | Seed for randomization
delta | 1/1024 | Comparison/quantization delta for Baum-Welch training
lr | 1.0 | Learning rate for Baum-Welch training
batch_size | 200 | Batch size for Baum-Welch training
max_iterations | 10 | Maximum number of Baum-Welch training iterations
smoothing_method | kneser_ney | Smoothing method for the n-gram model
pruning_method | relative_entropy | Pruning method for the n-gram model
model_size | 1000000 | Target number of n-grams to keep after pruning
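As a rough illustration of how these defaults fit together, the Python sketch below collects them in a hypothetical `TrainG2PConfig` dataclass, overrides a couple of values, and renders flat `key: value` lines. The class and the `to_yaml` helper are invented for this example. The sketch also assumes that a configuration file accepts these parameter names as flat top-level YAML keys, so check the tool's own documentation before relying on that layout.

```python
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class TrainG2PConfig:
    """Hypothetical container for the training defaults listed above."""
    order: int = 7
    random_starts: int = 25
    seed: int = 1917
    delta: float = 1 / 1024  # 0.0009765625
    lr: float = 1.0
    batch_size: int = 200
    max_iterations: int = 10
    smoothing_method: str = "kneser_ney"
    pruning_method: str = "relative_entropy"
    model_size: int = 1_000_000


def to_yaml(config: TrainG2PConfig) -> str:
    """Render the config as flat ``key: value`` lines (a tiny YAML subset)."""
    return "\n".join(f"{key}: {value}" for key, value in asdict(config).items())


# Example override: a lower n-gram order and a smaller pruned model.
custom = replace(TrainG2PConfig(), order=5, model_size=200_000)
print(to_yaml(custom))
```

Lowering `order` and `model_size` together is one plausible way to get a smaller model. Any values you do not override keep the defaults from the table.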

