PyniniTrainerMixin

class montreal_forced_aligner.g2p.trainer.PyniniTrainerMixin(order=8, random_starts=25, delta=0.0009765625, alpha=1.0, batch_size=800, num_iterations=10, smoothing_method='kneser_ney', pruning_method='relative_entropy', model_size=1000000, prune_threshold=1e-07, insertions=True, deletions=True, fst_default_cache_gc='', fst_default_cache_gc_limit='', **kwargs)

Bases: object

Mixin for training Pynini G2P models

Parameters:

order (int) – Order of the ngram model, defaults to 8
random_starts (int) – Number of random starts to use in initialization, defaults to 25
seed (int) – Seed for randomization, defaults to 1917
delta (float) – Comparison/quantization delta for Baum-Welch training, defaults to 1/1024
alpha (float) – Step size reduction power parameter for Baum-Welch training; full standard batch EM is run (not stepwise) if set to 0, defaults to 1.0
batch_size (int) – Batch size for Baum-Welch training, defaults to 800
num_iterations (int) – Maximum number of iterations to use in Baum-Welch training, defaults to 10
smoothing_method (str) – Smoothing method for the ngram model, defaults to “kneser_ney”
pruning_method (str) – Pruning method for pruning the ngram model, defaults to “relative_entropy”
model_size (int) – Target number of ngrams for pruning, defaults to 1000000
prune_threshold (float) – Threshold used when pruning the ngram model, defaults to 1e-07
insertions (bool) – Flag for whether to allow insertions, defaults to True
deletions (bool) – Flag for whether to allow deletions, defaults to True
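fst_default_cache_gc (str) – String passed to OpenFst binaries to control cache garbage collection, defaults to an empty string
fst_default_cache_gc_limit (str) – String passed to OpenFst binaries to set the cache garbage collection limit, defaults to an empty string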
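
The defaults in the class signature can be collected into a plain dictionary, which makes it easy to see what is overridden for a given run. This is a minimal sketch: the names `PYNINI_TRAINER_DEFAULTS` and `with_overrides` are hypothetical helpers, not part of Montreal Forced Aligner, and the code does not import or call MFA. The `seed` parameter is left out because it does not appear in the signature.

```python
# Keyword defaults copied from the class signature.
# Hypothetical stand-in for illustration; this does not import or call MFA.
PYNINI_TRAINER_DEFAULTS = {
    "order": 8,
    "random_starts": 25,
    "delta": 0.0009765625,
    "alpha": 1.0,
    "batch_size": 800,
    "num_iterations": 10,
    "smoothing_method": "kneser_ney",
    "pruning_method": "relative_entropy",
    "model_size": 1000000,
    "prune_threshold": 1e-07,
    "insertions": True,
    "deletions": True,
    "fst_default_cache_gc": "",
    "fst_default_cache_gc_limit": "",
}


def with_overrides(**overrides):
    """Return a copy of the defaults with selected values replaced."""
    unknown = set(overrides) - set(PYNINI_TRAINER_DEFAULTS)
    if unknown:
        raise KeyError(f"Unknown parameter(s): {sorted(unknown)}")
    return {**PYNINI_TRAINER_DEFAULTS, **overrides}


# The documented delta of 1/1024 is exactly the float shown in the signature.
assert PYNINI_TRAINER_DEFAULTS["delta"] == 1 / 1024 == 2 ** -10

# Example: a lower-order, smaller model; every other value keeps its default.
small_model = with_overrides(order=5, model_size=100_000)
```

Rejecting unknown keys is a design choice of this sketch only; the real class accepts additional keyword arguments through `**kwargs`.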

property afst_path: Path to store aligned FSTs

align_g2p(): Runs the entire alignment regimen.

property align_path: Path to store alignment models

property architecture: Pynini

property cg_path: Path to covering grammar FST

property data_source_identifier: Dictionary name

property encoder_path: Internal temporary encoder file

property far_path: Internal temporary FAR file

property fst_path: Internal temporary FST file

generate_model(): Generate an ngram G2P model from FAR strings

property input_far_path: Path to store grapheme archive

property input_path: Path to temporary file to store grapheme training data

property output_far_path: Path to store phone archive

property output_path: Path to temporary file to store phone training data

property sym_path: Internal temporary symbol file

train_iteration(): Train iteration, not used