PyniniTrainerMixin

class montreal_forced_aligner.g2p.trainer.PyniniTrainerMixin(order=8, random_starts=25, delta=0.0009765625, alpha=1.0, batch_size=800, num_iterations=10, smoothing_method='kneser_ney', pruning_method='relative_entropy', model_size=1000000, prune_threshold=1e-07, insertions=True, deletions=True, fst_default_cache_gc='', fst_default_cache_gc_limit='', **kwargs)

Bases: object

Mixin for training Pynini G2P models

Parameters:
  • order (int) – Order of the ngram model, defaults to 8

  • random_starts (int) – Number of random starts to use in initialization, defaults to 25

  • seed (int) – Seed for randomization, defaults to 1917

  • delta (float) – Comparison/quantization delta for Baum-Welch training, defaults to 1/1024

  • alpha (float) – Step size reduction power parameter for Baum-Welch training; full standard batch EM is run (not stepwise) if set to 0, defaults to 1.0

  • batch_size (int) – Batch size for Baum-Welch training, defaults to 800

  • num_iterations (int) – Maximum number of iterations to use in Baum-Welch training, defaults to 10

  • smoothing_method (str) – Smoothing method for the ngram model, defaults to “kneser_ney”

  • pruning_method (str) – Pruning method for pruning the ngram model, defaults to “relative_entropy”

  • model_size (int) – Target number of ngrams for pruning, defaults to 1000000

  • prune_threshold (float) – Threshold used when pruning the ngram model, defaults to 1e-07

  • insertions (bool) – Flag for whether to allow insertions, defaults to True

  • deletions (bool) – Flag for whether to allow deletions, defaults to True

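  • fst_default_cache_gc (str) – String to pass to OpenFst binaries as the fst_default_cache_gc flag, which controls whether cache garbage collection is enabled

  • fst_default_cache_gc_limit (str) – String to pass to OpenFst binaries as the fst_default_cache_gc_limit flag, the cache size in bytes that triggers garbage collection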
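
The defaults in the class signature can be collected and overridden as keyword arguments. The following is a minimal sketch that does not import Montreal Forced Aligner: `PyniniTrainingDefaults` and `with_overrides` are hypothetical helpers introduced only for illustration, and the values are copied from the signature above.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PyniniTrainingDefaults:
    """Hypothetical container mirroring the defaults in the class signature."""

    order: int = 8
    random_starts: int = 25
    delta: float = 1 / 1024  # exactly 0.0009765625
    alpha: float = 1.0
    batch_size: int = 800
    num_iterations: int = 10
    smoothing_method: str = "kneser_ney"
    pruning_method: str = "relative_entropy"
    model_size: int = 1_000_000
    prune_threshold: float = 1e-07
    insertions: bool = True
    deletions: bool = True


def with_overrides(**overrides):
    """Return the defaults as a dict, updated with the given overrides.

    Unknown parameter names are rejected so that typos fail early.
    """
    defaults = asdict(PyniniTrainingDefaults())
    unknown = set(overrides) - set(defaults)
    if unknown:
        raise KeyError(f"Unknown training parameters: {sorted(unknown)}")
    return {**defaults, **overrides}


# Example: a smaller model order and batch size, everything else left at default.
trainer_kwargs = with_overrides(order=5, batch_size=200)
```

The resulting dictionary could then be unpacked into a concrete trainer class that uses this mixin; which class that is depends on the rest of the package and is not shown here.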

property afst_path

Path to store aligned FSTs

align_g2p()

Runs the entire alignment regimen.

property align_path

Path to store alignment models

property architecture

Pynini

property cg_path

Path to covering grammar FST

property data_source_identifier

Dictionary name

property encoder_path

Internal temporary encoder file

property far_path

Internal temporary FAR file

property fst_path

Internal temporary FST file

generate_model()

Generate an ngram G2P model from FAR strings
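
To illustrate how order, smoothing_method, pruning_method, and model_size relate to ngram model building, the sketch below assembles command lines for the OpenGrm NGram tools (ngramcount, ngrammake, ngramshrink). This is an assumption about one possible pipeline, not a description of what generate_model() does internally; `build_ngram_commands` and the file names are hypothetical, and the commands are only constructed, not executed.

```python
import shlex


def build_ngram_commands(
    far_path,
    model_path,
    order=8,
    smoothing_method="kneser_ney",
    pruning_method="relative_entropy",
    model_size=1_000_000,
):
    """Build (but do not run) count, smooth, and prune commands.

    Intermediate file names are hypothetical and derived from model_path.
    """
    counts_path = f"{model_path}.cnts"
    unpruned_path = f"{model_path}.unpruned"
    return [
        # Count ngrams up to the requested order from the FAR of strings.
        ["ngramcount", f"--order={order}", far_path, counts_path],
        # Turn counts into a smoothed model.
        ["ngrammake", f"--method={smoothing_method}", counts_path, unpruned_path],
        # Prune the model toward the target number of ngrams.
        [
            "ngramshrink",
            f"--method={pruning_method}",
            f"--target_number_of_ngrams={model_size}",
            unpruned_path,
            model_path,
        ],
    ]


commands = build_ngram_commands("aligned.far", "model.fst")
for command in commands:
    print(shlex.join(command))
```

Keeping the commands as argument lists, rather than a single shell string, makes them safe to hand to `subprocess.run` without shell quoting concerns.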

property input_far_path

Path to store grapheme archive

property input_path

Path to temporary file to store grapheme training data

property output_far_path

Path to store phone archive

property output_path

Path to temporary file to store phone training data

property sym_path

Internal temporary symbol file

train_iteration()

Training iteration; not used by this trainer