.. _configuration_g2p:

*****************
G2P Configuration
*****************

Global options
==============

.. csv-table::
   :widths: 20, 20, 60
   :header: "Parameter", "Default value", "Notes"
   :escape: '

   "punctuation", "、。।,@<>'"'(),.:;¿?¡!\\&%#*~【】,…‥「」『』〝〟″⟨⟩♪・‹›«»~′$+=", "Characters to treat as punctuation and strip from around words"
   "clitic_markers", "'''’", "Characters to treat as clitic markers; these will be collapsed to the first character in the string"
   "compound_markers", "\-", "Characters to treat as markers in compound words (i.e., they do not need to be preserved the way clitic markers are)"
   "num_pronunciations", 1, "Number of pronunciations to generate"

.. _train_g2p_config:

G2P training options
====================

In addition to the parameters above, the following parameters are used when training a G2P model.

.. csv-table::
   :widths: 20, 20, 60
   :header: "Parameter", "Default value", "Notes"

   "order", 7, "Ngram order of the G2P model"
   "random_starts", 25, "Number of random starts for aligning orthography to phones"
   "seed", 1917, "Seed for randomization"
   "delta", 1/1024, "Comparison/quantization delta for Baum-Welch training"
   "lr", 1.0, "Learning rate for Baum-Welch training"
   "batch_size", 200, "Batch size for Baum-Welch training"
   "max_iterations", 10, "Maximum number of iterations to use in Baum-Welch training"
   "smoothing_method", "kneser_ney", "Smoothing method for the ngram model"
   "pruning_method", "relative_entropy", "Pruning method for pruning the ngram model"
   "model_size", 1000000, "Target number of ngrams for pruning"

Example G2P configuration files
===============================

.. _default_train_g2p_config:

Default G2P training config file
--------------------------------

.. code-block:: yaml

   punctuation: "、。।,@<>\"(),.:;¿?¡!\\&%#*~【】,…‥「」『』〝〟″⟨⟩♪・‹›«»~′$+="
   clitic_markers: "'’"
   compound_markers: "-"
   num_pronunciations: 1  # Used if running in validation mode

   order: 7
   random_starts: 25
   seed: 1917
   delta: 0.0009765
   lr: 1.0
   batch_size: 200
   max_iterations: 10
   smoothing_method: "kneser_ney"
   pruning_method: "relative_entropy"
   model_size: 1000000

.. _default_g2p_config:

Default dictionary generation config file
-----------------------------------------

.. code-block:: yaml

   punctuation: "、。।,@<>\"(),.:;¿?¡!\\&%#*~【】,…‥「」『』〝〟″⟨⟩♪・‹›«»~′$+="
   clitic_markers: "'’"
   compound_markers: "-"
   num_pronunciations: 1
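
A custom config file only needs to list the parameters you want to change from the defaults shown above. As a brief sketch (the file name and the chosen values below are illustrative, not shipped defaults, and this assumes that any parameter left out of the file falls back to its default), a dictionary generation config that keeps the default punctuation handling but requests three candidate pronunciations per word could look like:

.. code-block:: yaml

   # my_g2p_config.yaml -- hypothetical override of the dictionary generation defaults;
   # parameters not listed here are assumed to keep their default values
   clitic_markers: "'’"     # collapse both apostrophe variants to the first character
   compound_markers: "-"    # hyphens mark compound boundaries and are not preserved
   num_pronunciations: 3    # generate three candidate pronunciations instead of one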