Global Options#

These options are used for aligning the full dataset (and as part of training). Increasing the values of them will allow for more relaxed restrictions on alignment. Relaxing these restrictions can be particularly helpful for certain kinds of files that are quite different from the training dataset (i.e., single word production data from experiments, or longer stretches of audio).

Parameter

Default value

Notes

beam

10

Initial beam width to use for alignment

retry_beam

40

Beam width to use if initial alignment fails

transition_scale

1.0

Multiplier to scale transition costs

acoustic_scale

0.1

Multiplier to scale acoustic costs

self_loop_scale

0.1

Multiplier to scale self loop costs

boost_silence

1.0

1.0 is the value that does not affect probabilities

Feature Configuration#

This section is only relevant for training, as the trained model will contain extractors and feature specification for what they requires.

Parameter

Default value

Notes

feature_type

mfcc

Currently only MFCCs are supported

use_energy

False

Use energy in place of first MFCC

frame_shift

10

In milliseconds, determines time resolution

snip_edges

True

Should provide better time resolution in alignment

use_pitch

False

Flag for whether to compute pitch features

low_frequency

20

Frequency cut off for feature generation

high_frequency

7800

Frequency cut off for feature generation

sample_frequency

16000

Sample rate to up- or down-sample to

allow_downsample

True

Flag for allowing down-sampling

allow_upsample

True

Flag for allowing up-sampling

uses_cmvn

True

Flag for whether to use CMVN

uses_deltas

True

Flag for whether to use delta features

uses_splices

False

Flag for whether to use splices and LDA transformations

splice_left_context

3

Frame width for generating LDA transforms

splice_right_context

3

Frame width for generating LDA transforms

uses_speaker_adaptation

False

Flag for whether to use speaker adaptation

fmllr_update_type

full

Type of fMLLR estimation

silence_weight

0.0

Weight of silence in calculating LDA or fMLLR

Dictionary and text parsing options#

This sections details configuration options related to how MFA normalizes text and performs dictionary look up. Punctuation is stripped from all words, so if a character is part of a language’s orthography, modifying the punctuation parameter to exclude that character would keep that character in the words. See more examples of how these punctuation, clitic_markers, and compound_markers are used in Text normalization and dictionary lookup.

Parameter

Default value

Notes

oov_word

<unk>

Internal word symbol to use for out of vocabulary items

silence_word

<eps>

Internal word symbol to use optional silence

optional_silence_phone

sil

Internal phone symbol to use optional silence in or around utterances

oov_phone

spn

Internal phone symbol to use for out of vocabulary items

position_dependent_phones

True

Flag for whether phones should mark their position in the word as part of the phone symbol internally

num_silence_states

5

Number of states to use for silence phones

num_non_silence_states

3

Number of states to use for non-silence phones

shared_silence_phones

False

Flag for whether to share silence phone models

ignore_case

True

Flag for whether transcriptions should be converted to lower case

silence_probability

0.5

Probability of inserting silence around and within utterances, setting to 0 removes silence modelling

initial_silence_probability

0.5

Probability of starting with silence, setting to 0 removes initial silence

final_silence_correction

None

Correction factor of ending utterances with silence, only relevant for lexicons with trained silence probabilities

final_non_silence_correction

None

Correction factor of ending utterances without silence, only relevant for lexicons with trained silence probabilities

punctuation

、。।,@<>”(),.:;¿?¡!\&%#*~【】,…‥「」『』〝〟″⟨⟩♪・‹›«»~′$+=

Characters to treat as punctuation and strip from around words

clitic_markers

‘’

Characters to treat as clitic markers, will be collapsed to the first character in the string

compound_markers

-

Characters to treat as marker in compound words (i.e., doesnt need to be preserved like for clitics)

quote_markers

“„”〝〟″「」『』‚ʻʿ‘′”

Characters that are used as quotes in the language

word_break_markers

?!(),,.:;¡¿?“„”&~%#—…‥、。【】$+=〝〟″‹›«»・⟨⟩「」『』”

Characters to use in addition to white space when breaking transcripts into words

brackets

([, ]), ({, }), (<, >), ((, )), , (<, >)

Punctuation to keep as bracketing a whole word, i.e., a restart, disfluency, etc