Global Options

These options are used when aligning the full dataset, including the alignment passes that are part of training. Increasing their values relaxes the restrictions on alignment. Relaxing these restrictions can be particularly helpful for files that are quite different from the training dataset (e.g., single-word production data from experiments, or longer stretches of audio).

Parameter	Default value	Notes
beam	10	Initial beam width to use for alignment
retry_beam	40	Beam width to use if initial alignment fails
transition_scale	1.0	Multiplier to scale transition costs
acoustic_scale	0.1	Multiplier to scale acoustic costs
self_loop_scale	0.1	Multiplier to scale self loop costs
boost_silence	1.0	Factor applied to silence probabilities; 1.0 leaves them unchanged
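
As a rough illustration of relaxing the beam settings, the sketch below renders a small set of overrides as YAML-style `key: value` lines. It assumes, without confirmation from this section, that MFA reads these options as top-level keys of a YAML configuration file supplied on the command line. The helper `render_config` and the values 100 and 400 are hypothetical and chosen only for illustration.

```python
def render_config(options):
    """Render a flat mapping of option names to values as YAML-style lines.

    Hypothetical helper for illustration; it only handles simple scalar values.
    """
    return "\n".join(f"{name}: {value}" for name, value in options.items()) + "\n"


# Illustrative values only: both beams are ten times the defaults in the table above.
relaxed = {"beam": 100, "retry_beam": 400}
config_text = render_config(relaxed)
print(config_text)
```

Keeping retry_beam larger than beam preserves the intent of the defaults, where the wider beam is only used after the initial alignment fails.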

Feature Configuration

This section is only relevant for training, as a trained model will contain the extractors and feature specification that it requires.

Parameter	Default value	Notes
feature_type	mfcc	Currently only MFCCs are supported
use_energy	False	Use energy in place of first MFCC
frame_shift	10	In milliseconds, determines time resolution
snip_edges	True	Should provide better time resolution in alignment
use_pitch	False	Flag for whether to compute pitch features
low_frequency	20	Low frequency cutoff (in Hz) for feature generation
high_frequency	7800	High frequency cutoff (in Hz) for feature generation
sample_frequency	16000	Sample rate to up- or down-sample to
allow_downsample	True	Flag for allowing down-sampling
allow_upsample	True	Flag for allowing up-sampling
uses_cmvn	True	Flag for whether to use CMVN
uses_deltas	True	Flag for whether to use delta features
uses_splices	False	Flag for whether to use splices and LDA transformations
splice_left_context	3	Frame width for generating LDA transforms
splice_right_context	3	Frame width for generating LDA transforms
uses_speaker_adaptation	False	Flag for whether to use speaker adaptation
fmllr_update_type	full	Type of fMLLR estimation
silence_weight	0.0	Weight of silence in calculating LDA or fMLLR
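
The arithmetic behind a few of these defaults can be sketched as follows. This is not MFA code: the function names are hypothetical, frame `i` is assumed to start at `i * frame_shift` (as when edges are snipped), and the base dimension of 13 MFCC coefficients in the usage lines is an assumption that does not appear in the table.

```python
def frame_start_seconds(frame_index, frame_shift_ms=10):
    """Start time of a frame, assuming frame i begins at i * frame_shift."""
    return frame_index * frame_shift_ms / 1000


def samples_per_shift(sample_frequency=16000, frame_shift_ms=10):
    """Number of audio samples between the starts of consecutive frames."""
    return sample_frequency * frame_shift_ms // 1000


def spliced_dim(base_dim, splice_left_context=3, splice_right_context=3):
    """Feature dimension after stacking the current frame with its context frames."""
    return base_dim * (splice_left_context + 1 + splice_right_context)


start = frame_start_seconds(150)   # frame 150 at the default 10 ms shift
hop = samples_per_shift()          # defaults: 16000 Hz, 10 ms
stacked = spliced_dim(13)          # 13 coefficients is an assumed base dimension
print(start, hop, stacked)
```

A smaller frame_shift gives finer time resolution for boundaries at the cost of more frames per second of audio.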

Dictionary and text parsing options

This section details configuration options related to how MFA normalizes text and performs dictionary lookup. Punctuation is stripped from all words, so if a character is part of a language’s orthography, removing it from the punctuation parameter keeps that character in the words. See Text normalization and dictionary lookup for more examples of how punctuation, clitic_markers, and compound_markers are used.

Parameter	Default value	Notes
oov_word	<unk>	Internal word symbol to use for out of vocabulary items
silence_word	<eps>	Internal word symbol to use for optional silence
optional_silence_phone	sil	Internal phone symbol to use for optional silence in or around utterances
oov_phone	spn	Internal phone symbol to use for out of vocabulary items
position_dependent_phones	True	Flag for whether phones should mark their position in the word as part of the phone symbol internally
num_silence_states	5	Number of states to use for silence phones
num_non_silence_states	3	Number of states to use for non-silence phones
shared_silence_phones	False	Flag for whether to share silence phone models
ignore_case	True	Flag for whether transcriptions should be converted to lower case
silence_probability	0.5	Probability of inserting silence around and within utterances; setting it to 0 removes silence modelling
initial_silence_probability	0.5	Probability of starting with silence; setting it to 0 removes initial silence
final_silence_correction	None	Correction factor of ending utterances with silence, only relevant for lexicons with trained silence probabilities
final_non_silence_correction	None	Correction factor of ending utterances without silence, only relevant for lexicons with trained silence probabilities
punctuation	、。।，@<>”(),.:;¿?¡!\&%#*~【】，…‥「」『』〝〟″⟨⟩♪・‹›«»～′$+=	Characters to treat as punctuation and strip from around words
clitic_markers	‘’	Characters to treat as clitic markers, will be collapsed to the first character in the string
compound_markers	-	Characters to treat as markers in compound words (i.e., they do not need to be preserved the way clitic markers do)
quote_markers	“„”〝〟″「」『』‚ʻʿ‘′”	Characters that are used as quotes in the language
word_break_markers	？!()，,.:;¡¿?“„”&~%#—…‥、。【】$+=〝〟″‹›«»・⟨⟩「」『』”	Characters to use in addition to white space when breaking transcripts into words
brackets	([, ]), ({, }), (<, >), ((, )), (＜, ＞)	Punctuation to keep as bracketing a whole word, e.g., a restart or disfluency
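
The interaction between punctuation, clitic_markers, brackets, and ignore_case described above can be sketched roughly as follows. This is an illustrative approximation and not MFA's actual implementation: the function `normalize_word` is hypothetical, its default character sets are deliberately shortened stand-ins for the real defaults, and leaving bracketed tokens completely untouched is an assumption.

```python
def normalize_word(word, punctuation=".,?!;:", clitic_markers="'’",
                   brackets=(("[", "]"), ("<", ">")), ignore_case=True):
    """Illustrative normalization of a single token (not MFA's real code)."""
    # Tokens wrapped in a bracket pair (restarts, disfluencies) are kept whole.
    for left, right in brackets:
        if word.startswith(left) and word.endswith(right):
            return word
    if ignore_case:
        word = word.lower()
    # Collapse every clitic marker to the first character of the string.
    for marker in clitic_markers[1:]:
        word = word.replace(marker, clitic_markers[0])
    # Strip punctuation from around the word only, never from inside it.
    return word.strip(punctuation)


print(normalize_word("Hello,"))
print(normalize_word("don’t"))
print(normalize_word("[Laugh]"))
# Leaving a character out of `punctuation` keeps it in the word.
print(normalize_word("¿qué?", punctuation="¿?"))
print(normalize_word("¿qué?", punctuation="?"))
```

The last two calls show the effect of the punctuation parameter: the same token loses or keeps its leading character depending only on whether that character is listed.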