Database

MFA uses a SQLite database to cache information during training and alignment runs. Previously, training on larger corpora ran into memory bottlenecks because all of the corpus information was held in memory, and fMLLR estimation in later stages would crash. There was also a constant trade-off between storing results for use in other applications (such as Anchor Annotator) or for diagnostic information shown to users, and keeping the core MFA workflows as memory- and time-efficient as possible. Offloading to a database frees up some memory and makes some computations more efficient, and it should be optimized enough not to slow down regular processing.
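
As a rough illustration of the offloading idea, the sketch below stores utterance records in SQLite and reads them back in fixed-size batches, so that only one batch sits in memory at a time. It uses Python's standard `sqlite3` module. The `utterance` table, its columns, and the helper functions are hypothetical and are not MFA's actual schema or API.

```python
import sqlite3


def build_cache(path=":memory:"):
    """Create a small hypothetical utterance cache (not MFA's real schema).

    Pass a file path instead of ":memory:" to keep the data on disk.
    """
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE utterance ("
        "id INTEGER PRIMARY KEY, speaker TEXT, text TEXT, duration REAL)"
    )
    return conn


def add_utterances(conn, rows):
    """Insert (speaker, text, duration) tuples in a single transaction."""
    with conn:
        conn.executemany(
            "INSERT INTO utterance (speaker, text, duration) VALUES (?, ?, ?)",
            rows,
        )


def iter_batches(conn, batch_size):
    """Yield lists of rows so only one batch is held in memory at a time."""
    cursor = conn.execute(
        "SELECT id, speaker, text, duration FROM utterance ORDER BY id"
    )
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        yield batch


conn = build_cache()
add_utterances(
    conn,
    [
        ("spk1", "hello world", 1.5),
        ("spk1", "good morning", 0.5),
        ("spk2", "thank you", 0.75),
    ],
)

# Process the corpus in batches instead of loading every row at once.
batch_sizes = [len(batch) for batch in iter_batches(conn, 2)]

# Aggregations can be pushed down to the database rather than done in Python.
total_by_speaker = dict(
    conn.execute("SELECT speaker, SUM(duration) FROM utterance GROUP BY speaker")
)
```

Pushing the per-speaker sum into SQL is one example of how a database can make some computations cheaper than looping over in-memory objects.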

`Dictionary`(**kwargs)	Database class for storing information about a pronunciation dictionary
`Dialect`(**kwargs)	Database class for storing information about a dialect
`Word`(**kwargs)	Database class for storing words, their integer IDs, and pronunciation information
`Pronunciation`(**kwargs)	Database class for storing information about a pronunciation
`Phone`(**kwargs)	Database class for storing phones and their integer IDs
`Grapheme`(**kwargs)	Database class for storing graphemes and their integer IDs
`File`(**kwargs)	Database class for storing information about files in the corpus
`TextFile`(**kwargs)	Database class for storing information about transcription files
`SoundFile`(**kwargs)	Database class for storing information about sound files
`Speaker`(**kwargs)	Database class for storing information about speakers
`Utterance`(**kwargs)	Database class for storing information about utterances
`WordInterval`(**kwargs)	Database class for storing information about aligned word intervals
`PhoneInterval`(**kwargs)	Database class for storing information about aligned phone intervals
`CorpusWorkflow`(**kwargs)	Database class for storing information about a particular workflow (alignment, transcription, etc.)
`PhonologicalRule`(**kwargs)	Database class for storing information about a phonological rule. Parameters: `id` (int): primary key; `segment` (str): segment to replace; `preceding_context` (str): context before the segment to match; `following_context` (str): context after the segment to match; `replacement` (str): replacement of the segment; `probability` (float): probability of the rule application; `silence_after_probability` (float): probability of silence following forms with rule application; `silence_before_correction` (float): correction factor for silence before forms with rule application; `non_silence_before_correction` (float): correction factor for non-silence before forms with rule application; `pronunciations` (list[`RuleApplication`]): list of rule applications
`RuleApplication`(**kwargs)	Database class for mapping rules to generated pronunciations. Parameters: `pronunciation_id` (int): foreign key to `Pronunciation`; `rule_id` (int): foreign key to `PhonologicalRule`; `pronunciation` (`Pronunciation`): pronunciation; `rule` (`PhonologicalRule`): rule applied
`Job`(**kwargs)	Database class for storing information about multiprocessing jobs
`M2MSymbol`(**kwargs)	Database class for storing many-to-many G2P training information
`M2M2Job`(**kwargs)	Mapping class between `M2MSymbol` and `Job`
`Word2Job`(**kwargs)	Mapping class between `Word` and `Job`
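
Mapping classes such as `Word2Job` link rows of two tables in a many-to-many relationship. The sketch below shows that general pattern with Python's standard `sqlite3` module: a link table holds one row per (word, job) pair, and a join recovers the words assigned to a job. The table layouts, column names, and the `words_for_job` helper are assumptions made for illustration and are not taken from MFA's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical tables: two entity tables plus a link table between them.
conn.executescript(
    """
    CREATE TABLE word (id INTEGER PRIMARY KEY, word TEXT NOT NULL UNIQUE);
    CREATE TABLE job (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE word2job (
        word_id INTEGER NOT NULL REFERENCES word(id),
        job_id INTEGER NOT NULL REFERENCES job(id),
        PRIMARY KEY (word_id, job_id)
    );
    """
)

with conn:
    conn.executemany(
        "INSERT INTO word (id, word) VALUES (?, ?)",
        [(1, "cat"), (2, "dog"), (3, "bird")],
    )
    conn.executemany(
        "INSERT INTO job (id, name) VALUES (?, ?)",
        [(1, "job_1"), (2, "job_2")],
    )
    # A word may belong to several jobs, and a job may hold several words.
    conn.executemany(
        "INSERT INTO word2job (word_id, job_id) VALUES (?, ?)",
        [(1, 1), (2, 1), (2, 2), (3, 2)],
    )


def words_for_job(conn, job_id):
    """Return the words linked to one job, ordered by word ID."""
    rows = conn.execute(
        "SELECT w.word FROM word AS w "
        "JOIN word2job AS wj ON wj.word_id = w.id "
        "WHERE wj.job_id = ? ORDER BY w.id",
        (job_id,),
    )
    return [row[0] for row in rows]


job1_words = words_for_job(conn, 1)
job2_words = words_for_job(conn, 2)
```

The composite primary key on the link table prevents the same word from being assigned to the same job twice.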