Database

MFA uses a SQLite database to cache information during training and alignment runs. Previously, all information about the corpus was stored in memory, so training on larger corpora ran into memory bottlenecks, and fMLLR estimation in later stages would crash. There was also always a trade-off between storing results for use in other applications like Anchor Annotator or providing diagnostic information to users, and keeping the core MFA workflows as memory- and time-efficient as possible. Offloading to a database frees up some memory and makes some computations more efficient, and it should be optimized enough not to slow down regular processing.
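The memory argument above can be illustrated with a minimal sketch using Python's standard `sqlite3` module. The table layout, the `generate_utterances` helper, and the `utterances_for_job` function are hypothetical names chosen for illustration; they are not MFA's actual schema or API. The idea is only that records are written to the database once and then streamed back per job, rather than held in a Python list for the whole run.

```python
import sqlite3

# Hypothetical schema for illustration only; not MFA's actual tables.
conn = sqlite3.connect(":memory:")  # a real run would point at a file on disk
conn.execute(
    "CREATE TABLE utterance ("
    "id INTEGER PRIMARY KEY, speaker TEXT, text TEXT, job_id INTEGER)"
)


def generate_utterances(n):
    """Yield fake utterance rows one at a time (stand-in for corpus parsing)."""
    for i in range(n):
        yield (i, f"speaker_{i % 3}", f"utterance text {i}", i % 2)


# executemany consumes the generator lazily, so the rows are never
# materialized as one large list in memory.
with conn:
    conn.executemany(
        "INSERT INTO utterance VALUES (?, ?, ?, ?)", generate_utterances(1000)
    )


def utterances_for_job(connection, job_id):
    """Stream the rows assigned to one job, in ID order."""
    cursor = connection.execute(
        "SELECT id, speaker, text FROM utterance WHERE job_id = ? ORDER BY id",
        (job_id,),
    )
    for row in cursor:
        yield row


job_zero_count = sum(1 for _ in utterances_for_job(conn, 0))
```

Iterating over the cursor fetches rows on demand, so a worker only ever holds the rows it is currently processing.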

Dictionary(**kwargs)

Database class for storing information about a pronunciation dictionary

Dialect(**kwargs)

Database class for storing information about a dialect

Word(**kwargs)

Database class for storing words, their integer IDs, and pronunciation information

Pronunciation(**kwargs)

Database class for storing information about a pronunciation

Phone(**kwargs)

Database class for storing phones and their integer IDs

Grapheme(**kwargs)

Database class for storing graphemes and their integer IDs

File(**kwargs)

Database class for storing information about files in the corpus

TextFile(**kwargs)

Database class for storing information about transcription files

SoundFile(**kwargs)

Database class for storing information about sound files

Speaker(**kwargs)

Database class for storing information about speakers

Utterance(**kwargs)

Database class for storing information about utterances

WordInterval(**kwargs)

Database class for storing information about aligned word intervals

PhoneInterval(**kwargs)

Database class for storing information about aligned phone intervals
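As a rough illustration of how aligned word and phone intervals might be related in a SQLite database, the sketch below stores a single word with its phones and looks the phones up by word. The table names, column names (`start_time`, `end_time`, `word_interval_id`), and the `phone_labels_for_word` helper are assumptions made for this example, not the columns defined on `WordInterval` or `PhoneInterval`.

```python
import sqlite3

# Hypothetical tables for illustration; MFA's real columns may differ.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE word_interval (
        id INTEGER PRIMARY KEY,
        label TEXT NOT NULL,
        start_time REAL NOT NULL,
        end_time REAL NOT NULL
    );
    CREATE TABLE phone_interval (
        id INTEGER PRIMARY KEY,
        word_interval_id INTEGER NOT NULL REFERENCES word_interval(id),
        label TEXT NOT NULL,
        start_time REAL NOT NULL,
        end_time REAL NOT NULL
    );
    """
)

with conn:
    conn.execute("INSERT INTO word_interval VALUES (1, 'cat', 0.0, 0.45)")
    conn.executemany(
        "INSERT INTO phone_interval VALUES (?, ?, ?, ?, ?)",
        [(1, 1, "k", 0.0, 0.1), (2, 1, "ae", 0.1, 0.3), (3, 1, "t", 0.3, 0.45)],
    )


def phone_labels_for_word(connection, word_interval_id):
    """Return the phone labels belonging to one word, in time order."""
    rows = connection.execute(
        "SELECT label FROM phone_interval "
        "WHERE word_interval_id = ? ORDER BY start_time",
        (word_interval_id,),
    )
    return [row[0] for row in rows]


cat_phones = phone_labels_for_word(conn, 1)
```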

CorpusWorkflow(**kwargs)

Database class for storing information about a particular workflow (alignment, transcription, etc.)

Job(**kwargs)

Database class for storing information about multiprocessing jobs

M2MSymbol(**kwargs)

Database class for storing many-to-many G2P training information

M2M2Job(**kwargs)

Mapping class between M2MSymbol and Job

Word2Job(**kwargs)

Mapping class between Word and Job
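A mapping class of this kind typically corresponds to an association table that links two other tables in a many-to-many relationship. The sketch below shows that general pattern with the standard `sqlite3` module. The `word`, `job`, and `word2job` tables and the `words_for_job` helper are hypothetical and only loosely modeled on the class names above; they should not be read as MFA's actual schema.

```python
import sqlite3

# Hypothetical association-table layout for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE word (id INTEGER PRIMARY KEY, word TEXT NOT NULL);
    CREATE TABLE job (id INTEGER PRIMARY KEY);
    CREATE TABLE word2job (
        word_id INTEGER NOT NULL REFERENCES word(id),
        job_id INTEGER NOT NULL REFERENCES job(id),
        PRIMARY KEY (word_id, job_id)
    );
    """
)

with conn:
    conn.executemany(
        "INSERT INTO word VALUES (?, ?)", [(1, "cat"), (2, "dog"), (3, "fish")]
    )
    conn.executemany("INSERT INTO job VALUES (?)", [(0,), (1,)])
    # "dog" is needed by both jobs, so it appears twice in the mapping.
    conn.executemany(
        "INSERT INTO word2job VALUES (?, ?)", [(1, 0), (2, 0), (2, 1), (3, 1)]
    )


def words_for_job(connection, job_id):
    """Return the words linked to one job via the association table."""
    rows = connection.execute(
        "SELECT w.word FROM word AS w "
        "JOIN word2job AS wj ON wj.word_id = w.id "
        "WHERE wj.job_id = ? ORDER BY w.id",
        (job_id,),
    )
    return [row[0] for row in rows]


job_zero_words = words_for_job(conn, 0)
job_one_words = words_for_job(conn, 1)
```

The composite primary key keeps each word-job pair unique, so a word can be assigned to several jobs without duplicating the word row itself.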