2.2 Changelog

2.2.16

Fixed a crash when using the --speaker_characters flag with TextGrid files
Fixed a crash when using the --num_pronunciations flag before required arguments
Fixed an issue where the aligner would use speaker adaptation even if it was explicitly disabled

Fixed a crash when using fine tuned boundaries
Pinned scikit-learn to versions below 1.3, because version 1.3 breaks the hdbscan package
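To check whether an existing environment is compatible with the scikit-learn pin above, a minimal sketch along these lines may help. The function name satisfies_pin is hypothetical and not part of MFA; it only compares the leading major.minor of a version string against the 1.3 bound.

```python
import re


def satisfies_pin(version: str, upper: tuple = (1, 3)) -> bool:
    """Return True if `version` is strictly below the (major, minor) bound `upper`."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return (int(match.group(1)), int(match.group(2))) < upper


if __name__ == "__main__":
    # Report on the locally installed scikit-learn, if there is one.
    from importlib.metadata import PackageNotFoundError, version

    try:
        installed = version("scikit-learn")
    except PackageNotFoundError:
        print("scikit-learn is not installed")
    else:
        status = "ok" if satisfies_pin(installed) else "violates the <1.3 pin"
        print(installed, status)
```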

Re-established support for sqlite for most aspects of MFA (some functionality requires using PostgreSQL)
Added the --enable_use_postgres configuration flag to mfa configure, and the --use_postgres flag to mfa [command] ..., to use PostgreSQL as the database backend
Fixed a bug where adapted acoustic models would not contain all the necessary metadata to be used

Make socket updating more general
Remove false “no alignments” warning in alignment iterations while training
Fixed a bug in adding words to a dictionary
Fixed a bug where words marked as “<cutoff>” were being treated as “[bracketed]”
Silences DatabaseError while cleaning up MFA
Fixed a crash in fine tuning

Fixed a bug in pronunciation probability training that was causing all probabilities of following silence to be 0
Fixed a bug where only words in the corpus were being exported from the lexicon

Fixed a bug introduced in 2.2.4 that made segments overlap with silence intervals when using textgrid cleanup
Changed databases to always use the root MFA directory rather than rely on temporary directories, so that database files and sockets are placed in a consistent location. This root directory can be changed via the environment variable MFA_ROOT_DIR
Optimized training graph and collecting alignments after changes to how unknown words were represented internally
Changed feature generation to use piped audio loaded via PySoundFile rather than via calls to sox/ffmpeg directly
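As an illustration of the MFA_ROOT_DIR environment variable mentioned above, the following sketch builds an environment for a child process with the variable set. The helper name mfa_environment and the example path are hypothetical; the only detail taken from this changelog is the variable name itself.

```python
import os
from pathlib import Path


def mfa_environment(root_dir, base_env=None):
    """Return a copy of the environment with MFA_ROOT_DIR pointing at `root_dir`."""
    env = dict(os.environ if base_env is None else base_env)
    env["MFA_ROOT_DIR"] = str(Path(root_dir).expanduser())
    return env


# Usage (assumes the `mfa` executable is on PATH; the path is a placeholder):
#   import subprocess
#   subprocess.run(["mfa", "version"], env=mfa_environment("~/mfa_data"), check=True)
```

Passing a modified copy of the environment, rather than mutating os.environ, keeps the setting scoped to the one subprocess call.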

Fixed a bug where piping to mfa g2p could hang if there were no known characters
Fixed a bug where the auto_server setting defaulted to disabled rather than enabled
Removed indices on utterance text to address issues in longer files
Revamped docker image to address issues with initializing postgresql databases as root

Fixes an issue where some directories in Common Voice Japanese were causing FileNotFound errors for sound files
Changes PostgreSQL database connections to use socket directories rather than ports
Added the ability to manage MFA database servers (MFA Servers), along with the configuration flag to disable automatic starting/stopping of databases
Disabled starting servers for subcommands like configure, version, history or --help invocations
Added support for handling spaces when running mfa g2p (the handling is very simple: it just concatenates the outputs for each word, and --num_pronunciations is ignored if it is set to something other than 1)
Added the ability to pipe words via stdin/stdout when running mfa g2p
Added the ability to generate pronunciations per utterance when running mfa g2p
Added a first pass at providing estimations of alignment quality through the alignment_analysis.csv file exported with alignments, see Analyzing alignment quality for more details.
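The space handling described above for mfa g2p (concatenating the per-word outputs, with a single pronunciation per word) can be pictured with the sketch below. This is not MFA's implementation: g2p_phrase, toy_pronounce, and TOY_LEXICON are hypothetical stand-ins, and the toy lexicon takes the place of a real G2P model.

```python
def g2p_phrase(phrase, pronounce):
    """Pronounce each space-separated word and concatenate the phone lists."""
    phones = []
    for word in phrase.split():
        # One pronunciation per word, mirroring the single-output behaviour.
        phones.extend(pronounce(word))
    return phones


# Hypothetical lookup table standing in for a G2P model.
TOY_LEXICON = {
    "hello": ["HH", "AH0", "L", "OW1"],
    "world": ["W", "ER1", "L", "D"],
}


def toy_pronounce(word):
    """Look up a word in the toy lexicon (raises KeyError for unknown words)."""
    return TOY_LEXICON[word]


print(" ".join(g2p_phrase("hello world", toy_pronounce)))
```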

Update terminal printing to use rich rather than custom logic
Prevented the tokenizer utility from processing text files that don’t have a corresponding sound file

Fixed a rounding issue in parsing sox output for sound file duration
Added a --dictionary_path option to Generate pronunciations for words (mfa g2p) to allow generating pronunciations for just those words that are missing from a dictionary
Added an add_words subcommand to Pretrained models to allow easily adding words and pronunciations from Generate pronunciations for words (mfa g2p) to pronunciation dictionaries