1.X Changelog

1.1.0

Major changes to the system; see What’s new in 1.1.

Added Grapheme-to-Phoneme capabilities
Acoustic models no longer contain the dictionary they were trained with
Dictionaries must be specified when aligning using pretrained models
The aligner now automatically cleans the temporary directory when the previous run failed
Added validation for types of command line arguments
Files that could not be read using UTF-8 are now caught and listed
Updated Kaldi version to 5.1 and OpenFST version to 1.6.2 on Mac and Linux
Added support for specifying custom non-speech annotations in the pronunciation dictionary with sil and spn
Made command line flags more consistent in spelling
Made pretrained models for many languages available
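
The UTF-8 check mentioned above can be illustrated with a minimal sketch. This is not MFA's actual implementation; the function name `find_unreadable_utf8` is hypothetical and only shows one way to collect files that fail to decode.

```python
from pathlib import Path


def find_unreadable_utf8(paths):
    """Return the paths whose contents cannot be decoded as UTF-8.

    Hypothetical helper for illustration; not part of MFA.
    """
    unreadable = []
    for path in paths:
        try:
            # read_text raises UnicodeDecodeError on invalid byte sequences
            Path(path).read_text(encoding="utf-8")
        except UnicodeDecodeError:
            unreadable.append(str(path))
    return unreadable
```

Collecting the failures into a list, rather than stopping at the first one, lets all problem files be reported in a single pass.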

Fixed an issue where aligning using pretrained models was improperly updating the original model with sparser data
Added a flag to turn off speaker adaptation when aligning using a pretrained model
Optimized training graph generation when aligning using a pretrained model

Added warning messages and log output when wav files are ignored because their sampling rate is too low or they have no associated .lab or .TextGrid file
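
As a rough illustration of the two conditions described above, the sketch below scans a directory for wav files that would be skipped. It is not MFA's code: the function name `find_ignored_wavs` and the 16000 Hz default threshold are assumptions made for the example, since the changelog does not state the actual minimum sampling rate.

```python
import wave
from pathlib import Path


def find_ignored_wavs(corpus_dir, min_sample_rate=16000):
    """Map each skipped wav file name to the reason it would be skipped.

    Hypothetical helper; min_sample_rate is an assumed threshold.
    """
    ignored = {}
    for wav_path in sorted(Path(corpus_dir).rglob("*.wav")):
        with wave.open(str(wav_path), "rb") as wav_file:
            sample_rate = wav_file.getframerate()
        if sample_rate < min_sample_rate:
            ignored[wav_path.name] = "sampling rate too low"
        elif not any(
            wav_path.with_suffix(ext).exists() for ext in (".lab", ".TextGrid")
        ):
            ignored[wav_path.name] = "no .lab or .TextGrid file"
    return ignored
```

Returning the reason alongside each file name makes it straightforward to print a warning or write a log entry per ignored file.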

Fixed an issue where speaker character flags were being ignored when parsing TextGrid files

Fixed an issue where the number of Gaussians was set too low for triphone training

Fixed an issue with Unicode characters not being correctly parsed when using --nodict
Fixed an issue where short intervals in TextGrid were not being properly ignored
Added a command line option --temp_directory that lets the user specify the temporary directory where MFA stores all files during alignment; the default is ~/Documents/MFA
Added logging directory and some logging for when utterances are ignored

Improved memory and time efficiency of extracting channels from stereo files, particularly for long sound files

Fixed an issue where pretrained models were not being bundled with the source code

Fixed an issue with Linux binaries not finding Kaldi binaries
English models now use the full LibriSpeech dataset rather than only the clean subset (the primary difference between the two being the increased number of accents)

Added command line argument --clean to remove temporary files
Added support for multiple sampling rates in a single dataset
Fixed some bugs relating to using a single process
Fixed a bug where spaces were being inserted into transcriptions when using --nodict
Fixed a bug where having no out-of-vocabulary items would cause a crash at the end of aligning
Fixed a bug where the frozen executable could not find the included pretrained models
Fixed an issue where dictionaries in model outputs were binary files rather than editable text files
Added docstrings to main classes
Updated the built-in english model to use the full 1000-hour LibriSpeech corpus