What’s new in 2.0

Version 2.0 of the Montreal Forced Aligner overhauls both installation and the management of commands. See 2.0 Changelog for more specific changes.

Installation style

Up until now, MFA has used a frozen executable model for releases, which involves packaging MFA code along with a Python interpreter, some system libraries, and compiled third-party executables from Kaldi, OpenFST, OpenNgram, and Phonetisaurus. The main issues with this style of distribution are inefficiencies in the build system and the inability to customize the runtime for different environments and versions.

Moving forward, MFA will:

Use standard Python packaging and be available for import in Python
Rely on Conda Forge for handling dependencies
Switch to using Pynini instead of Phonetisaurus for G2P purposes, which should ease distribution and installation
Have a unified command line interface, available upon installation, with a subcommand for each command line function, and expose the full MFA API for use in other Python scripts
Allow for faster bug fixes that do not require repackaging and releasing frozen binaries across all platforms
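As a rough illustration of what standard packaging means in practice, the following sketch checks whether MFA is importable and whether the command line utility is on the PATH. The import name `montreal_forced_aligner` and the helper `mfa_install_status` are assumptions made for this example; they are not stated on this page.

```python
import importlib.util
import shutil


def mfa_install_status(package="montreal_forced_aligner", command="mfa"):
    """Report whether a package is importable and a command is on PATH.

    The default package name is an assumption about MFA's import name;
    pass a different one if your installation differs.
    """
    return {
        # find_spec returns None when a top-level package cannot be found
        "importable": importlib.util.find_spec(package) is not None,
        # shutil.which returns the executable's path, or None if missing
        "cli_path": shutil.which(command),
    }


print(mfa_install_status())
```

If `importable` is false or `cli_path` is `None`, the environment that runs the script is probably not the one MFA was installed into.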

Unified command line interface

Previously, MFA used multiple separate frozen CLI programs to perform specific actions. However, as more functionality has been added (G2P models, validation, managing pretrained models, and training different types of models), it has become unwieldy to have a separate command for each. As such, going forward:

There will be a single mfa command line utility that will be available once MFA is installed via pip/conda.
Running mfa -h will list the subcommands that can be run, along with their descriptions; see All commands for details.
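Because every function is now reached through the single mfa entry point, other scripts can call it in a uniform way. Below is a minimal sketch that runs mfa -h from Python; the helper names `build_mfa_command` and `run_mfa` are hypothetical, and the sketch assumes only that the mfa executable is on the PATH after installation.

```python
import shutil
import subprocess


def build_mfa_command(*args):
    """Build the argument list for an mfa invocation, e.g. ["mfa", "-h"]."""
    return ["mfa", *args]


def run_mfa(*args):
    """Run mfa with the given arguments and return its standard output.

    Returns None when the mfa executable cannot be found on PATH.
    """
    if shutil.which("mfa") is None:
        return None
    result = subprocess.run(
        build_mfa_command(*args),
        capture_output=True,
        text=True,
        check=False,  # leave error handling to the caller
    )
    return result.stdout


help_text = run_mfa("-h")
print(help_text if help_text is not None else "mfa was not found on PATH")
```

Passing a subcommand name first, for example `run_mfa("transcribe", "-h")`, would follow the same pattern, assuming each subcommand accepts -h as the top-level command does.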

Anchor annotator GUI

MFA now includes a basic annotation GUI with features for:

Listing processed utterances in the corpus with the ability to see which utterances have words not found in your pronunciation dictionary
Allowing for audio playback of utterances and modification of utterance text
Listing entries in an imported pronunciation dictionary
Updating/adding dictionary entries
Updating transcriptions

See also Anchor annotator (mfa anchor) for more information on using the annotation GUI.

Transcription

MFA now supports:

Transcribing a corpus of sound files using an acoustic model, dictionary, and language model; see Transcribe audio files (mfa transcribe) for more information
Training language models from corpora that have text transcriptions; see Train a new language model (mfa train_lm) for more information
Training pronunciation probability dictionaries from alignments, for use in alignment or transcription; see Add probabilities to a dictionary (mfa train_dictionary) for more information