News

What’s new in 2.0

Version 2.0 of the Montreal Forced Aligner represents several overhauls to installation and management of commands.

Installation style

Up until now, MFA has used a frozen executable model for releases, which involves packaging MFA code along with a Python interpreter, some system libraries, and compiled third party executables from Kaldi, OpenFST, OpenNgram, and Phonetisaurus. The main issues with this style of distribution revolve around inefficiencies in the build system and a lack of ability to customize the runtime for different environments and versions.

Moving forward, MFA will:

  • Use standard Python packaging, i.e., pip install montreal-forced-aligner or python setup.py install from the cloned repo
  • Allow for downloading third party executables for the particular system, but also allow for picking up relevant executables that were built on the system, increasing flexibility of use
  • Switch to using Pynini instead of Phonetisaurus for G2P purposes, which should ease distribution and installation
  • Have a Unified command line interface with subcommands for each command line function that will be available upon installation, as well as exposing the full MFA api for use in other Python scripts
  • Allow for faster bug fixes that do not require repackaging and releasing frozen binaries across all platforms

Unified command line interface

Previously, MFA has used multiple separate frozen CLI programs to perform specific actions. However, as more functionality has been added with G2P models, validation, managing pretrained models, and training different types of models, it has become unwieldy to have separate commands for each. As such, going forward:

  • There will be a single mfa command line utility that will be available once it is installed via pip.
  • Running mfa -h will list the subcommands that can be run, along with their descriptions.

Annotator GUI

Added a basic annotation GUI with features for:

  • Listing processed utterances in the corpus with the ability to see which utterances have words not found in your pronunciation dictionary
  • Allowing for audio playback of utterances and modification of utterance text
  • Listing entries in an imported pronunciation dictionary
  • Updating/adding dictionary entries
  • Updating transcriptions

See also Annotator for more information on using the annotation GUI.

Transcription

MFA now supports:

  • Transcribing a corpus of sound files using an acoustic model, dictionary, and language model, see Running the transcriber for more information.
  • Training language models from corpora that have text transcriptions, see Training language models for more information
  • Training pronunciation probability dictionaries from alignments, for use in alignment or transcription, see Modeling pronunciation probabilities for more information

What’s new in 1.1

Version 1.1 of the Montreal Forced Aligner represents several overhauls to the workflow and ability to customize model training and alignment.

Attention

Please note that development of 1.1 has been bundled into 2.0 as part of larger infrastructure changes in developing MFA (@mmcauliffe no longer being affiliated with an academic institution, lack of access to Mac OS for building third party executables, etc)

Training configurations

A major new feature is the ability to specify and customize configuration for training and alignment. Prior to 1.1, the training procedure for new models was:

  • Monophone training
  • Triphone training
  • Speaker-adapted triphone training (could be disabled)

The parameters for each of these training blocks were fixed and not changeable.

In 1.1, the following training procedures are available:

  • Monophone training
  • Triphone training
  • LDA+MLLT training
  • Speaker-adapted triphone training
  • Ivector extractor training
  • Nnet2 training

Each of these blocks (as well as their inclusion) can be customized through a YAML config file. In addition to training parameters, global alignment and feature configuration parameters are available. See Configuration for more details.

Data validation

In version 1.0, data validation was done as part of alignment, with user input whether alignment should be stopped if problems were detected. In version 1.1, all data validation is done through a separate executable mfa_validate_dataset (see Validating data for more details on usage). Validating the dataset consists of:

  • Checking for out of vocabulary items between the dictionary and the corpus
  • Checking for read errors in transcription files
  • Checking for transcriptions without sound files and sound files without transcriptions
  • Checking for issues in feature generation (can be disabled for speed)
  • Checking for issues in aligning a simple monophone model (can be disabled for speed)
  • Checking for transcription errors using a simple unigram language model of common words and words in the transcript (disabled by default)

The user should now run mfa_validate_dataset first and fix any issues that they perceive as important. The alignment executables will print a warning if any of these issues are present, but will perform alignment without prompting for user input.

Updated dictionary generation

The functionality of mfa_generate_dictionary has been expanded.

  • Rather than having a --no_dict option for alignment executables, the orthographic transcription functionality is now used when a G2P model is not provided to mfa_generate_dictionary
  • When a corpus directory is specified as the input path, all words will be parsed rather than just those from transcription files with an associated sound file
  • When a text file is specified as the input path, all words in the text file will be run through G2P, allowing for a simpler pipeline for generating transcriptions from out of vocabulary items