Examples

Example 1: Aligning LibriSpeech (English)

Note

There is also a Google Colab notebook, created by NTT123, for running the alignment example with a custom LibriSpeech dataset.

Set up

  1. Ensure you have installed MFA via Installation.
  2. Ensure you have downloaded the pretrained model via mfa download acoustic english
  3. Download the prepared LibriSpeech dataset (LibriSpeech data set) and extract it somewhere on your computer
  4. Download the LibriSpeech lexicon (LibriSpeech lexicon) and save it somewhere on your computer

Alignment

Aligning using pretrained models

In the same environment in which you installed MFA, enter the following command into the terminal:

mfa align /path/to/librispeech/dataset /path/to/librispeech/lexicon.txt english ~/Documents/aligned_librispeech

Aligning through training

In the same environment in which you installed MFA, enter the following command into the terminal:

mfa train /path/to/librispeech/dataset /path/to/librispeech/lexicon.txt ~/Documents/aligned_librispeech

Example 2: Generate Mandarin dictionary

Set up

  1. Ensure you have installed MFA via Installation.
  2. Ensure you have downloaded the pretrained model via mfa download g2p mandarin_pinyin_g2p
  3. Download the prepared Mandarin dataset (example Mandarin corpus) and extract it somewhere on your computer

Note

The example Mandarin corpus consists of .lab files from the THCHS-30 corpus.

To generate a new dictionary for this “corpus” from the pretrained G2P model, run the following:

mfa g2p mandarin_pinyin_g2p /path/to/mandarin/dataset /path/to/save/mandarin_dict.txt

This should take no more than a few seconds. Open the output file, and check that all the words are there. The accuracy of the transcription should be near 100%. You can now use this to align your mini corpus:

mfa train /path/to/mandarin/dataset /path/to/save/mandarin_dict.txt /path/to/save/output

Because there are very few files (i.e., a small training set), the alignment will be suboptimal. This example is intended to give a sense of the pipeline for generating a dictionary and using it for alignment, not to produce high-quality alignments.
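The manual check of the generated dictionary described above can be partly automated. The following is a minimal sketch, not part of MFA: the function name missing_words is hypothetical, and it assumes that transcripts are .lab files containing whitespace-separated words and that each dictionary line begins with the word, followed by whitespace and its pronunciation.

```shell
# Hypothetical helper (not part of MFA): print every word that occurs in the
# corpus transcripts but has no entry in the dictionary, one word per line.
# Assumptions: transcripts are *.lab files of whitespace-separated words, and
# each dictionary line starts with the word followed by whitespace.
#   $1 = corpus directory, $2 = dictionary file
missing_words() {
    find "$1" -type f -name '*.lab' -exec awk -v dict="$2" '
        BEGIN {
            # Load the headword (first field) of every dictionary line.
            while ((getline line < dict) > 0) {
                n = split(line, fields)
                if (n > 0) known[fields[1]] = 1
            }
        }
        {
            # Report each transcript token that has no dictionary entry.
            for (i = 1; i <= NF; i++)
                if (!($i in known)) print $i
        }
    ' {} + | sort -u
}
```

Called with the paths from the commands above, empty output would mean that every transcript word has a dictionary entry:

missing_words /path/to/mandarin/dataset /path/to/save/mandarin_dict.txt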

Example 3: Train Mandarin G2P model

Set up

  1. Ensure you have installed MFA via Installation.
  2. Download the prepared Mandarin dictionary (example Mandarin dictionary) and save it somewhere on your computer

In the same environment in which you installed MFA, enter the following command into the terminal:

mfa train_g2p /path/to/mandarin_dict.txt mandarin_test_model.zip

This should take no more than a few seconds and should produce a model that can be used for Generating a dictionary.
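As a sketch of that next step, the trained model can be passed to mfa g2p in place of the pretrained model name used in Example 2. This rests on an assumption: that mfa g2p accepts a path to a model archive wherever it accepts the name of a pretrained model, with the same argument order (model, corpus, output file). The helper name generate_dictionary is hypothetical.

```shell
# Hypothetical helper: generate a dictionary for a corpus from a G2P model file.
# Assumption: `mfa g2p` accepts a path to a model archive as its first
# argument, with the same argument order as in Example 2
# (model, corpus directory, output dictionary).
#   $1 = G2P model file, $2 = corpus directory, $3 = output dictionary path
generate_dictionary() {
    # Fail early with a clear message if the model file is missing.
    if [ ! -f "$1" ]; then
        echo "G2P model not found: $1" >&2
        return 1
    fi
    mfa g2p "$1" "$2" "$3"
}
```

With the model trained above, the call might look like this (the output file name is a placeholder):

generate_dictionary mandarin_test_model.zip /path/to/mandarin/dataset /path/to/save/mandarin_test_dict.txt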

Note

Because there is so little data in mandarin_dict.txt, the model produced will not be very accurate, and so any dictionary generated from it will also be inaccurate. This dictionary is provided for illustrative purposes only.