Examples#
Example 1: Aligning LibriSpeech (English)#
Note
There is also a Google Colab notebook, created by NTT123, for running the alignment example with a custom LibriSpeech dataset.
Set up#
Ensure you have installed MFA via Installation.
Ensure you have downloaded the pretrained acoustic model via
mfa model download acoustic english_mfa
Ensure you have downloaded the pretrained US English dictionary via
mfa model download dictionary english_us_mfa
Download the prepared LibriSpeech dataset (LibriSpeech data set) and extract it somewhere on your computer
Alignment#
Aligning using pre-trained models#
In the same environment where you installed MFA, enter the following command in the terminal:
mfa align /path/to/librispeech/dataset english_us_mfa english_mfa ~/Documents/aligned_librispeech
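The positional arguments of the command above are, in order, the corpus directory, the dictionary, the acoustic model, and the output directory. As a rough sketch for scripting the same call from Python, the helper below assembles that argument list. The function name `build_align_command` and its parameter names are hypothetical labels chosen for illustration, and the output path in the demo is a placeholder.

```python
import shlex
from pathlib import Path


def build_align_command(corpus_dir, dictionary, acoustic_model, output_dir):
    """Return the argument list for `mfa align` in the order used above.

    The parameter names are hypothetical labels for the four positional
    arguments; only their order is taken from the example command.
    """
    return [
        "mfa",
        "align",
        str(Path(corpus_dir).expanduser()),  # corpus directory
        dictionary,                          # dictionary, e.g. english_us_mfa
        acoustic_model,                      # acoustic model, e.g. english_mfa
        str(Path(output_dir).expanduser()),  # output directory
    ]


# Placeholder paths; replace them with real locations on your machine.
command = build_align_command(
    "/path/to/librispeech/dataset",
    "english_us_mfa",
    "english_mfa",
    "/tmp/aligned_librispeech",
)

# Print the command as a single shell-quoted string for inspection.
print(shlex.join(command))
```

The resulting list could then be passed to `subprocess.run(command, check=True)` from the environment where MFA is installed. The sketch only builds and prints the command, so it can be checked before anything is run.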
Aligning through training#
In the same environment where you installed MFA, enter the following command in the terminal:
mfa train /path/to/librispeech/dataset /path/to/librispeech/lexicon.txt ~/Documents/aligned_librispeech
Example 2: Generate Mandarin dictionary#
Set up#
Ensure you have installed MFA via Installation.
Ensure you have downloaded the pretrained G2P model via
mfa model download g2p mandarin_pinyin_g2p
Download the prepared Mandarin dataset (example Mandarin corpus) and extract it somewhere on your computer
Note
The example Mandarin corpus is .lab files from the THCHS-30 corpus.
To generate a new dictionary for this “corpus” from the pretrained G2P model, run the following:
mfa g2p mandarin_pinyin_g2p /path/to/mandarin/dataset /path/to/save/mandarin_dict.txt
This should take no more than a few seconds. Open the output file, and check that all the words are there. The accuracy of the transcription should be near 100%. You can now use this to align your mini corpus:
mfa train /path/to/mandarin/dataset /path/to/save/mandarin_dict.txt /path/to/save/output
Since there are very few files (i.e., a small training set), the alignment will be suboptimal. This example is intended to give a sense of the pipeline for generating a dictionary and using it for alignment.
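The manual check described earlier (that every word in the corpus appears in the generated dictionary) can also be scripted. The following is a minimal sketch, assuming the .lab files contain whitespace-separated words and each dictionary line starts with the word followed by its pronunciation. The function names are hypothetical, and the demo uses made-up files, not the real example corpus. MFA may normalize text before lookup, so this raw comparison is only an approximation.

```python
import tempfile
from pathlib import Path


def corpus_words(corpus_dir):
    """Collect every whitespace-separated token found in the corpus .lab files."""
    words = set()
    for lab_file in Path(corpus_dir).rglob("*.lab"):
        words.update(lab_file.read_text(encoding="utf-8").split())
    return words


def dictionary_words(dictionary_path):
    """Collect the first field of each non-empty dictionary line."""
    words = set()
    with open(dictionary_path, encoding="utf-8") as handle:
        for line in handle:
            fields = line.split()
            if fields:
                words.add(fields[0])
    return words


def missing_words(corpus_dir, dictionary_path):
    """Return corpus words that have no dictionary entry, sorted."""
    return sorted(corpus_words(corpus_dir) - dictionary_words(dictionary_path))


# Toy demonstration with made-up files and made-up pronunciations.
with tempfile.TemporaryDirectory() as tmp:
    corpus = Path(tmp) / "corpus"
    corpus.mkdir()
    (corpus / "utt1.lab").write_text("你好 世界", encoding="utf-8")
    (corpus / "utt2.lab").write_text("你好 朋友", encoding="utf-8")
    dictionary = Path(tmp) / "dict.txt"
    dictionary.write_text(
        "你好\tn i3 h ao3\n朋友\tp eng2 y ou3\n", encoding="utf-8"
    )
    demo_missing = missing_words(corpus, dictionary)

# Words listed here appear in the corpus but not in the dictionary.
print(demo_missing)
```

For the real data, calling `missing_words("/path/to/mandarin/dataset", "/path/to/save/mandarin_dict.txt")` should return an empty list if the G2P step covered every word.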
Example 3: Train Mandarin G2P model#
Set up#
Ensure you have installed MFA via Installation.
Download the prepared Mandarin dictionary (example Mandarin dictionary)
In the same environment where you installed MFA, enter the following command in the terminal:
mfa train_g2p /path/to/mandarin_dict.txt mandarin_test_model.zip
This should take no more than a few seconds and should produce a model that can be used as described in Generate pronunciations for words (mfa g2p).
Note
Because there is so little data in mandarin_dict.txt, the model produced will not be very accurate, and so any dictionary generated from it will also be inaccurate. This dictionary is provided for illustrative purposes only.
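To get a feel for how little data a dictionary contains before training on it, a quick summary can help. This is a minimal sketch, assuming each line holds a word followed by whitespace-separated phone symbols. The function name `summarize_dictionary` is hypothetical, and the toy entries are made up for illustration rather than taken from mandarin_dict.txt.

```python
from collections import Counter


def summarize_dictionary(lines):
    """Count entries, distinct words, and distinct phone symbols.

    Assumes each line is a word followed by whitespace-separated phones;
    lines with fewer than two fields are skipped.
    """
    words = Counter()
    phones = set()
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue
        words[fields[0]] += 1
        phones.update(fields[1:])
    return {
        "entries": sum(words.values()),
        "words": len(words),
        "phones": len(phones),
    }


# Made-up entries for illustration only.
toy_lines = [
    "你好 n i2 h ao3",
    "世界 sh i4 j ie4",
    "你好 n i3 h ao3",
]
summary = summarize_dictionary(toy_lines)
print(summary)
```

For a real file, the lines could be read with `open("/path/to/mandarin_dict.txt", encoding="utf-8")` and passed to the same function. A small number of distinct words relative to the phone inventory is one sign that a trained G2P model will generalize poorly.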