Example: Aligning a demo corpus

Note

See also our Google Colab notebook, which runs this example without installing or downloading anything locally. NTT123 has also created a Jupyter notebook that runs the alignment example on a custom LibriSpeech dataset.

Set up

Important

Ensure you have installed MFA via Installation.

English

Ensure you have downloaded the pretrained English acoustic model via mfa model download acoustic english_mfa
Ensure you have downloaded the pretrained US English dictionary via mfa model download dictionary english_us_mfa
Download the English LibriSpeech demo corpus and extract it somewhere on your computer

Japanese

Ensure you have downloaded the pretrained Japanese acoustic model via mfa model download acoustic japanese_mfa
Ensure you have downloaded the pretrained Japanese dictionary via mfa model download dictionary japanese_mfa
Download the Japanese JVS demo corpus and extract it somewhere on your computer
Install Japanese-specific dependencies via conda install -c conda-forge spacy sudachipy sudachidict-core

Mandarin

Ensure you have downloaded the pretrained Mandarin acoustic model via mfa model download acoustic mandarin_mfa
Ensure you have downloaded the pretrained China Mandarin dictionary via mfa model download dictionary mandarin_china_mfa
Download the Mandarin THCHS-30 demo corpus and extract it somewhere on your computer
Install Mandarin-specific dependencies via pip install spacy-pkuseg dragonmapper hanziconv

Important

This example assumes that you extracted the demo corpus into a directory named mfa_data in your home directory.
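
The setup steps above can be collected into one short script. The following is a minimal sketch, not part of MFA itself: the helper names demo_models and download_demo_models are hypothetical, while the model names, dictionary names, and the mfa_data directory are taken from the steps above. It only creates the directory and downloads the models. Downloading and extracting the demo corpus, and installing the Japanese or Mandarin dependencies, remain manual steps.

```shell
# Sketch only: demo_models and download_demo_models are hypothetical helpers.

# Create the directory this example assumes (no-op if it already exists).
mkdir -p "$HOME/mfa_data"

# Print "<acoustic model> <dictionary>" for one of the three demo languages.
demo_models() {
    case "$1" in
        english)  echo "english_mfa english_us_mfa" ;;
        japanese) echo "japanese_mfa japanese_mfa" ;;
        mandarin) echo "mandarin_mfa mandarin_china_mfa" ;;
        *) echo "unknown language: $1" >&2; return 1 ;;
    esac
}

# Download the acoustic model and dictionary for a language.
# If mfa is not on PATH, print a hint instead of failing.
download_demo_models() {
    models=$(demo_models "$1") || return 1
    acoustic=${models% *}      # text before the space
    dictionary=${models#* }    # text after the space
    if command -v mfa >/dev/null 2>&1; then
        mfa model download acoustic "$acoustic"
        mfa model download dictionary "$dictionary"
    else
        echo "mfa not found on PATH; install it first (see Installation)." >&2
    fi
}

download_demo_models english
```

To set up a different language, change the final line to download_demo_models japanese or download_demo_models mandarin.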
