Example: Aligning a demo corpus#
Note
See also our Google Colab notebook for running this example without installing or downloading anything locally. There is also a Jupyter notebook by NTT123 for running the alignment example with a custom LibriSpeech dataset.
Set up#
Important
Ensure you have installed MFA via Installation.
English
Ensure you have downloaded the pretrained model via
mfa model download acoustic english_mfa
Ensure you have downloaded the pretrained US English dictionary via
mfa model download dictionary english_us_mfa
Download the English LibriSpeech demo corpus and extract it to somewhere on your computer
Japanese
Ensure you have downloaded the pretrained model via
mfa model download acoustic japanese_mfa
Ensure you have downloaded the pretrained Japanese dictionary via
mfa model download dictionary japanese_mfa
Download the Japanese JVS demo corpus and extract it to somewhere on your computer
Install Japanese-specific dependencies via
conda install -c conda-forge spacy sudachipy sudachidict-core
Mandarin
Ensure you have downloaded the pretrained model via
mfa model download acoustic mandarin_mfa
Ensure you have downloaded the pretrained China Mandarin dictionary via
mfa model download dictionary mandarin_china_mfa
Download the Mandarin THCHS-30 demo corpus and extract it to somewhere on your computer
Install Mandarin-specific dependencies via
pip install spacy-pkuseg dragonmapper hanziconv
Important
This example assumes you have a directory named mfa_data in your home directory in which the demo corpus was extracted.
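As a minimal sketch of checking that layout before running the commands in the rest of this example, the helper below reports whether a corpus directory exists under a data root. The function name check_corpus is hypothetical and not part of MFA; the directory name you pass should match whatever the extracted archive is called on your machine.

```shell
# Hypothetical helper (not part of MFA): report whether a corpus
# directory named $2 exists under the data root $1.
check_corpus() {
    if [ -d "$1/$2" ]; then
        echo "found: $1/$2"
    else
        echo "missing: $1/$2"
        return 1
    fi
}
```

For example, check_corpus "$HOME/mfa_data" librispeech-demo-1.0.0 should print a "found" line if the English demo corpus was extracted as assumed above.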
Alignment#
Aligning using pre-trained models#
In the same environment in which you installed MFA, enter the command for your demo corpus (English, Japanese, or Mandarin) into the terminal:
mfa align ~/mfa_data/librispeech-demo-1.0.0 english_us_mfa english_mfa ~/mfa_data/aligned_librispeech_demo --clean
mfa align ~/mfa_data/japanese-jvs-demo-1.0.0 japanese_mfa japanese_mfa ~/mfa_data/aligned_jvs_demo --clean
mfa align ~/mfa_data/mandarin-thchs-30-demo-1.0.0 mandarin_china_mfa mandarin_mfa ~/mfa_data/aligned_thchs_30_demo --clean
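Once a command finishes, one rough sanity check is to count the files it wrote to the output directory. The sketch below assumes the alignments were written as .TextGrid files, which is an assumption about the output format in use; count_textgrids is a hypothetical helper, not an MFA command.

```shell
# Hypothetical helper (not part of MFA): count .TextGrid files anywhere
# under the output directory $1. Assumes TextGrid output; files in other
# formats are not counted.
count_textgrids() {
    find "$1" -type f -name '*.TextGrid' | wc -l | tr -d ' '
}
```

For example, count_textgrids ~/mfa_data/aligned_librispeech_demo prints the number of TextGrid files under the English output directory. A count of 0 suggests that either nothing was aligned or a different output format was used.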
Adding words to the dictionary#
First we’ll need a pretrained grapheme-to-phoneme (G2P) model. Pretrained G2P models are installed via the mfa model download command:
mfa model download g2p english_us_mfa
You should be able to run mfa model inspect g2p english_us_mfa and it will output information about the english_us_mfa G2P model.
mfa model download g2p japanese_mfa
You should be able to run mfa model inspect g2p japanese_mfa and it will output information about the japanese_mfa G2P model.
mfa model download g2p mandarin_china_mfa
You should be able to run mfa model inspect g2p mandarin_china_mfa and it will output information about the mandarin_china_mfa G2P model.
Depending on your use case, you might have a list of words to run G2P over, or just a corpus of sound and transcription files. The mfa g2p command can process either:
mfa g2p ~/mfa_data/librispeech-demo-1.0.0 english_us_mfa ~/mfa_data/g2pped_oovs.txt --dictionary_path english_us_mfa --clean
For Japanese, G2P is performed as part of alignment when you specify --g2p_model_path.
mfa g2p ~/mfa_data/mandarin-thchs-30-demo-1.0.0 mandarin_china_mfa ~/mfa_data/g2pped_oovs.txt --dictionary_path mandarin_china_mfa --clean
Running the above will output a text file in the format that MFA uses (Pronunciation dictionary format) with all the out-of-vocabulary (OOV) words (ignoring bracketed words like <cutoff>). I recommend looking over the generated pronunciations and making sure that they look sensible. For languages where the orthography is not transparent, it may be helpful to include --num_pronunciations 3 so that more pronunciations are generated than just the most likely one. For more details on running G2P, see Generate pronunciations for words (mfa g2p).
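To help with that review, the sketch below tallies how many pronunciations were generated for each word, which is mostly useful after running with --num_pronunciations. It assumes the generated file has one entry per line with the word as the first tab-separated field; that layout is an assumption here, so check it against your own file. The name count_pronunciations is hypothetical and not part of MFA.

```shell
# Hypothetical helper (not part of MFA): print "count<TAB>word" for each
# word in the file $1, most pronunciations first, ties ordered by word.
# Assumes one entry per line with the word as the first tab-separated field.
count_pronunciations() {
    awk -F '\t' 'NF > 1 { n[$1]++ } END { for (w in n) print n[w] "\t" w }' "$1" |
        LC_ALL=C sort -t "$(printf '\t')" -k1,1nr -k2,2
}
```

For example, count_pronunciations ~/mfa_data/g2pped_oovs.txt lists every OOV word with the number of candidate pronunciations to look over.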
Once you have looked over the dictionary, you can save the new pronunciations via:
mfa model add_words english_us_mfa ~/mfa_data/g2pped_oovs.txt
For Japanese, G2P is performed as part of alignment when you specify --g2p_model_path.
mfa model add_words mandarin_china_mfa ~/mfa_data/g2pped_oovs.txt
The new pronunciations will be available when you use the dictionary identifier in an MFA command, i.e. the modified command from Aligning a speech corpus with existing pronunciation dictionary and acoustic model:
mfa align ~/mfa_data/librispeech-demo-1.0.0 english_us_mfa english_mfa ~/mfa_data/aligned_librispeech_demo_no_oovs --clean
mfa align ~/mfa_data/japanese-jvs-demo-1.0.0 japanese_mfa japanese_mfa ~/mfa_data/aligned_jvs_demo_no_oovs --g2p_model_path japanese_mfa --clean
mfa align ~/mfa_data/mandarin-thchs-30-demo-1.0.0 mandarin_china_mfa mandarin_mfa ~/mfa_data/aligned_mandarin_demo_no_oovs --clean
Adapting the acoustic model#
In general, adapting a pretrained acoustic model to your specific data will improve alignments.
We can adapt our pretrained model via the mfa adapt command:
mfa adapt ~/mfa_data/librispeech-demo-1.0.0 english_us_mfa english_mfa ~/mfa_data/english_mfa_adapted.zip --clean
We can now use the adapted model to align the librispeech-demo corpus. Note the change from english_mfa to ~/mfa_data/english_mfa_adapted.zip below.
mfa align ~/mfa_data/librispeech-demo-1.0.0 english_us_mfa ~/mfa_data/english_mfa_adapted.zip ~/mfa_data/aligned_librispeech_demo_adapted --clean
mfa adapt ~/mfa_data/japanese-jvs-demo-1.0.0 japanese_mfa japanese_mfa ~/mfa_data/japanese_mfa_adapted.zip --g2p_model_path japanese_mfa --clean
We can now use the adapted model to align the japanese-jvs-demo corpus. Note the change from japanese_mfa to ~/mfa_data/japanese_mfa_adapted.zip below.
mfa align ~/mfa_data/japanese-jvs-demo-1.0.0 japanese_mfa ~/mfa_data/japanese_mfa_adapted.zip ~/mfa_data/aligned_jvs_demo_adapted --g2p_model_path japanese_mfa --clean
mfa adapt ~/mfa_data/mandarin-thchs-30-demo-1.0.0 mandarin_china_mfa mandarin_mfa ~/mfa_data/mandarin_mfa_adapted.zip --clean
We can now use the adapted model to align the mandarin-thchs-30-demo corpus. Note the change from mandarin_mfa to ~/mfa_data/mandarin_mfa_adapted.zip below.
mfa align ~/mfa_data/mandarin-thchs-30-demo-1.0.0 mandarin_china_mfa ~/mfa_data/mandarin_mfa_adapted.zip ~/mfa_data/aligned_thchs_30_demo_adapted --clean