Pretrained models

Pretrained acoustic models

As part of using the Montreal Forced Aligner in our own research, we have trained acoustic models for a number of languages, which can be downloaded below. Each entry lists the corpus and phone set the model was trained with; consult the dictionary a model was trained with for more information about its phone set. When using one of these models with a pronunciation dictionary, the dictionary's phone set must be compatible with the model's. If the orthography of the language is transparent, it is likely that we have a G2P model that can be used to generate the necessary pronunciation dictionary.

===========  ==================================  ===========  ==============
Language     Link                                Corpus       Phone set
===========  ==================================  ===========  ==============
Arabic       Not available yet                   GlobalPhone  GlobalPhone
Bulgarian    Bulgarian acoustic model            GlobalPhone  GlobalPhone
Croatian     Croatian acoustic model             GlobalPhone  GlobalPhone
Czech        Czech acoustic model                GlobalPhone  GlobalPhone
English      English acoustic model              LibriSpeech  Arpabet
French (FR)  French (FR) acoustic model          GlobalPhone  GlobalPhone
French (FR)  French (Prosodylab) acoustic model  GlobalPhone  Prosodylab [1]
French (QC)  French (QC) acoustic model          Lab speech   Prosodylab [1]
German       German acoustic model               GlobalPhone  GlobalPhone
German       German (Prosodylab) acoustic model  GlobalPhone  Prosodylab [3]
Hausa        Hausa acoustic model                GlobalPhone  GlobalPhone
Japanese     Not available yet                   GlobalPhone  GlobalPhone
Korean       Korean acoustic model               GlobalPhone  GlobalPhone
Mandarin     Mandarin acoustic model             GlobalPhone  GlobalPhone
Polish       Polish acoustic model               GlobalPhone  GlobalPhone
Portuguese   Portuguese acoustic model           GlobalPhone  GlobalPhone
Russian      Russian acoustic model              GlobalPhone  GlobalPhone
Swahili      Swahili acoustic model              GlobalPhone  GlobalPhone
Swedish      Swedish acoustic model              GlobalPhone  GlobalPhone
Tamil        Not available yet                   GlobalPhone  GlobalPhone
Thai         Thai acoustic model                 GlobalPhone  GlobalPhone
Turkish      Turkish acoustic model              GlobalPhone  GlobalPhone
Ukrainian    Ukrainian acoustic model            GlobalPhone  GlobalPhone
Vietnamese   Vietnamese acoustic model           GlobalPhone  GlobalPhone
Wu           Not available yet                   GlobalPhone  GlobalPhone
===========  ==================================  ===========  ==============
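
Because a dictionary's phone set must be compatible with the acoustic model's, it can help to check a dictionary before aligning. The following is a minimal sketch, not part of MFA itself. It assumes the dictionary is a text file with one entry per line (a word followed by its whitespace-separated phones) and that you already have the model's phone inventory as a set of strings. The function names and the example data are hypothetical.

```python
from typing import Iterable, Set


def dictionary_phones(lines: Iterable[str]) -> Set[str]:
    """Collect every phone used in a dictionary.

    Assumes one entry per line: a word followed by its
    whitespace-separated phones.
    """
    phones: Set[str] = set()
    for line in lines:
        fields = line.split()
        if len(fields) < 2:  # skip blank or malformed lines
            continue
        phones.update(fields[1:])
    return phones


def unsupported_phones(lines: Iterable[str], model_phones: Set[str]) -> Set[str]:
    """Return the dictionary phones that are absent from the model's phone set."""
    return dictionary_phones(lines) - model_phones


# Hypothetical data for illustration only.
example_dictionary = [
    "cat K AE1 T",
    "dog D AO1 G",
    "chat SH A",
]
example_model_phones = {"K", "AE1", "T", "D", "AO1", "G"}

missing = unsupported_phones(example_dictionary, example_model_phones)
print(sorted(missing))
```

Here the last entry uses two phones (`A` and `SH`) that the hypothetical model does not know, so they are reported. An empty result suggests the dictionary only uses phones the model was trained on.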

Pretrained G2P models

Included with MFA is a separate tool to generate a pronunciation dictionary from a preexisting G2P model. Use it if you are aligning a dataset for which you have no pronunciation dictionary, or if the orthography is very transparent. We have pretrained models for several languages, which can be downloaded below. These models were generated using Phonetisaurus and the GlobalPhone dataset, so they will only work for transcriptions that use the same alphabet. The table below lists the available languages, with the accuracy obtained when each model was trained on 90% of the data and tested on the remaining 10%:

==========  =================================  ============  ===================  =============
Language    Link                               Accuracy (%)  Orthography system   Phone set
==========  =================================  ============  ===================  =============
Arabic      Arabic G2P model                   95.4          Romanized [1]        GlobalPhone
Bulgarian   Bulgarian G2P model                97.3          Cyrillic alphabet    GlobalPhone
Croatian    Croatian G2P model                 92.7          Latin alphabet       GlobalPhone
Czech       Czech G2P model                    96.8          Latin alphabet       GlobalPhone
French      French G2P model                   93.2          Latin alphabet       GlobalPhone
French      French (Prosodylab) G2P model [1]  95.2          Latin alphabet       Prosodylab
German      German G2P model                   67.0          Latin alphabet       GlobalPhone
German      German (Prosodylab) G2P model [3]  94.1          Latin alphabet       Prosodylab
Hausa       Hausa G2P model                    70.1          Latin alphabet       GlobalPhone
Japanese    Japanese G2P model                 82.1          Romanized            GlobalPhone
Korean      Korean G2P model                   89.5          Hangul               GlobalPhone
Mandarin    Mandarin Pinyin G2P model          99.9          Pinyin               Pinyin phones
Mandarin    Mandarin Character G2P model [4]   83.2          Hanzi                Pinyin phones
Polish      Polish G2P model                   98.8          Latin alphabet       GlobalPhone
Portuguese  Portuguese G2P model               86.5          Latin alphabet       GlobalPhone
Russian     Russian G2P model                  96.4          Cyrillic alphabet    GlobalPhone
Spanish     Spanish G2P model                  94.0          Latin alphabet       GlobalPhone
Swahili     Swahili G2P model                  99.9          Latin alphabet       GlobalPhone
Swedish     Swedish G2P model                  83.3          Latin alphabet       GlobalPhone
Thai        Thai G2P model                     71.7          Thai script          GlobalPhone
Turkish     Turkish G2P model                  83.3          Latin alphabet       GlobalPhone
Ukrainian   Ukrainian G2P model                98.0          Cyrillic alphabet    GlobalPhone
Vietnamese  Vietnamese G2P model               98.2          Vietnamese alphabet  GlobalPhone
Wu          Wu G2P model [5]                   77.5          Hanzi                GlobalPhone
==========  =================================  ============  ===================  =============
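
To make the 90%/10% evaluation concrete, here is a rough sketch of how a held-out accuracy could be computed for any G2P predictor. It is an illustration under stated assumptions, not the procedure used to produce the figures above: it assumes accuracy means the percentage of held-out words whose predicted pronunciation exactly matches the reference, which this page does not specify. The names `split_lexicon`, `word_accuracy`, and `spell_out`, and the toy lexicon, are hypothetical, and the toy predictor stands in for a trained model such as one built with Phonetisaurus.

```python
import random
from typing import Callable, Dict, Tuple

Lexicon = Dict[str, str]  # word -> space-separated phones


def split_lexicon(
    lexicon: Lexicon, test_fraction: float = 0.1, seed: int = 0
) -> Tuple[Lexicon, Lexicon]:
    """Shuffle the words reproducibly and hold out a fraction for testing."""
    words = sorted(lexicon)
    random.Random(seed).shuffle(words)
    n_test = max(1, round(len(words) * test_fraction))
    test_words = set(words[:n_test])
    train = {w: p for w, p in lexicon.items() if w not in test_words}
    test = {w: lexicon[w] for w in test_words}
    return train, test


def word_accuracy(predict: Callable[[str], str], test: Lexicon) -> float:
    """Percentage of test words whose prediction matches the reference exactly."""
    correct = sum(1 for word, gold in test.items() if predict(word) == gold)
    return 100.0 * correct / len(test)


def spell_out(word: str) -> str:
    """Toy predictor (assumption): one phone per letter, as in a fully transparent orthography."""
    return " ".join(word)


# Hypothetical ten-word lexicon; "sha" breaks the one-letter-one-phone rule.
toy_lexicon = {
    "ba": "b a", "da": "d a", "ma": "m a", "na": "n a", "pa": "p a",
    "ta": "t a", "ka": "k a", "la": "l a", "sa": "s a", "sha": "sh a",
}

train, test = split_lexicon(toy_lexicon)  # 9 training words, 1 test word
# A real model would be trained on `train`; the toy predictor needs no training.
print(f"{word_accuracy(spell_out, test):.1f}")
```

With a real lexicon, the predictor would be trained on the 90% portion only, so that the score reflects words the model has never seen.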