.. _train_acoustic_model:

Train a new acoustic model ``(mfa train)``
******************************************

You can train new :term:`acoustic models` from scratch using MFA, and export the final alignments as :term:`TextGrids` at the end. You don't need a ton of data to generate decent alignments (see `the blog post comparing alignments trained on various corpus sizes `_). At the end of the day, it comes down to trial and error, so I would recommend trying different workflows of pretrained models vs training your own or adapting a model to your data to see what performs best.

Phone topology
==============

The phone topology that MFA uses is different from the standard 3-state HMM. Each phone can have a maximum of 5 states, but early exiting is allowed, so each phone has a minimum duration of 10 ms (one MFCC frame) rather than the 30 ms (three MFCC frames) required by a 3-state HMM.

.. seealso::

   See :doc:`HMM topologies <../concepts/hmm>` for more information on HMMs and phone topologies.

Phone groups
============

By default, each phone is treated independently of the others, which can lead to data sparsity issues or worse contextual modeling for clearly related phones when modeling triphones (i.e., long/short vowels :ipa_inline:`ɑ/ɑː`, or stressed/unstressed versions :ipa_inline:`OY1/OY2/OY0`). Phone groups can be specified via the :code:`--phone_groups_path` flag. See :doc:`phone groups <../implementations/phone_groups>` for more information.

.. deprecated:: 3.0.0

   Using the :code:`--phone_set` flag to generate phone groups is deprecated as of MFA 3.0; please use the :code:`--phone_groups_path` flag to specify a phone groups configuration file instead.

Pronunciation modeling
======================

For the default configuration, pronunciation probabilities are estimated following the second and third SAT blocks. See :ref:`training_dictionary` for more details.

A recent experimental feature for training acoustic models is the :code:`--train_g2p` flag, which changes pronunciation probability estimation from a lexicon-based estimation to using a G2P model as the lexicon. The idea here is that pronunciations are still generated by the initial blocks, much as in the standard lexicon-based approach, but instead of estimating probabilities for individual word/pronunciation pairs and the likelihood of surrounding silence, the model learns a mapping between the graphemes of the input texts and the phones.

.. note::

   See :doc:`phonological rules <../implementations/phonological_rules>` for how to specify regular-expression-like phonological rules, so you don't have to code every form of a regular rule.

Language tokenization
=====================

By specifying a language via the :code:`--language` flag, tokenization will occur as part of text normalization. This functionality is primarily useful for languages that do not rely on spaces to delimit words, such as Japanese, Thai, or Chinese languages. If you're also using :code:`--g2p_model_path` to generate pronunciations during training, note that the language setting will require G2P models trained on specific orthographies (i.e., using :code:`mfa model download g2p korean_jamo_mfa` instead of :code:`mfa model download g2p korean_mfa`).
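As a rough sketch only, the example below combines the :code:`--language` and :code:`--g2p_model_path` flags described above for a hypothetical Korean corpus. The corpus directory, dictionary path, and output model path are placeholder values, and the exact positional arguments accepted by :code:`mfa train` for your version are listed in the command reference at the bottom of this page.

.. code-block:: console

   # Download a G2P model trained on the jamo orthography (from the example above)
   mfa model download g2p korean_jamo_mfa

   # Hypothetical training run with Korean tokenization during text normalization;
   # ~/corpus, ~/lexicon.dict, and ~/korean_acoustic_model.zip are placeholder paths
   mfa train ~/corpus ~/lexicon.dict ~/korean_acoustic_model.zip \
       --language korean \
       --g2p_model_path korean_jamo_mfa

If your installation does not resolve downloaded model names for :code:`--g2p_model_path`, pass the full path to the downloaded G2P model file instead. The table below summarizes the supported orthographies, example tokenizations, and the extra dependencies each language requires.
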
.. csv-table::
   :header: "Language", "Pronunciation orthography", "Input", "Output", "Dependencies", "G2P model"

   "Japanese", "Katakana", "これは日本語です", "コレ ワ ニホンゴ デス", ":xref:`sudachipy`", "`Katakana G2P `_"
   "Korean", "Jamo", "이건 한국어야", "이건 한국어 야", ":xref:`python-mecab-ko`, :xref:`jamo`", "`Jamo G2P `_"
   "Chinese", "Pinyin", "这是中文", "zhèshì zhōngwén", ":xref:`spacy-pkuseg`, :xref:`hanziconv`, :xref:`dragonmapper`", "`Pinyin G2P `_"
   "Thai", "Thai script", "นี่คือภาษาไทย", "นี่ คือ ภาษาไทย", ":xref:`pythainlp`", "`Thai G2P `_"

Command reference
=================

.. click:: montreal_forced_aligner.command_line.train_acoustic_model:train_acoustic_model_cli
   :prog: mfa train
   :nested: full

Configuration reference
=======================

- :ref:`configuration_acoustic_modeling`

API reference
-------------

- :ref:`acoustic_modeling_api`
- :ref:`acoustic_model_training_api`