.. _workflows_index:

Workflows available
===================

The primary workflow in MFA is forced alignment, in which text and the corresponding phones, derived from a pronunciation dictionary and an acoustic model, are aligned to speech. MFA also provides other workflows: transcribing speech using the speech-to-text functionality in Kaldi, creating pronunciation dictionaries using Pynini, and some basic corpus creation utilities such as VAD-based segmentation. Additionally, acoustic models, G2P models, and language models can be trained on your own data and then used in alignment and other workflows.

.. warning::

   Speech-to-text functionality is basic. MFA relies on older GMM-HMM acoustic models and n-gram language models, so a tool like :xref:`coqui` or Kaldi's ``nnet`` functionality will likely yield better-quality transcriptions.

.. hint::

   See :ref:`pretrained_models` for details on the commands for inspecting, downloading, and saving pretrained MFA models.

.. toctree::
   :hidden:

   alignment
   adapt_acoustic_model
   train_acoustic_model
   dictionary_generating
   g2p_train