Configuration

MFA root directory

MFA uses a temporary directory when running commands, which can be set per command with --temporary_directory (see below). It also uses a root directory to store global configuration settings and saved models. By default this root directory is ~/Documents/MFA; to put it somewhere else, set the environment variable MFA_ROOT_DIR to the desired path. MFA will raise an error on load if it is unable to write to the root directory.
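As a minimal sketch of relocating the root directory, the snippet below sets MFA_ROOT_DIR and checks that the location is writable before MFA is ever loaded. The path `$HOME/mfa_root` is a hypothetical choice for illustration, not an MFA default.

```shell
# Hypothetical location; any directory you can write to should work.
export MFA_ROOT_DIR="$HOME/mfa_root"
mkdir -p "$MFA_ROOT_DIR"

# MFA raises an error on load if it cannot write to the root directory,
# so it is worth checking write access up front.
if [ -w "$MFA_ROOT_DIR" ]; then
    echo "MFA root directory is writable: $MFA_ROOT_DIR"
else
    echo "Cannot write to $MFA_ROOT_DIR" >&2
fi
```

The export only lasts for the current shell session. To make it persistent, add the export line to your shell startup file (for example ~/.bashrc).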

Global configuration

Global configuration for MFA can be updated via the mfa configure subcommand. Once the command is called with a flag, that value becomes the default for all future runs (though you can override most settings when you call other commands).

mfa configure

The configure command is used to set global defaults for MFA so you don’t have to set them every time you call an MFA command.

mfa configure [OPTIONS]

Options

-p, --profile <profile>: Configuration profile to use, defaults to “global”

-t, --temporary_directory <temporary_directory>: Set the default temporary directory. Currently defaults to /home/docs/Documents/MFA

-j, --num_jobs <num_jobs>: Set the number of processes to use by default. Currently defaults to 3

--always_clean, --never_clean: Turn on/off clean mode where MFA will clean temporary files before each run. Currently defaults to False.

--always_verbose, --never_verbose: Turn on/off verbose mode where MFA will print more output. Currently defaults to False.

--always_quiet, --never_quiet: Turn on/off quiet mode where MFA will not print any output. Currently defaults to False.

--always_debug, --never_debug: Turn on/off extra debugging functionality. Currently defaults to False.

--always_overwrite, --never_overwrite: Turn on/off overwriting export files. Currently defaults to False.

--enable_mp, --disable_mp: Turn on/off multiprocessing. Multiprocessing is recommended and will allow for faster execution. Currently defaults to True.

--enable_textgrid_cleanup, --disable_textgrid_cleanup: Turn on/off post-processing of TextGrids that cleans up silences and recombines compound words and clitics. Currently defaults to True.

--enable_auto_server, --disable_auto_server: If auto_server is enabled, MFA will start a server at the beginning of a command and close it at the end. If turned off, use the mfa server commands to initialize, start, and stop a profile’s server. Currently defaults to True.

--enable_use_postgres, --disable_use_postgres: If use_postgres is enabled, MFA will use PostgreSQL as the database backend instead of SQLite. Currently defaults to False.

--blas_num_threads <blas_num_threads>: Number of threads to use for BLAS libraries; 1 is recommended because MFA relies heavily on multiprocessing. Currently defaults to 1.

--github_token <github_token>: GitHub token to use for model downloading.

--bytes_limit <bytes_limit>: Bytes limit for Joblib Memory caching on disk.

--seed <seed>: Random seed to set for various pseudorandom processes.

-h, --help: Show this message and exit.
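As an illustration, several of the options above can be combined in one call. The flags come from the option list; the values (4 processes, clean mode on) are arbitrary choices for this sketch, and the `command -v` guard is only there so the snippet does nothing harmful on a machine where MFA is not installed.

```shell
# Values are illustrative; the flags are taken from the option list above.
if command -v mfa >/dev/null 2>&1; then
    # Store new defaults: 4 processes, always clean temporary files,
    # and a single BLAS thread.
    if mfa configure --num_jobs 4 --always_clean --blas_num_threads 1; then
        mfa_configured=yes
    else
        mfa_configured=no
    fi
else
    echo "mfa is not on PATH; activate the environment where MFA is installed" >&2
    mfa_configured=no
fi
echo "Defaults updated: $mfa_configured"
```

Later commands would then pick up these defaults unless a setting is overridden on their own command line.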

Configuring specific commands

MFA lets you customize various parameters that control aspects of data processing and workflows. These can be supplied on the command line, for example:

mfa align ... --beam 1000

The above command sets the beam width used in aligning to 1000 (and the retry beam width to 4000). It is equivalent to supplying a config file like the one below via --config_path:

beam: 1000

Supplying the above via:

mfa align ... --config_path config_above.yaml

will also set the beam width to 1000 and the retry beam width to 4000.

For simple settings, command line arguments work well, but for more complex settings, a config yaml lets you specify things like aspects of training blocks or punctuation:

beam: 100
retry_beam: 400

punctuation: ":,."

training:
  - monophone:
      num_iterations: 20
      max_gaussians: 500
      subset: 1000
      boost_silence: 1.25

  - triphone:
      num_iterations: 35
      num_leaves: 2000
      max_gaussians: 10000
      cluster_threshold: -1
      subset: 5000
      boost_silence: 1.25
      power: 0.25

You can also override these options on the command line, e.g. --beam 10 --config_path config_above.yaml would set the beam width to 10. Arguments specified on the command line always take priority over parameters derived from a configuration yaml.
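The workflow above can be sketched as follows: write a small configuration file, then pass it to a command together with an overriding flag. The scratch directory and the variable names `workdir` and `config_path` are assumptions made for this example; the YAML keys are copied from the configuration shown earlier.

```shell
# Work in a scratch directory so the example does not clutter the current one.
workdir="$(mktemp -d)"
config_path="$workdir/config_above.yaml"

# Write a small configuration file using keys from the example above.
cat > "$config_path" <<'EOF'
beam: 100
retry_beam: 400
punctuation: ":,."
EOF

# Command line flags take priority over the YAML file, so the call below
# would align with a beam width of 10 rather than 100
# (the remaining arguments are elided, as in the examples above):
#   mfa align ... --beam 10 --config_path "$config_path"
echo "Wrote $config_path"
```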