API Reference

Below is a diagram of the main classes used in MFA:

Aligner API

There are two main Aligner classes, one for using a pretrained model and one for training a model while aligning. A class diagram of the Aligner API can be found below:

PretrainedAligner(corpus, dictionary, …[, …]) Class for aligning a dataset using a pretrained acoustic model
TrainableAligner(corpus, dictionary, …[, …]) Aligner that aligns and trains acoustic models on a large dataset

Corpus API

The Corpus class contains information about how a dataset is structured. A class diagram of the Corpus API can be found below:

AlignableCorpus(directory, output_directory) Class that stores information about the dataset to align

Dictionary API

The Dictionary class contains pronunciation and orthographic information. A class diagram of the Dictionary API can be found below:

Dictionary(input_path, output_directory[, …]) Class containing information about a pronunciation dictionary

Model API

The output of training a model is compressed using the Archive class, which produces a zip archive. A class diagram of the Model API can be found below:

AcousticModel(source[, root_directory])
G2PModel(source[, root_directory])
IvectorExtractor(source[, root_directory]) Archive for i-vector extractors
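
The idea of packing a trained model directory into a single zip archive can be illustrated with the standard library alone. The following is a minimal sketch and not MFA's Archive implementation: the function names `archive_model_directory` and `list_archive` and the file names `model.bin` and `meta.txt` are hypothetical placeholders chosen for illustration.

```python
import tempfile
import zipfile
from pathlib import Path


def archive_model_directory(model_dir: Path, archive_path: Path) -> Path:
    """Write every file under model_dir into a zip archive at archive_path."""
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(model_dir.rglob("*")):
            if path.is_file():
                # Store paths relative to the model directory so the archive
                # unpacks cleanly anywhere.
                zf.write(path, arcname=path.relative_to(model_dir).as_posix())
    return archive_path


def list_archive(archive_path: Path) -> list:
    """Return the sorted names of the files stored in the archive."""
    with zipfile.ZipFile(archive_path) as zf:
        return sorted(zf.namelist())


def demo() -> list:
    """Build a throwaway model directory, archive it, and list the result."""
    with tempfile.TemporaryDirectory() as tmp:
        model_dir = Path(tmp) / "model"
        model_dir.mkdir()
        # Placeholder files standing in for whatever training produced.
        (model_dir / "model.bin").write_bytes(b"\x00\x01")
        (model_dir / "meta.txt").write_text("placeholder metadata\n")
        archive = archive_model_directory(model_dir, Path(tmp) / "model.zip")
        return list_archive(archive)
```

Calling `demo()` returns the names of the two placeholder files as stored in the archive. Storing relative paths keeps the archive independent of where the model was trained.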

Feature processing API

mfcc(mfcc_directory, num_jobs, feature_config) Multiprocessing function that converts wav files into MFCCs
apply_cmvn
add_deltas
apply_lda
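
The name apply_cmvn refers to cepstral mean and variance normalization (CMVN), which shifts each feature dimension to zero mean and scales it to unit variance. The following is a generic sketch of that normalization over a single list of frames, assuming plain Python lists as input. It is not the MFA or Kaldi implementation, which may compute statistics per speaker or per utterance; the function name `apply_cmvn_sketch` and the variance floor are assumptions made for illustration.

```python
import math


def apply_cmvn_sketch(frames, variance_floor=1e-10):
    """Normalize each feature dimension to zero mean and unit variance.

    frames is a list of equal-length lists of floats, one list per frame.
    """
    if not frames:
        return []
    num_frames = len(frames)
    dim = len(frames[0])
    # Per-dimension mean over all frames.
    means = [sum(f[d] for f in frames) / num_frames for d in range(dim)]
    # Per-dimension (population) variance over all frames.
    variances = [
        sum((f[d] - means[d]) ** 2 for f in frames) / num_frames
        for d in range(dim)
    ]
    # Floor the variance so constant dimensions do not divide by zero.
    stds = [math.sqrt(max(v, variance_floor)) for v in variances]
    return [[(f[d] - means[d]) / stds[d] for d in range(dim)] for f in frames]
```

For example, `apply_cmvn_sketch([[1.0, 10.0], [3.0, 30.0]])` normalizes both dimensions to the same scale, even though their raw ranges differ by a factor of ten.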

Multiprocessing API

The multiprocessing module contains most of the interactions with Kaldi, since multiple processes are used to speed up setting up and aligning the dataset.
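
The general pattern of running one external process per job and waiting for all of them can be sketched as follows. This is an illustration under stated assumptions, not the code used by MFA: the helper names `build_commands` and `run_jobs_in_parallel` are hypothetical, and the Python interpreter stands in for an external binary so that the example runs without Kaldi installed.

```python
import subprocess
import sys


def build_commands(num_jobs):
    """Build one placeholder command per job.

    Each command just prints its job index; in a real pipeline this would
    be an external tool operating on one split of the data.
    """
    return [
        [sys.executable, "-c", f"print('job {i} done')"]
        for i in range(num_jobs)
    ]


def run_jobs_in_parallel(commands):
    """Start every command as its own process, then collect the outputs."""
    # Launch all processes first so that they run concurrently.
    procs = [
        subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
        for cmd in commands
    ]
    outputs = []
    for cmd, proc in zip(commands, procs):
        out, _ = proc.communicate()  # wait for this job to finish
        if proc.returncode != 0:
            raise RuntimeError(f"command failed: {cmd}")
        outputs.append(out.strip())
    return outputs
```

Calling `run_jobs_in_parallel(build_commands(3))` starts three processes at once and returns their outputs in job order. Launching every process before waiting on any of them is what gives the speed-up over running the jobs one after another.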

compile_train_graphs(directory, …[, debug]) Multiprocessing function that compiles training graphs for utterances
mono_align_equal(mono_directory, …) Multiprocessing function that creates equal alignments for base monophone training
align(iteration, directory, split_directory, …) Multiprocessing function that aligns based on the current model
acc_stats(iteration, directory, …) Multiprocessing function that computes stats for GMM training
tree_stats(directory, align_directory, …) Multiprocessing function that computes stats for decision tree training
calc_fmllr(directory, split_directory, …) Multiprocessing function that computes speaker adaptation (fMLLR)
convert_alignments(directory, …) Multiprocessing function that converts alignments from previous training
convert_ali_to_textgrids(align_config, …) Multiprocessing function that converts alignments to TextGrid files

For use with i-vector extractors

lda_acc_stats(directory, split_directory, …) Multiprocessing function that accumulates LDA statistics
calc_lda_mllt(directory, data_directory, …) Multiprocessing function that calculates LDA+MLLT transformations
gmm_gselect(iteration, config, num_jobs) Multiprocessing function that stores Gaussian selection indices on disk
acc_global_stats(config, num_jobs, iteration) Multiprocessing function that accumulates global GMM stats
gauss_to_post(config, num_jobs) Multiprocessing function that does Gaussian selection and posterior extraction
acc_ivector_stats(config, num_jobs, iteration) Multiprocessing function that calculates i-vector extractor stats
extract_ivectors(directory, split_directory, …) Multiprocessing function that extracts i-vectors

Trainer API

The Trainer classes contain information about configuring data preparation and training. A class diagram of the Trainer API can be found below:

MonophoneTrainer(default_feature_config) Configuration class for monophone training
TriphoneTrainer(default_feature_config) Configuration class for triphone training
LdaTrainer(default_feature_config) Configuration class for LDA+MLLT training
SatTrainer(default_feature_config) Configuration class for speaker adapted training (SAT)
IvectorExtractorTrainer(default_feature_config) Configuration class for i-vector extractor training