API Reference

Below is a diagram of the main classes used in MFA:

Aligner API

There are two main Aligner classes, one for using a pretrained model and one for training a model while aligning. A class diagram of the Aligner API can be found below:

PretrainedAligner(corpus, dictionary, …[, …]) Class for aligning a dataset using a pretrained acoustic model
TrainableAligner(corpus, dictionary, …[, …]) Aligner that aligns and trains acoustic models on a large dataset

Corpus API

The Corpus class contains information about how a dataset is structured. A class diagram of the Corpus API can be found below:

Corpus(directory, output_directory[, …]) Class that stores information about the dataset to align.

Dictionary API

The Dictionary class contains pronunciation and orthographic information. A class diagram of the Dictionary API can be found below:

Dictionary(input_path, output_directory[, …]) Class containing information about a pronunciation dictionary

Model API

The output of training a model is compressed by the Archive class into a zip archive. A class diagram of the Model API can be found below:

AcousticModel(source[, is_tmpdir])
G2PModel(source[, is_tmpdir])
IvectorExtractor(source[, is_tmpdir]) Archive for i-vector extractors (used with DNNs)
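To illustrate the idea of storing a trained model as a zip archive, here is a minimal sketch using only the Python standard library. It is not the Archive class itself: the helper names `pack_model` and `unpack_model`, the directory names, and the placeholder file `final.mdl` are all assumptions made for this example.

```python
import shutil
import tempfile
from pathlib import Path


def pack_model(model_dir: Path, archive_base: Path) -> Path:
    """Zip the contents of model_dir and return the path of the .zip file.

    Hypothetical helper, not part of MFA.
    """
    # make_archive appends the ".zip" suffix to archive_base itself
    return Path(shutil.make_archive(str(archive_base), "zip", root_dir=str(model_dir)))


def unpack_model(archive_path: Path, target_dir: Path) -> Path:
    """Extract a zipped model into target_dir. Hypothetical helper."""
    shutil.unpack_archive(str(archive_path), str(target_dir), "zip")
    return target_dir


with tempfile.TemporaryDirectory() as tmp:
    tmp_path = Path(tmp)
    model_dir = tmp_path / "trained_model"
    model_dir.mkdir()
    # Stand-in for real training output
    (model_dir / "final.mdl").write_text("placeholder model data")

    archive = pack_model(model_dir, tmp_path / "my_model")
    restored = unpack_model(archive, tmp_path / "restored")
    restored_files = sorted(p.name for p in restored.iterdir())
```

Keeping the archive outside the directory being zipped, as above, avoids the archive accidentally including itself.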

Feature processing API

mfcc(mfcc_directory, num_jobs, …) Multiprocessing function that converts wav files into MFCCs
apply_cmvn(directory, num_jobs, config)
add_deltas(directory, num_jobs, config)
apply_lda(directory, num_jobs, config)
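As background for apply_cmvn, the general idea of cepstral mean and variance normalization (CMVN) is to shift and scale each feature dimension so that it has zero mean and unit variance. The sketch below shows that idea on toy data in plain Python. It is an illustration under stated assumptions, not the function listed above: the name `cmvn`, the per-utterance scope, and the toy frames are all hypothetical, and the actual function operates on feature files through Kaldi.

```python
from statistics import fmean, pstdev


def cmvn(frames):
    """Normalize each feature dimension to zero mean and unit variance.

    frames is a list of equal-length feature vectors (one per frame).
    Hypothetical helper for illustration only.
    """
    columns = list(zip(*frames))  # one tuple per feature dimension
    means = [fmean(col) for col in columns]
    # A constant dimension has zero standard deviation; divide by 1.0 instead
    stds = [pstdev(col) or 1.0 for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(frame, means, stds)]
        for frame in frames
    ]


# Three frames with two toy "coefficients" each; the second one is constant
frames = [[1.0, 10.0], [3.0, 10.0], [5.0, 10.0]]
normalized = cmvn(frames)
```

After normalization the first dimension is centered on zero with unit variance, and the constant second dimension becomes all zeros.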

Multiprocessing API

The multiprocessing module contains most of the interactions with Kaldi, since multiple processes are used to speed up the setup and alignment of the dataset.

compile_train_graphs(directory, …[, debug]) Multiprocessing function that compiles training graphs for utterances
mono_align_equal(mono_directory, …) Multiprocessing function that creates equal alignments for base monophone training
align(iteration, directory, split_directory, …) Multiprocessing function that aligns based on the current model
acc_stats(iteration, directory, …) Multiprocessing function that computes stats for GMM training
tree_stats(directory, align_directory, …) Multiprocessing function that computes stats for decision tree training
calc_fmllr(directory, split_directory, …) Multiprocessing function that computes speaker adaptation (fMLLR)
convert_alignments(directory, …) Multiprocessing function that converts alignments from previous training
convert_ali_to_textgrids(align_config, …) Multiprocessing function that converts alignments to TextGrid files
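The general pattern behind functions like these (split the data into num_jobs pieces, then run one external process per piece) can be sketched with the standard library alone. This is a simplified illustration, not the implementation of the multiprocessing module: `split_into_jobs`, `run_jobs`, and the utterance IDs are hypothetical, and a small Python one-liner stands in for the external command so that the example runs anywhere.

```python
import subprocess
import sys


def split_into_jobs(items, num_jobs):
    """Deal items round-robin into num_jobs lists, one per child process."""
    return [items[i::num_jobs] for i in range(num_jobs)]


def run_jobs(job_inputs):
    """Start one child process per job, then wait for all of them."""
    # Stand-in command: counts the arguments it was given.
    # A real pipeline would launch an external tool here instead.
    script = "import sys; print(len(sys.argv) - 1)"
    procs = [
        subprocess.Popen(
            [sys.executable, "-c", script, *job],
            stdout=subprocess.PIPE,
            text=True,
        )
        for job in job_inputs
    ]
    # All children are already running; communicate() collects each result in turn
    return [int(p.communicate()[0]) for p in procs]


utterances = ["utt1", "utt2", "utt3", "utt4", "utt5"]  # hypothetical utterance IDs
jobs = split_into_jobs(utterances, num_jobs=2)
counts = run_jobs(jobs)
```

Because every child is started before any result is collected, the jobs run concurrently, and the total wall-clock time is governed by the slowest job.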

For use with DNNs

lda_acc_stats(directory, split_dir, …) Multiprocessing function that accumulates LDA statistics
calc_lda_mllt(directory, split_directory, …) Multiprocessing function that calculates LDA+MLLT transformations
gmm_gselect(config, num_jobs) Multiprocessing function that stores Gaussian selection indices on disk
acc_global_stats(config, num_jobs, iteration) Multiprocessing function that accumulates global GMM stats
gauss_to_post(config, num_jobs) Multiprocessing function that does Gaussian selection and posterior extraction
acc_ivector_stats(config, num_jobs, iteration) Multiprocessing function that calculates i-vector extractor stats
extract_ivectors(config, num_jobs) Multiprocessing function that extracts i-vectors.
get_egs(config, ali_dir, valid_uttlist, …) Multiprocessing function that gets training examples for the neural net
get_lda_nnet(config, align_directory, num_jobs) Multiprocessing function that extracts training examples and does LDA transformation
nnet_train_trans(nnet_dir, align_dir, …) Multiprocessing function that trains transition probabilities and sets priors.
nnet_train(nnet_dir, egs_dir, mdl, i, num_jobs) Multiprocessing function that trains the neural net.
nnet_align(i, config, train_directory, …) Multiprocessing function that generates an nnet alignment
compute_prob(i, nnet_dir, egs_dir, …) Multiprocessing function that computes the current log probability of the iteration
get_average_posteriors(i, nnet_dir, …) Multiprocessing function that gets average posterior for purposes of computing priors (for nnet)
relabel_egs(i, nnet_dir, egs_in, alignments, …) Multiprocessing function that relabels training examples

Trainer API

The Trainer classes contain information about configuring data preparation and training. A class diagram of the Trainer API can be found below:

MonophoneTrainer(default_feature_config) Configuration class for monophone training
TriphoneTrainer(default_feature_config) Configuration class for triphone training
LdaTrainer(default_feature_config) Configuration class for LDA+MLLT training
SatTrainer(default_feature_config) Configuration class for speaker adapted training (SAT)
IvectorExtractorTrainer(default_feature_config) Configuration class for i-vector extractor training
NnetTrainer(default_feature_config) Configuration class for neural network training