API Reference

Below is a diagram of the main classes used in MFA:

Aligner API

There are two main Aligner classes, one for using a pretrained model and one for training a model while aligning. A class diagram of the Aligner API can be found below:

PretrainedAligner(corpus, dictionary, …[, …]) Class for aligning a dataset using a pretrained acoustic model
TrainableAligner(corpus, dictionary, …[, …]) Aligner that aligns and trains acoustic models on a large dataset

Corpus API

The Corpus class contains information about how a dataset is structured. A class diagram of the Corpus API can be found below:

AlignableCorpus(directory, output_directory) Class that stores information about the dataset to align.
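As a rough illustration of the kind of structure a corpus class has to discover, the sketch below walks a directory and pairs each `.wav` file with a same-named `.lab` transcript, grouping utterances by the directory they sit in. The layout (one subdirectory per speaker, plain-text `.lab` transcripts next to the audio) and the function name `scan_corpus` are assumptions made for illustration; this is not how `AlignableCorpus` is implemented.

```python
import os
from collections import defaultdict


def scan_corpus(directory):
    """Map speaker -> list of (utterance_id, wav_path, transcript) tuples.

    Hypothetical layout: <directory>/<speaker>/<utt>.wav with a sibling
    <utt>.lab holding the orthographic transcript. Audio files without a
    matching transcript are skipped.
    """
    speakers = defaultdict(list)
    for root, _dirs, files in os.walk(directory):
        # Treat the name of the directory containing the files as the speaker label
        speaker = os.path.basename(root)
        for name in sorted(files):
            utt_id, ext = os.path.splitext(name)
            if ext.lower() != ".wav":
                continue
            lab_path = os.path.join(root, utt_id + ".lab")
            if not os.path.exists(lab_path):
                continue  # no transcript, so nothing to align
            with open(lab_path, encoding="utf8") as f:
                transcript = f.read().strip()
            speakers[speaker].append((utt_id, os.path.join(root, name), transcript))
    return dict(speakers)
```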

Dictionary API

The Dictionary class contains pronunciation and orthographic information. A class diagram of the Dictionary API can be found below:

Dictionary(input_path, output_directory[, …]) Class containing information about a pronunciation dictionary
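A pronunciation dictionary is commonly a plain-text file with one entry per line: the word followed by its phones, separated by whitespace, with a repeated word giving an alternative pronunciation. The parser below is a minimal sketch that assumes exactly that simple format; it does not handle any additional fields a real dictionary file might carry, and `parse_dictionary` and `phone_set` are hypothetical names rather than methods of the `Dictionary` class.

```python
from collections import defaultdict


def parse_dictionary(lines):
    """Return word -> list of pronunciations (each a tuple of phones).

    Assumes the simple "WORD PHONE PHONE ..." format. Blank lines are
    skipped, and words are lowercased so lookups are case-insensitive.
    """
    pronunciations = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # blank line, or a word with no phones
        word, phones = fields[0].lower(), tuple(fields[1:])
        if phones not in pronunciations[word]:
            pronunciations[word].append(phones)  # keep variants, drop exact duplicates
    return dict(pronunciations)


def phone_set(pronunciations):
    """Collect the sorted inventory of phones used anywhere in the dictionary."""
    return sorted({p for prons in pronunciations.values() for pron in prons for p in pron})
```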

Model API

Output from training a Model is compressed using the Archive class, which produces a zip archive. A class diagram of the Model API can be found below:

AcousticModel(source[, root_directory]) Archive for acoustic models
G2PModel(source[, root_directory]) Archive for grapheme-to-phoneme (G2P) models
IvectorExtractor(source[, root_directory]) Archive for i-vector extractors
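The archive idea described above, packing a trained model directory into a single zip file and unpacking it again for reuse, can be approximated with the standard library alone. The sketch uses `shutil.make_archive` and `shutil.unpack_archive`; the function names `pack_model` and `unpack_model`, the `meta.json` file, and its keys are hypothetical and do not describe the layout of MFA's actual model archives.

```python
import json
import os
import shutil


def pack_model(model_dir, archive_base):
    """Write a hypothetical meta.json into model_dir, then zip the directory.

    Returns the path of the created archive (archive_base + ".zip").
    """
    # List the model files before meta.json is written, so it is not listed itself
    meta = {"format_version": 1, "files": sorted(os.listdir(model_dir))}
    with open(os.path.join(model_dir, "meta.json"), "w", encoding="utf8") as f:
        json.dump(meta, f)
    return shutil.make_archive(archive_base, "zip", root_dir=model_dir)


def unpack_model(archive_path, target_dir):
    """Extract the archive into target_dir and return its parsed metadata."""
    shutil.unpack_archive(archive_path, target_dir, format="zip")
    with open(os.path.join(target_dir, "meta.json"), encoding="utf8") as f:
        return json.load(f)
```

Keeping a small metadata file inside the archive lets a loader check what it is reading before using the model files.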

Feature processing API

mfcc(mfcc_directory, num_jobs, feature_config) Multiprocessing function that converts wav files into MFCCs
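For readers unfamiliar with MFCCs, the per-utterance computation behind such a function follows a standard chain: pre-emphasis, framing, windowing, power spectrum, mel filterbank, log, and a DCT. The NumPy sketch below implements that generic chain with common textbook defaults (25 ms frames, 10 ms shift, 26 filters, 13 coefficients). It assumes NumPy is available, the names `compute_mfcc`, `mel_filterbank`, and `dct_ii` are hypothetical, and the output is not expected to match Kaldi's feature extraction exactly, since details such as windowing and other defaults differ.

```python
import numpy as np


def hz_to_mel(hz):
    """HTK-style mel scale."""
    return 2595.0 * np.log10(1.0 + np.asarray(hz) / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (np.asarray(mel) / 2595.0) - 1.0)


def mel_filterbank(num_filters, n_fft, sample_rate):
    """Triangular filters evenly spaced on the mel scale, shape (num_filters, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2), num_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((num_filters, n_fft // 2 + 1))
    for m in range(1, num_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):  # rising edge
            fbank[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):  # falling edge
            fbank[m - 1, k] = (right - k) / max(right - center, 1)
    return fbank


def dct_ii(x, num_ceps):
    """Orthonormal DCT-II along the last axis, keeping the first num_ceps coefficients."""
    n = x.shape[-1]
    k = np.arange(num_ceps)[:, None]
    i = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    basis[0] *= np.sqrt(1.0 / n)
    basis[1:] *= np.sqrt(2.0 / n)
    return x @ basis.T


def compute_mfcc(signal, sample_rate=16000, frame_ms=25, shift_ms=10, n_fft=512,
                 num_filters=26, num_ceps=13, preemph=0.97):
    """Return MFCCs with shape (num_frames, num_ceps)."""
    signal = np.asarray(signal, dtype=np.float64)
    # Pre-emphasis boosts high frequencies relative to low ones
    signal = np.append(signal[0], signal[1:] - preemph * signal[:-1])
    frame_len = sample_rate * frame_ms // 1000
    shift = sample_rate * shift_ms // 1000
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    num_frames = 1 + (len(signal) - frame_len) // shift
    # Index matrix: row t holds the sample indices of frame t
    idx = np.arange(frame_len)[None, :] + shift * np.arange(num_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    energies = power @ mel_filterbank(num_filters, n_fft, sample_rate).T
    log_energies = np.log(np.maximum(energies, np.finfo(float).eps))  # avoid log(0)
    return dct_ii(log_energies, num_ceps)
```

With these defaults, one second of 16 kHz audio yields 98 frames of 13 coefficients each.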

Multiprocessing API

The multiprocessing module contains most of the interactions with Kaldi and uses multiple processes to speed up corpus setup and alignment.
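The general pattern behind the functions in this module, splitting the data into `num_jobs` parts and running one worker process per part, can be sketched with Python's `multiprocessing.Pool`. In MFA the per-job work involves Kaldi; here the hypothetical placeholder `process_split` stands in for it, and `split_into_jobs` and `run_jobs` are likewise illustrative names, not MFA functions.

```python
from multiprocessing import Pool


def split_into_jobs(items, num_jobs):
    """Round-robin items into num_jobs lists so each job gets a similar share."""
    return [items[job::num_jobs] for job in range(num_jobs)]


def process_split(job_items):
    """Placeholder for one job's work; a real pipeline would invoke Kaldi here."""
    return [item.upper() for item in job_items]


def run_jobs(items, num_jobs):
    splits = split_into_jobs(items, num_jobs)
    with Pool(processes=num_jobs) as pool:
        # map preserves job order, so results[j] belongs to splits[j]
        results = pool.map(process_split, splits)
    return [out for job_output in results for out in job_output]


if __name__ == "__main__":
    # Output is grouped by job, not in the original order:
    # ['UTT1', 'UTT3', 'UTT5', 'UTT2', 'UTT4']
    print(run_jobs(["utt1", "utt2", "utt3", "utt4", "utt5"], num_jobs=2))
```

Round-robin splitting keeps job sizes balanced. A pipeline that computes per-speaker statistics, such as fMLLR transforms, would instead want each speaker's utterances kept within a single job.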

compile_train_graphs(directory, …[, debug]) Multiprocessing function that compiles training graphs for utterances
mono_align_equal(mono_directory, …) Multiprocessing function that creates equal alignments for base monophone training
align(iteration, directory, split_directory, …) Multiprocessing function that aligns based on the current model
acc_stats(iteration, directory, …) Multiprocessing function that computes stats for GMM training
tree_stats(directory, align_directory, …) Multiprocessing function that computes stats for decision tree training
calc_fmllr(directory, split_directory, …) Multiprocessing function that computes speaker adaptation transforms (fMLLR)
convert_alignments(directory, …) Multiprocessing function that converts alignments from previous training
convert_ali_to_textgrids(align_config, …) Multiprocessing function that converts alignments into TextGrids
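Two of the steps listed above are simple enough to sketch without Kaldi. `mono_align_equal` gives monophone training a starting point by spreading each utterance's frames evenly across its phones, and `convert_ali_to_textgrids` ends with frame-level alignments turned into timed intervals. The functions below illustrate both ideas under simplifying assumptions: one label per phone rather than per HMM state, and a fixed 10 ms frame shift. `equal_alignment` and `frames_to_intervals` are hypothetical names, not MFA functions.

```python
def equal_alignment(phones, num_frames):
    """Assign num_frames frames to phones in order, as evenly as possible.

    Earlier phones absorb the remainder, so durations differ by at most one frame.
    """
    if not phones or num_frames < len(phones):
        raise ValueError("need at least one frame per phone")
    base, extra = divmod(num_frames, len(phones))
    labels = []
    for i, phone in enumerate(phones):
        labels.extend([phone] * (base + (1 if i < extra else 0)))
    return labels


def frames_to_intervals(labels, frame_shift=0.01):
    """Collapse runs of identical frame labels into (start, end, label) tuples in seconds.

    Note: two adjacent identical phones would be merged into one interval.
    """
    intervals = []
    start = 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            intervals.append((round(start * frame_shift, 3), round(i * frame_shift, 3), labels[start]))
            start = i
    return intervals
```

For example, ten frames over the phones `sil k ae t` give durations of 3, 3, 2 and 2 frames, which become the intervals 0.00 to 0.03, 0.03 to 0.06, 0.06 to 0.08 and 0.08 to 0.10 seconds.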

For use with i-vector extractors

lda_acc_stats(directory, split_directory, …) Multiprocessing function that accumulates LDA statistics
calc_lda_mllt(directory, data_directory, …) Multiprocessing function that calculates LDA+MLLT transformations
gmm_gselect(iteration, config, num_jobs) Multiprocessing function that stores Gaussian selection indices on disk
acc_global_stats(config, num_jobs, iteration) Multiprocessing function that accumulates global GMM stats
gauss_to_post(config, num_jobs) Multiprocessing function that does Gaussian selection and posterior extraction
acc_ivector_stats(config, num_jobs, iteration) Multiprocessing function that calculates i-vector extractor stats
extract_ivectors(directory, split_directory, …) Multiprocessing function that extracts i-vectors.
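`gmm_gselect` and `gauss_to_post` both revolve around one computation: for each frame, find the few Gaussians of a GMM that score highest, then turn their scores into posteriors. The NumPy sketch below shows that step, assuming a diagonal-covariance GMM held in memory; the real pipeline runs Kaldi programs over feature archives, and `diag_gmm_loglikes` and `top_k_posteriors` are hypothetical names.

```python
import numpy as np


def diag_gmm_loglikes(feats, weights, means, variances):
    """Per-frame, per-component log(weight * N(x; mean, diag(var))), shape (T, K)."""
    diff = feats[:, None, :] - means[None, :, :]               # (T, K, D)
    mahal = np.sum(diff ** 2 / variances[None, :, :], axis=2)  # (T, K)
    log_det = np.sum(np.log(variances), axis=1)                # (K,)
    dim = feats.shape[1]
    return np.log(weights)[None, :] - 0.5 * (dim * np.log(2 * np.pi) + log_det[None, :] + mahal)


def top_k_posteriors(feats, weights, means, variances, k):
    """Return (indices, posteriors): the k best components per frame, best first,
    with posteriors renormalised over just those k components."""
    loglikes = diag_gmm_loglikes(feats, weights, means, variances)
    # Gaussian selection: indices of the k highest-scoring components
    idx = np.argsort(-loglikes, axis=1)[:, :k]
    selected = np.take_along_axis(loglikes, idx, axis=1)
    # Subtract the per-frame maximum before exponentiating, for numerical stability
    selected = selected - selected.max(axis=1, keepdims=True)
    post = np.exp(selected)
    post /= post.sum(axis=1, keepdims=True)
    return idx, post
```

Restricting the posterior computation to the selected components is what makes storing the selection indices worthwhile: later passes only need to evaluate k Gaussians per frame instead of the whole model.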

Trainer API

The Trainer classes hold the configuration for data preparation and training. A class diagram of the Trainer API can be found below:

MonophoneTrainer(default_feature_config) Configuration class for monophone training
TriphoneTrainer(default_feature_config) Configuration class for triphone training
LdaTrainer(default_feature_config) Configuration class for LDA+MLLT training
SatTrainer(default_feature_config) Configuration class for speaker adapted training (SAT)
IvectorExtractorTrainer(default_feature_config) Configuration class for i-vector extractor training
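The GMM-based trainers above are typically run in sequence (monophone, triphone, LDA+MLLT, then SAT), each stage starting from the alignments of the previous one, which is the hand-off `convert_alignments` is described as supporting. The sketch below models that idea with plain dataclasses: each stage copies a shared default feature configuration and overrides only what it needs. Every class name, field, and numeric value here (`FeatureConfig`, `StageConfig`, `num_iterations`, `max_gaussians`, and so on) is hypothetical and illustrative, not MFA's actual configuration or defaults.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class FeatureConfig:
    # Hypothetical feature settings shared by every stage
    use_lda: bool = False
    use_fmllr: bool = False


@dataclass(frozen=True)
class StageConfig:
    name: str
    num_iterations: int
    max_gaussians: int
    features: FeatureConfig = field(default_factory=FeatureConfig)


def default_pipeline(base=FeatureConfig()):
    """Build the stage list; later stages copy the base feature config and change only what they need.

    The numbers are illustrative placeholders, not MFA defaults.
    """
    mono = StageConfig("monophone", num_iterations=40, max_gaussians=1000, features=base)
    tri = StageConfig("triphone", num_iterations=35, max_gaussians=10000, features=base)
    lda = StageConfig("lda", num_iterations=35, max_gaussians=15000,
                      features=replace(base, use_lda=True))
    sat = StageConfig("sat", num_iterations=35, max_gaussians=15000,
                      features=replace(base, use_lda=True, use_fmllr=True))
    return [mono, tri, lda, sat]


def run_pipeline(stages, train_stage):
    """Run stages in order; train_stage(stage, previous_alignments) returns new alignments."""
    alignments = None  # the first stage has no prior alignments and starts from equal alignment
    for stage in stages:
        alignments = train_stage(stage, alignments)
    return alignments
```

Because the configuration objects are frozen, `dataclasses.replace` produces a modified copy for each stage, so changing one stage's features cannot leak into the shared default.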