API Reference
Below is a diagram of the main classes used in MFA:
Aligner API
There are two main Aligner classes, one for using a pretrained model and one for training a model while aligning. A class diagram of the Aligner API can be found below:
PretrainedAligner (corpus, dictionary, …[, …]) |
Class for aligning a dataset using a pretrained acoustic model |
TrainableAligner (corpus, dictionary, …[, …]) |
Aligner that aligns and trains acoustic models on a large dataset |
Corpus API
The Corpus class contains information about how a dataset is structured. A class diagram of the Corpus API can be found below:
AlignableCorpus (directory, output_directory) |
Class that stores information about the dataset to align. |
Dictionary API
The Dictionary class contains pronunciation and orthographic information. A class diagram of the Dictionary API can be found below:
Dictionary (input_path, output_directory[, …]) |
Class containing information about a pronunciation dictionary |
Model API
The output of training a Model is compressed using the Archive class, which produces a zip archive. A class diagram of the Model API can be found below:
AcousticModel (source[, root_directory]) |
|
G2PModel (source[, root_directory]) |
|
IvectorExtractor (source[, root_directory]) |
Archive for i-vector extractors |
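As a rough illustration of what compressing training output into a zip archive looks like, the sketch below packs a directory of model files and unpacks it again using only the Python standard library. It is not MFA's Archive implementation: the functions `pack_model` and `unpack_model` and the placeholder file names are hypothetical and exist only for this example.

```python
import os
import shutil
import tempfile


def pack_model(model_directory, output_basename):
    """Zip every file under model_directory and return the path of the .zip file."""
    # shutil.make_archive appends the ".zip" extension itself
    return shutil.make_archive(output_basename, "zip", root_dir=model_directory)


def unpack_model(archive_path, target_directory):
    """Extract a zipped model into target_directory and list the files it held."""
    os.makedirs(target_directory, exist_ok=True)
    shutil.unpack_archive(archive_path, target_directory, format="zip")
    return sorted(os.listdir(target_directory))


# Demo with placeholder files (hypothetical names, not a real model layout)
work_dir = tempfile.mkdtemp()
model_dir = os.path.join(work_dir, "model")
os.makedirs(model_dir)
for name in ("final.mdl", "tree"):
    with open(os.path.join(model_dir, name), "w") as handle:
        handle.write("placeholder")

archive = pack_model(model_dir, os.path.join(work_dir, "my_model"))
restored = unpack_model(archive, os.path.join(work_dir, "restored"))
```

Keeping a model as a single zip file makes it easy to copy or share, at the cost of an extraction step before its contents can be read.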
Feature processing API
mfcc (mfcc_directory, num_jobs, …) |
Multiprocessing function that converts wav files into MFCCs |
apply_cmvn (directory, num_jobs, config) |
|
add_deltas (directory, num_jobs, config) |
|
apply_lda (directory, num_jobs, config) |
Multiprocessing API
The multiprocessing module contains most of the interactions with Kaldi, as multiple processes are used to speed up the setup and alignment of the dataset.
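The general pattern behind a `num_jobs` argument can be sketched as follows: the work is split into `num_jobs` chunks and each chunk is handed to its own process. This is a minimal sketch using Python's standard `multiprocessing.Pool`, not MFA's actual code; `split_jobs`, `run_jobs`, and the utterance names are hypothetical.

```python
from multiprocessing import Pool


def split_jobs(items, num_jobs):
    """Distribute items round-robin into num_jobs lists."""
    chunks = [[] for _ in range(num_jobs)]
    for index, item in enumerate(items):
        chunks[index % num_jobs].append(item)
    return chunks


def run_jobs(worker, chunks):
    """Run worker(chunk) for each chunk in its own process and collect the results."""
    # worker must be a picklable, module-level function
    with Pool(processes=len(chunks)) as pool:
        return pool.map(worker, chunks)


utterances = ["utt1", "utt2", "utt3", "utt4", "utt5"]
chunks = split_jobs(utterances, 2)
# chunks == [["utt1", "utt3", "utt5"], ["utt2", "utt4"]]
```

In a script, `run_jobs` would typically be called under an `if __name__ == "__main__":` guard, for example `run_jobs(len, chunks)`, so that worker processes do not re-run the top-level code on platforms that start them by spawning.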
compile_train_graphs (directory, …[, debug]) |
Multiprocessing function that compiles training graphs for utterances |
mono_align_equal (mono_directory, …) |
Multiprocessing function that creates equal alignments for base monophone training |
align (iteration, directory, split_directory, …) |
Multiprocessing function that aligns based on the current model |
acc_stats (iteration, directory, …) |
Multiprocessing function that computes stats for GMM training |
tree_stats (directory, align_directory, …) |
Multiprocessing function that computes stats for decision tree training |
calc_fmllr (directory, split_directory, …) |
Multiprocessing function that computes speaker adaptation (fMLLR) |
convert_alignments (directory, …) |
Multiprocessing function that converts alignments from previous training |
convert_ali_to_textgrids (align_config, …) |
Multiprocessing function that converts alignments to TextGrid files |
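The idea of an "equal alignment", as used to bootstrap monophone training before any model exists, can be illustrated with a simplified sketch: the frames of an utterance are divided as evenly as possible among its phones. This is an assumption about the concept only. The real `mono_align_equal` delegates to Kaldi and works on compiled training graphs, and the helper `equal_alignment` below is hypothetical.

```python
def equal_alignment(phones, num_frames):
    """Assign each phone a contiguous, near-equal share of the frames.

    Returns a list of (phone, start_frame, end_frame) with end exclusive.
    """
    if not phones:
        raise ValueError("at least one phone is required")
    if num_frames < len(phones):
        raise ValueError("need at least one frame per phone")
    base, extra = divmod(num_frames, len(phones))
    alignment = []
    start = 0
    for index, phone in enumerate(phones):
        # The first `extra` phones absorb the leftover frames, one each
        length = base + (1 if index < extra else 0)
        alignment.append((phone, start, start + length))
        start += length
    return alignment


segments = equal_alignment(["k", "ae", "t"], 10)
# segments == [("k", 0, 4), ("ae", 4, 7), ("t", 7, 10)]
```

Such a uniform split is only a starting point; later alignment passes with a trained model replace it with boundaries estimated from the audio.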
For use with i-vector extractors
lda_acc_stats (directory, split_directory, …) |
Multiprocessing function that accumulates LDA statistics |
calc_lda_mllt (directory, split_directory, …) |
Multiprocessing function that calculates LDA+MLLT transformations |
gmm_gselect (config, num_jobs) |
Multiprocessing function that stores Gaussian selection indices on disk |
acc_global_stats (config, num_jobs, iteration) |
Multiprocessing function that accumulates global GMM stats |
gauss_to_post (config, num_jobs) |
Multiprocessing function that does Gaussian selection and posterior extraction |
acc_ivector_stats (config, num_jobs, iteration) |
Multiprocessing function that calculates i-vector extractor stats |
extract_ivectors (config, num_jobs) |
Multiprocessing function that extracts i-vectors. |
Trainer API
The Trainer classes contain information about configuring data preparation and training. A class diagram of the Trainer API can be found below:
MonophoneTrainer (default_feature_config) |
Configuration class for monophone training |
TriphoneTrainer (default_feature_config) |
Configuration class for triphone training |
LdaTrainer (default_feature_config) |
Configuration class for LDA+MLLT training |
SatTrainer (default_feature_config) |
Configuration class for speaker adapted training (SAT) |
IvectorExtractorTrainer (default_feature_config) |
Configuration class for i-vector extractor training |