AlignableCorpus

class montreal_forced_aligner.corpus.AlignableCorpus(directory, output_directory, speaker_characters=0, num_jobs=3, sample_rate=16000, debug=False, logger=None, use_mp=True, punctuation=None, clitic_markers=None, parse_text_only_files=False, audio_directory=None)

Class that stores information about the dataset to align.

Corpus objects have a number of mappings from either utterances or speakers to various properties, and mappings between utterances and speakers.

See http://kaldi-asr.org/doc/data_prep.html for more information about the files that are created by this class.
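
As a rough illustration of the utterance and speaker mappings described above, the sketch below inverts an utterance-to-speaker mapping into a speaker-to-utterances mapping and formats it in the one-speaker-per-line layout that Kaldi uses for spk2utt. The function names `invert_utt2spk` and `format_spk2utt` and the sample IDs are hypothetical; this is not the code this class uses internally.

```python
from collections import defaultdict


def invert_utt2spk(utt2spk):
    """Build a speaker -> sorted list of utterances mapping.

    `utt2spk` maps each utterance ID to its speaker ID.
    (Hypothetical helper for illustration only.)
    """
    spk2utt = defaultdict(list)
    for utterance, speaker in utt2spk.items():
        spk2utt[speaker].append(utterance)
    # Sort speakers and utterances so the output order is deterministic.
    return {spk: sorted(utts) for spk, utts in sorted(spk2utt.items())}


def format_spk2utt(spk2utt):
    """Render one line per speaker: '<speaker> <utt1> <utt2> ...'."""
    return [f"{spk} {' '.join(utts)}" for spk, utts in spk2utt.items()]


# Made-up IDs, not taken from any real corpus.
example_utt2spk = {
    "spk1_utt2": "spk1",
    "spk2_utt1": "spk2",
    "spk1_utt1": "spk1",
}
example_spk2utt = invert_utt2spk(example_utt2spk)
example_lines = format_spk2utt(example_spk2utt)
```

Keeping both directions of the mapping makes it cheap to look up either the speaker of an utterance or all utterances of a speaker.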

Parameters:
directory : str

Directory of the dataset to align

output_directory : str

Directory to store generated data for the Kaldi binaries

speaker_characters : int, optional

Number of characters in each filename to use as the speaker ID. If not specified, speaker IDs are generated from directory names.

num_jobs : int, optional

Number of processes to use, defaults to 3
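
To make the effect of speaker_characters more concrete, here is a minimal sketch of how a speaker ID could be derived from a file path. The helper `guess_speaker_id` is hypothetical, and it assumes that the characters are taken from the start of the filename and that the fallback is the name of the file's immediate parent directory; the class itself may behave differently in detail.

```python
from pathlib import PurePath


def guess_speaker_id(wav_path, speaker_characters=0):
    """Return a speaker ID for `wav_path` (hypothetical helper).

    Assumption: a positive `speaker_characters` takes that many leading
    characters of the filename; otherwise the parent directory name is used.
    """
    path = PurePath(wav_path)
    if speaker_characters > 0:
        return path.name[:speaker_characters]
    return path.parent.name


# Made-up paths for illustration.
from_directory = guess_speaker_id("corpus/speaker_one/recording1.wav")
from_filename = guess_speaker_id("corpus/s01_recording1.wav", speaker_characters=3)
```

In this sketch, the first call falls back to the directory name and the second uses the first three characters of the filename.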

Raises:
CorpusError

Raised if the specified corpus directory does not exist

SampleRateError

Raised if the wav files in the dataset do not share a consistent sample rate
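
Before constructing the corpus, it can be useful to check sample rates yourself. The sketch below uses only the Python standard library `wave` module; the helper names `collect_sample_rates` and `has_consistent_sample_rate` are hypothetical and are not part of this class, and the check is only an approximation of whatever condition triggers SampleRateError.

```python
import wave
from pathlib import Path


def collect_sample_rates(directory):
    """Map each .wav file under `directory` to its sample rate in Hz.

    Keys are paths relative to `directory`. (Hypothetical helper.)
    """
    root = Path(directory)
    rates = {}
    for wav_path in sorted(root.rglob("*.wav")):
        with wave.open(str(wav_path), "rb") as wav_file:
            rates[wav_path.relative_to(root).as_posix()] = wav_file.getframerate()
    return rates


def has_consistent_sample_rate(directory):
    """Return True if all .wav files under `directory` share one sample rate."""
    return len(set(collect_sample_rates(directory).values())) <= 1
```

Calling `collect_sample_rates` on the corpus directory shows which files deviate, which is usually more helpful than a bare pass or fail.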

Attributes

features_directory
features_log_directory
grouped_cmvn
grouped_feat
grouped_segments
grouped_spk2utt
grouped_utt2spk
grouped_wav
num_utterances
speakers
utterances
word_set

Methods

add_utterance(utterance, speaker, file, text)
check_warnings()
cmvn_by_dictionary(multispeaker_dictionary, …)
combine_feats()
create_subset(subset, feature_config)
delete_utterance(utterance)
feats_by_dictionary(multispeaker_dictionary, …)
figure_utterance_lengths()
find_best_groupings()
get_feat_dim(feature_config)
get_wav_duration(utt)
get_word_frequency(dictionary)
grouped_text([dictionary])
grouped_text_int(dictionary)
grouped_utt2fst(dictionary[, num_frequent_words])
initialize_corpus([dictionary, feature_config])
normalized_text_iter([dictionary, min_count])
parse_features_logs()
save_text_file(file_name)
speaker_utterance_info()
spk2utt_by_dictionary(…)
split(dictionary)
split_by_dictionary(multispeaker_dictionary)
split_directory()
subset_directory(subset, feature_config)
text_int_by_dictionay(…)
utt2spk_by_dictionary(…)
write()