The validation utility performs the same basic setup that alignment would perform, then analyzes the data and reports any issues that the user may want to fix.
First, the utility parses the corpus and dictionary, prints out summary information about the corpus, and logs any of the following issues:
- Any words in transcriptions that are not in the dictionary are logged as out-of-vocabulary items (OOVs). A list of these OOVs and a list of the utterances they appear in are saved to text files.
- Any issues reading sound files
- Any issues generating features, skipped if
- Any transcription files missing .wav files
- Any .wav files missing transcription files
- Any issues reading transcription files
- Any unsupported sampling rates of .wav files
- Any unaligned files from a basic monophone acoustic model trained on the dataset (or using a supplied acoustic model),
- Any files whose decoded transcriptions, produced using a simple language model, deviate from their original transcriptions
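Two of the checks above, unpaired files and OOVs, can be approximated before running the utility. The following is a minimal sketch, not MFA's own implementation. It assumes that transcriptions are `.lab` text files sharing the sound file's base name, that words are separated by whitespace and compared case-sensitively, and that each dictionary line starts with the word followed by its pronunciation. The function names are hypothetical.

```python
from collections import defaultdict
from pathlib import Path


def find_unpaired(corpus_dir, transcript_ext=".lab"):
    """Return (wavs_without_transcript, transcripts_without_wav) as sorted lists."""
    corpus = Path(corpus_dir)
    # Compare files by their path relative to the corpus, minus the extension.
    wavs = {p.with_suffix("").relative_to(corpus) for p in corpus.rglob("*.wav")}
    labs = {p.with_suffix("").relative_to(corpus)
            for p in corpus.rglob(f"*{transcript_ext}")}
    return sorted(map(str, wavs - labs)), sorted(map(str, labs - wavs))


def load_dictionary_words(dictionary_path):
    """Collect the first field of every non-empty dictionary line (assumed layout)."""
    words = set()
    with open(dictionary_path, encoding="utf-8") as handle:
        for line in handle:
            fields = line.split()
            if fields:
                words.add(fields[0])
    return words


def find_oovs(corpus_dir, dictionary_words, transcript_ext=".lab"):
    """Map each out-of-vocabulary word to the transcription files it appears in."""
    oovs = defaultdict(list)
    for path in sorted(Path(corpus_dir).rglob(f"*{transcript_ext}")):
        for word in path.read_text(encoding="utf-8").split():
            if word not in dictionary_words and path.stem not in oovs[word]:
                oovs[word].append(path.stem)
    return dict(oovs)
```

Calling `find_unpaired` first is a cheap way to catch missing files; `find_oovs` then shows which dictionary entries would need to be added before alignment.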
Running the validation utility
Steps to run the validation utility:
- Make sure the steps in Installation have been completed and that you are in the same Conda/virtual environment that MFA was installed in.
- Run the following command, substituting the arguments with your own paths:
mfa validate corpus_directory dictionary_path [optional_acoustic_model_path]
The corpus_directory argument should be a full path to the corpus to validate, following the proper Data formats.
The dictionary_path argument should be a full path to the pronunciation dictionary you want to use with the corpus, following the proper Dictionary format.
The optional acoustic_model_path can be used to test alignment, as well as to flag potential transcription issues if --test_transcriptions is present. It should be either a full path to an acoustic model you’ve trained or one of the Pretrained acoustic models.
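When validation is launched from a script, the command can be assembled programmatically. The following is a minimal sketch that uses only the positional arguments shown above. The helper names `build_validate_command` and `run_validation` are hypothetical and are not part of MFA. The acoustic model argument is passed through unchanged, since it may not be a file path.

```python
import subprocess
from pathlib import Path


def build_validate_command(corpus_directory, dictionary_path, acoustic_model_path=None):
    """Assemble the argument list for `mfa validate`, expanding paths to full paths."""
    command = [
        "mfa",
        "validate",
        str(Path(corpus_directory).resolve()),
        str(Path(dictionary_path).resolve()),
    ]
    if acoustic_model_path is not None:
        # Left as given: this may be something other than a path on disk.
        command.append(str(acoustic_model_path))
    return command


def run_validation(corpus_directory, dictionary_path, acoustic_model_path=None):
    """Run the command in the current environment; raises if MFA exits with an error."""
    command = build_validate_command(corpus_directory, dictionary_path, acoustic_model_path)
    return subprocess.run(command, check=True)
```

`run_validation` only works when the `mfa` executable is on the path, that is, inside the environment MFA was installed in.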
Extra options to the validation utility:
Number of characters to use to identify speakers; if not specified, the aligner assumes that the directory name is the identifier for the speaker. Additionally, it accepts the value prosodylab to use the second field of a _-delimited file name, following the convention of labelling production data in the ProsodyLab at McGill.
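The three ways of identifying a speaker described above can be illustrated as follows. This is a sketch of the described behaviour rather than MFA's code. It assumes that the character count applies to the start of the file name, and the function name `speaker_for` is hypothetical.

```python
from pathlib import Path


def speaker_for(file_path, speaker_characters=None):
    """Derive a speaker identifier from a sound file path."""
    path = Path(file_path)
    if speaker_characters is None:
        # Default: the containing directory names the speaker.
        return path.parent.name
    if speaker_characters == "prosodylab":
        # ProsodyLab convention: second field of an underscore-delimited name.
        fields = path.stem.split("_")
        if len(fields) < 2:
            raise ValueError(f"{path.name} has no second '_'-delimited field")
        return fields[1]
    # Otherwise: the first N characters of the file name (assumed interpretation).
    return path.stem[: int(speaker_characters)]
```

For example, `speaker_for("corpus/all/exp_s07_item3.wav", "prosodylab")` picks out the field `s07`, while the same call without a second argument falls back to the directory name `all`.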
Temporary directory root to use for aligning, default is
Number of jobs to use; defaults to 3. Set this higher if you have more processors available and would like to align faster.
Prevent validation of feature generation and initial alignment. Using this flag will make validation much faster.
If flagged, the validation utility will construct a simple unigram language model and attempt to decode each segment to be aligned. Segments are flagged if the decoded transcriptions contain deviations from the original transcriptions. This is a largely experimental feature that may be useful but is not always reliable. Cannot be flagged at the same time as
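The transcription test rests on two simple ideas: a unigram language model and a comparison of the original and decoded word sequences. The following is a minimal sketch of both, not MFA's implementation. Decoding itself needs an acoustic model, so the decoded transcription is assumed to be supplied from elsewhere. The function names are hypothetical.

```python
import difflib
from collections import Counter


def unigram_model(transcripts):
    """Estimate unigram probabilities as relative word frequencies."""
    counts = Counter(word for text in transcripts for word in text.split())
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def transcription_deviations(original, decoded):
    """List the word-level differences between two transcriptions.

    Each item is (operation, original_words, decoded_words), where the
    operation is 'replace', 'delete', or 'insert'.
    """
    ref, hyp = original.split(), decoded.split()
    matcher = difflib.SequenceMatcher(a=ref, b=hyp, autojunk=False)
    return [
        (tag, ref[i1:i2], hyp[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
```

A segment would be flagged whenever `transcription_deviations` returns a non-empty list. Because a unigram model ignores word order, spurious deviations are to be expected.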