3.0 Changelog#

3.0.7#

  • Added a check of the current version against the latest version on run

  • Added --final_clean flag to clean temporary files at the end of each run, along with a --always_final_clean flag for mfa configure

  • Removed dependencies on sox and ffmpeg as audio loading is done through librosa in kalpy

  • Removed poorly aligned files in subset from further training

  • Fixed an issue where specified words for cutoff modeling

3.0.6#

  • Fixed an issue where alignment analysis would not produce data for speech log likelihood and phone duration deviation

  • Changed phone duration deviation metric to be maximum duration deviation rather than average across all phones in the utterance

  • Fixed a crash when an empty phone set was specified in phone groups configuration files

  • Fixed a crash when using the --language flag with values other than japanese, thai, chinese or korean
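
The change to the phone duration deviation metric in this release can be illustrated with a minimal sketch. It assumes each phone has a known mean and standard deviation of duration, and that the deviation is the absolute z-score; the function names, the data layout, and the use of the absolute value are illustrative assumptions, not MFA's actual implementation.

```python
def phone_z_scores(utterance, phone_stats):
    """Return absolute z-scored durations for each phone in an utterance.

    utterance: list of (phone, duration_in_seconds) pairs (hypothetical layout)
    phone_stats: dict mapping phone -> (mean_duration, std_duration)
    """
    scores = []
    for phone, duration in utterance:
        mean, std = phone_stats[phone]
        if std == 0:
            # Skip phones with no variance to avoid dividing by zero
            continue
        scores.append(abs(duration - mean) / std)
    return scores


def average_duration_deviation(utterance, phone_stats):
    """Older-style metric: average deviation across all phones."""
    scores = phone_z_scores(utterance, phone_stats)
    return sum(scores) / len(scores) if scores else 0.0


def max_duration_deviation(utterance, phone_stats):
    """Newer-style metric: the single largest deviation in the utterance."""
    return max(phone_z_scores(utterance, phone_stats), default=0.0)


# Hypothetical statistics and one utterance with a single badly aligned phone
stats = {"a": (0.10, 0.02), "t": (0.05, 0.01)}
utt = [("a", 0.12), ("t", 0.09)]
print(average_duration_deviation(utt, stats))
print(max_duration_deviation(utt, stats))
```

Taking the maximum means one badly aligned phone is not averaged away by well-aligned neighbors, which makes problem utterances easier to spot.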

3.0.5#

  • Added mfa_update command to better sync changes across dependencies

  • Updated how calculated properties are loaded to fix crashes in Anchor

  • Changed when alignments are analyzed in training

3.0.4#

  • Fixed an issue with a GitHub token set in the environment not being respected

  • Changed ordering of g2p output from corpora to be based on word frequency rather than alphabetical

  • Changed duration deviation to save the max z-scored duration, rather than be the average over all phones

  • Updated default punctuation markers to cover Arabic script punctuation

3.0.3#

  • Fixed a regression where clitic words were not merged when TextGrid cleanup is disabled

  • Fixed an issue with copying files when symlinks are not possible on Windows

  • Fixed an issue with using G2P models during training/alignment

  • Changed default feature config to set use_energy=True and dithering=0.0001

  • Updated tokenization when lower casing to remove extra dot for capital i in Turkish

  • Fixed an issue where special disambiguation symbols were not always in the phone table
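
The Turkish lower-casing item in this release refers to a general Unicode behavior: lower-casing the dotted capital İ (U+0130) yields "i" followed by a combining dot above (U+0307). The sketch below shows one way to drop that extra dot; the function name is hypothetical and this is not necessarily how MFA handles it.

```python
def lower_without_extra_dot(text: str) -> str:
    """Lower-case text and remove the combining dot left behind by U+0130.

    Python's str.lower() maps "İ" (U+0130) to "i" + U+0307 (combining dot
    above), so the extra mark is stripped only when it directly follows "i".
    """
    return text.lower().replace("i\u0307", "i")


word = "\u0130stanbul"  # "İstanbul"
print(len(word.lower()), len(lower_without_extra_dot(word)))
```

Without this step, the lower-cased word would contain an extra code point and might fail to match its dictionary entry.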

3.0.2#

  • Added support for --phone_groups_path and --rules_path to Validating data

  • Added support for speechbrain 1.0 release

  • Allow alignment with older models that don’t have a dedicated speaker-independent .alimdl model

  • Fixed a bug in loading lexicon compilers

  • Updated default feature configuration to remove dithering and use energy_floor=1.0, following torchaudio’s implementation

3.0.1#

  • Fixed an issue where pool size would be too low for number of jobs

  • Fixed an issue with specifying --phone_groups_path causing a crash

3.0.0#

  • Fixed a regression where --dither was not being passed correctly

  • Fixed a bug on Windows when symlink permissions were not present

3.0.0rc2#

  • Added support for per-dictionary g2p models during acoustic model training and alignment

  • Changed Chinese language support to require dragonmapper

  • Fixed bug in TextGrid generation for incorrect number of intervals

3.0.0rc1#

  • Fixed a bug related to fMLLR computation in kalpy that was causing a degradation in aligner performance

  • Improved memory usage for large corpora when generating MFCCs

  • Improved subset logic in acoustic model training to ensure all speakers in the subset have at least 5 utterances for better training

  • Fixed a bug in triphone training initialization that was causing a degradation in aligner performance

  • Reimplemented multiprocessing in addition to threading from 3.0.0a1

  • Made logging more verbose for acoustic model training

  • Improved subset logic for G2P training and validation splits to ensure low-frequency graphemes and phones are reliably in the training data

  • Added better validation for phone groups files in acoustic model training

  • Added better validation for phone mapping files in alignment evaluation

  • Added tokenization support for Chinese languages when spacy-pkuseg and hanziconv are installed via pip install spacy-pkuseg hanziconv dragonmapper

  • Added tokenization support for Korean when python-mecab-ko and jamo are installed via pip install python-mecab-ko jamo

  • Added tokenization support for Thai when pythainlp is installed via pip install pythainlp

  • Fixed a bug where pronunciations below the OOV count threshold were being exported at the end of acoustic model training

  • Fixed a feature generation error when using MFCC+pitch features

  • Changed debug output for evaluation mode in G2P model training to only output incorrect entries

  • Added --model_version parameter for all model training commands to override using MFA’s version

  • Optimized TextGrid exporting
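
The speaker subset change listed for 3.0.0rc1 can be read in more than one way. The sketch below shows one possible interpretation, in which speakers with fewer than 5 utterances in a candidate subset are dropped from it. The function name, the (utterance_id, speaker) layout, and the drop-rather-than-top-up strategy are all assumptions for illustration, not MFA's actual logic.

```python
from collections import Counter


def filter_subset_by_speaker(utterances, min_per_speaker=5):
    """Keep only utterances whose speaker has enough utterances in the subset.

    utterances: list of (utterance_id, speaker) pairs (hypothetical layout)
    min_per_speaker: minimum utterance count a speaker needs to stay in
    """
    counts = Counter(speaker for _, speaker in utterances)
    return [
        (utt_id, speaker)
        for utt_id, speaker in utterances
        if counts[speaker] >= min_per_speaker
    ]


# Speaker "s1" has 5 utterances and is kept; "s2" has only 2 and is dropped
candidate = [(f"s1_{i}", "s1") for i in range(5)] + [("s2_0", "s2"), ("s2_1", "s2")]
subset = filter_subset_by_speaker(candidate)
print(len(subset))
```

An alternative reading would keep every selected speaker and add utterances until each one reaches the minimum; the changelog does not say which approach is used.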

3.0.0a8#

  • Fixed an issue where utterance and speaker xvectors from speechbrain were not being normalized

  • Bug fixes for integration with Anchor

3.0.0a7#

  • Fixed an issue where using relative paths could delete all MFA temporary files with --clean

  • Fixed an issue where “<eps>” in transcript to force silence was inserting phones for OOVs rather than silence

3.0.0a6#

  • Added support for generating pronunciations during training and alignment via --g2p_model_path

  • Added support for Japanese tokenization through sudachipy

  • Fixed a crash in fine tuning

  • Added functionality for allowing a directory to be passed as the output path for Align a single file (mfa align_one)

3.0.0a5#

  • Updated for Kalpy version 0.5.5

  • Updated --single_speaker mode to not perform speaker adaptation

  • Added documentation for Speaker adaptation

3.0.0a4#

3.0.0a3#

  • Refactored tokenization for future spacy use

3.0.0a2#

  • Revamped how configuration is done following change to using threading instead of multiprocessing

3.0.0a1#