Ivector extractor training options
Warning
The current ivector implementation is somewhat unreliable, and a revision of the speaker diarization functionality is on the roadmap for version 2.1.
Diagonal UBM training
For the Kaldi recipe that DUBM training is based on, see sid/train_diag_ubm.sh.
Parameter | Default value | Notes |
---|---|---|
num_iterations | 4 | Number of iterations for training the UBM |
num_gselect | 30 | Number of Gaussian-selection indices to use while training |
subsample | 5 | Subsample factor for feature frames |
num_frames | 500000 | Number of frames to keep in memory for initialization |
num_gaussians | 256 | Number of Gaussians to use for DUBM training |
num_iterations_init | 20 | Number of iterations to use when initializing the UBM |
initial_gaussian_proportion | 0.5 | Start with half the target number of Gaussians |
min_gaussian_weight | 0.0001 | |
remove_low_count_gaussians | True | Flag for removing low-count Gaussians in the final round of training |
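
As a rough illustration of how two of the options above interact with the data, the sketch below computes the initial number of Gaussians from initial_gaussian_proportion and shows which frames survive subsampling. The function names are hypothetical and not part of the aligner; the sketch assumes the proportion is truncated to an integer and that subsampling keeps every Nth frame starting from the first one.

```python
def initial_gaussian_count(num_gaussians, initial_gaussian_proportion):
    """Number of Gaussians to start initialization with (assumed to truncate)."""
    return int(initial_gaussian_proportion * num_gaussians)


def subsampled_indices(total_frames, subsample):
    """Indices of the frames kept when taking every `subsample`-th frame."""
    return list(range(0, total_frames, subsample))


# Defaults from the table above
print(initial_gaussian_count(256, 0.5))  # 128
print(subsampled_indices(23, 5))         # [0, 5, 10, 15, 20]
```

With the defaults, initialization would begin with 128 Gaussians and grow toward the target of 256 during training.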
Ivector training
For the Kaldi recipe that ivector training is based on, see sid/train_ivector_extractor.sh.
Parameter | Default value | Notes |
---|---|---|
ivector_dimension | 128 | Dimension of extracted ivectors |
num_iterations | 10 | Number of training iterations |
num_gselect | 20 | Gaussian selection using the diagonal model: number of Gaussians to select |
posterior_scale | 1.0 | Scale on posteriors to correct for inter-frame correlation |
silence_weight | 0.0 | Weight of silence in calculating posteriors for ivector extraction |
min_post | 0.025 | Minimum posterior to use (posteriors below this are pruned out) |
gaussian_min_count | 100 | |
subsample | 5 | Subsample factor that speeds up training (uses every Nth frame) |
max_count | 100 | Makes ivectors more consistent across utterances of different lengths by scaling up the prior term when the data count exceeds this value. The data count is computed after posterior scaling, so with a posterior_scale of 0.1, max_count=100 starts to have an effect after 1000 frames, or 10 seconds of data. |
uses_cmvn | True | Flag for whether to apply CMVN to input features |
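
The arithmetic in the max_count note can be checked with a few lines of Python. This is only a sketch of the calculation described in the table: the function name is hypothetical, and the 10 ms frame shift is an assumption taken from the example configuration rather than a fixed property of the extractor.

```python
def max_count_threshold(max_count, posterior_scale, frame_shift_ms=10):
    """Return (frames, seconds) after which max_count starts to take effect.

    Each frame contributes `posterior_scale` to the data count, so the
    count reaches `max_count` after max_count / posterior_scale frames.
    """
    frames = max_count / posterior_scale
    seconds = frames * frame_shift_ms / 1000
    return frames, seconds


# posterior_scale of 0.1: roughly 1000 frames, or 10 seconds of data
print(max_count_threshold(100, 0.1))

# posterior_scale of 1.0 (the table default): roughly 100 frames, or 1 second
print(max_count_threshold(100, 1.0))
```

A smaller posterior_scale therefore delays the point at which max_count affects the result.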
Default training config file
The configuration file below is equivalent to the current 2.0 training regime. It serves mainly as an example of which configuration options are available and how they progress through the overall training.
features:
  type: "mfcc"
  use_energy: true
  frame_shift: 10

training:
  - dubm:
      num_iterations: 4
      num_gselect: 30
      num_gaussians: 256
      num_iterations_init: 20
  - ivector:
      ivector_dimension: 128
      num_iterations: 10
      gaussian_min_count: 100
      silence_weight: 0.0
      posterior_scale: 0.1
      max_count: 100
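
To see which values in the ivector block actually differ from the documented defaults, the sketch below overlays the block on the defaults from the ivector table. It assumes that options omitted from the file fall back to those defaults; the merge logic and the function names are illustrative only and do not reflect how the aligner itself parses configuration files.

```python
# Defaults copied from the ivector training table
IVECTOR_DEFAULTS = {
    "ivector_dimension": 128,
    "num_iterations": 10,
    "num_gselect": 20,
    "posterior_scale": 1.0,
    "silence_weight": 0.0,
    "min_post": 0.025,
    "gaussian_min_count": 100,
    "subsample": 5,
    "max_count": 100,
    "uses_cmvn": True,
}

# The ivector block from the configuration file above
config_block = {
    "ivector_dimension": 128,
    "num_iterations": 10,
    "gaussian_min_count": 100,
    "silence_weight": 0.0,
    "posterior_scale": 0.1,
    "max_count": 100,
}


def effective_options(defaults, overrides):
    """Overlay user-specified options on the defaults, rejecting unknown keys."""
    unknown = set(overrides) - set(defaults)
    if unknown:
        raise KeyError(f"Unknown options: {sorted(unknown)}")
    return {**defaults, **overrides}


def changed_options(defaults, overrides):
    """Return only the overrides whose value differs from the default."""
    return {key: value for key, value in overrides.items() if defaults[key] != value}


options = effective_options(IVECTOR_DEFAULTS, config_block)
print(changed_options(IVECTOR_DEFAULTS, config_block))  # {'posterior_scale': 0.1}
```

Under these assumptions, the only ivector setting in the example file that departs from the table defaults is posterior_scale.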