Ivector extractor training options

Warning

The current ivector implementation is incomplete in places. A revision of the speaker diarization functionality is on the roadmap for version 2.1.

Diagonal UBM training

For the Kaldi recipe that DUBM training is based on, see sid/train_diag_ubm.sh.

Parameter	Default value	Notes
num_iterations	4	Number of iterations for training UBM
num_gselect	30	Number of Gaussian-selection indices to use while training
subsample	5	Subsample factor for feature frames
num_frames	500000	Number of frames to keep in memory for initialization
num_gaussians	256	Number of Gaussians to use for DUBM training
num_iterations_init	20	Number of iterations to use when initializing UBM
initial_gaussian_proportion	0.5	Proportion of the target number of Gaussians to start with (0.5 starts with half)
min_gaussian_weight	0.0001
remove_low_count_gaussians	True	Flag for removing low-count Gaussians in the final round of training
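
As a rough illustration of how two of these options behave, the sketch below computes the starting number of Gaussians implied by initial_gaussian_proportion and applies the subsample factor to a list of frame indices. The helper names are hypothetical and are not part of any library API; the arithmetic only follows the descriptions in the table above.

```python
def initial_gaussian_count(num_gaussians=256, initial_gaussian_proportion=0.5):
    """Number of Gaussians to start from (hypothetical helper)."""
    return int(num_gaussians * initial_gaussian_proportion)


def subsample_frames(frames, subsample=5):
    """Keep every `subsample`-th frame, starting from the first one."""
    return frames[::subsample]


start = initial_gaussian_count()
kept = subsample_frames(list(range(20)))
print(start)  # 128
print(kept)   # [0, 5, 10, 15]
```

With the defaults, training would begin from 128 Gaussians and work toward the target of 256, while only one frame in five is used.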

Ivector training

For the Kaldi recipe that ivector training is based on, see sid/train_ivector_extractor.sh.

Parameter	Default value	Notes
ivector_dimension	128	Dimension of extracted ivectors
num_iterations	10	Number of training iterations
num_gselect	20	Gaussian-selection using diagonal model: number of Gaussians to select
posterior_scale	1.0	Scale on posteriors to correct for inter-frame correlation
silence_weight	0.0	Weight of silence in calculating posteriors for ivector extraction
min_post	0.025	Minimum posterior to use (posteriors below this are pruned out)
gaussian_min_count	100
subsample	5	Subsample factor for feature frames; using only every Nth frame speeds up training
max_count	100	Makes ivectors more consistent across utterance lengths by scaling up the prior term when the data-count exceeds this value. The data-count is computed after posterior scaling, so with posterior_scale=0.1, max_count=100 starts having an effect after 1000 frames, or 10 seconds of data at a 10 ms frame shift.
uses_cmvn	True	Flag for whether to apply CMVN to input features
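
The relationship between max_count and posterior_scale described in the table can be checked with a small calculation. This is a minimal sketch of the arithmetic only: the function names are hypothetical, and the 10 ms frame shift is an assumption taken from the example configuration on this page.

```python
def frames_until_max_count(max_count=100, posterior_scale=1.0):
    """Frames needed before the posterior-scaled data-count reaches max_count."""
    return max_count / posterior_scale


def seconds_until_max_count(max_count=100, posterior_scale=1.0, frame_shift_ms=10):
    """Same threshold expressed in seconds, assuming a fixed frame shift."""
    frames = frames_until_max_count(max_count, posterior_scale)
    return frames * frame_shift_ms / 1000.0


# Values from the table note: posterior_scale=0.1 and max_count=100
print(round(frames_until_max_count(100, 0.1)))       # 1000
print(round(seconds_until_max_count(100, 0.1), 2))   # 10.0
```

With the default posterior_scale of 1.0, the same max_count would already be reached after 100 frames, or about one second of data.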

Default training config file

The configuration file below is equivalent to the current 2.0 training regime. It is intended mainly as an example of which configuration options are available and how they progress through the overall training.

features:
  type: "mfcc"
  use_energy: true
  frame_shift: 10

training:
  - dubm:
      num_iterations: 4
      num_gselect: 30
      num_gaussians: 256
      num_iterations_init: 20
  - ivector:
      ivector_dimension: 128
      num_iterations: 10
      gaussian_min_count: 100
      silence_weight: 0.0
      posterior_scale: 0.1
      max_count: 100
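
The example configuration sets only some of the options listed in the tables. One plausible reading, sketched below, is that any option left out of a training block falls back to its documented default. This is an assumption for illustration, not the aligner's actual loading code; the dictionaries mirror the tables and the configuration above, and effective_options is a hypothetical helper.

```python
# Defaults copied from the parameter tables on this page.
DUBM_DEFAULTS = {
    "num_iterations": 4, "num_gselect": 30, "subsample": 5,
    "num_frames": 500000, "num_gaussians": 256, "num_iterations_init": 20,
    "initial_gaussian_proportion": 0.5, "min_gaussian_weight": 0.0001,
    "remove_low_count_gaussians": True,
}
IVECTOR_DEFAULTS = {
    "ivector_dimension": 128, "num_iterations": 10, "num_gselect": 20,
    "posterior_scale": 1.0, "silence_weight": 0.0, "min_post": 0.025,
    "gaussian_min_count": 100, "subsample": 5, "max_count": 100,
    "uses_cmvn": True,
}
DEFAULTS = {"dubm": DUBM_DEFAULTS, "ivector": IVECTOR_DEFAULTS}

# The "training" list from the example configuration, written as Python data.
training = [
    {"dubm": {"num_iterations": 4, "num_gselect": 30,
              "num_gaussians": 256, "num_iterations_init": 20}},
    {"ivector": {"ivector_dimension": 128, "num_iterations": 10,
                 "gaussian_min_count": 100, "silence_weight": 0.0,
                 "posterior_scale": 0.1, "max_count": 100}},
]


def effective_options(block_name, overrides):
    """Overlay the configured values on the documented defaults (hypothetical)."""
    defaults = DEFAULTS[block_name]
    unknown = set(overrides) - set(defaults)
    if unknown:
        raise KeyError(f"Unknown option(s) for {block_name}: {sorted(unknown)}")
    return {**defaults, **overrides}


effective = {}
for block in training:
    for name, overrides in block.items():
        effective[name] = effective_options(name, overrides)

print(effective["ivector"]["posterior_scale"])  # 0.1
print(effective["ivector"]["num_gselect"])      # 20
```

Under this reading, the only value in the example that differs from the documented defaults is posterior_scale, which is set to 0.1 rather than 1.0.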