Timeshrinking

From SpeechWiki


Fisher experiments

Number of frames dropped on the Fisher corpus:

<math>\tau</math> | frames dropped
1  | 0%
.9 | ~5%
.6 | ~35%
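
The page does not spell out the shrinking criterion, but the pattern above (a lower <math>\tau</math> drops more frames, and <math>\tau=1</math> drops essentially none) is consistent with dropping a frame when its similarity to the previously kept frame reaches the threshold. A minimal sketch under that assumption, using normalized correlation as a hypothetical similarity measure:

```python
import numpy as np

def timeshrink(frames, tau):
    """Drop a frame when its similarity to the previously kept frame
    reaches tau. The similarity measure (normalized correlation here)
    is an assumption; the page does not state the actual criterion."""
    kept = [frames[0]]
    dropped = 0
    for f in frames[1:]:
        prev = kept[-1]
        sim = np.dot(prev, f) / (np.linalg.norm(prev) * np.linalg.norm(f) + 1e-12)
        if sim >= tau:
            dropped += 1          # merge this frame into the previous segment
        else:
            kept.append(f)
    return np.array(kept), dropped / (len(frames) - 1)
```

With this rule, a run of near-identical frames collapses into one kept frame per segment, which is what makes the fraction dropped grow as <math>\tau</math> falls.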
Timeshrinking results on Fisher:

train <math>\tau</math> | test <math>\tau</math> | dev 2000-utt WER | dev 2000-utt WER (triphone, single-Gaussian) | comments
1  | 1  | 51.6% |       | old baseline
1  | 1  | 53.7% | 78.2% | baseline rerun exactly as timeshrinking, to make sure timeshrinking is not getting an unfair advantage
1  | 1  | 54.5% |       | baseline rerun exactly as timeshrinking, LM scale 16, to double-check tuning; it should be worse, and it is
.6 | .6 | 69.3% |       |
.9 | .9 | 56.3% | 80.4% |
1  | .9 | 53.9% |       |
.9 | .9 | 57.2% | 80.7% | using the non-timeshrinking structure file for test
.9 | 1  | 54.6% |       |
1  | 1  | 55.4% | 72.8% | PLP+MLP tandem
1  | 1  | 50.6% |       | PLP+MLP tandem, LM scale 16
1  | 1  | 49.7% |       | PLP+MLP tandem, LM scale 20
1  | 1  | 50.0% |       | PLP+MLP tandem, LM scale 25
.9 | .9 | 49.9% |       | PLP+MLP tandem, LM scale 16
.9 | .9 | 49.3% |       | PLP+MLP tandem, LM scale 20

I still need to check for bugs. The threshold could be too low, or it could be something else entirely. We should probably rerun the baseline too, to make sure I didn't tune it unfairly.

Things to try

  • Test SVitchboard with the Fisher-trained model to see if we still get good results
  • Train and test on PLP+MLP, as the SVitchboard timeshrinking was done (done; improved baseline and test by 5% WER!)
  • Do a baseline train+test to see whether something changed in going from baseline to timeshrink structure files (done; helped)

LM penalty and scale

Since we now have 62 PLP+MLP features instead of 39 PLP features, we should probably change the LM scale by a factor of 62/39 ≈ 1.59. The original (not carefully tuned) PLP LM scale was 10. It may also make sense to multiply the LM penalty (-1 for PLP) by the same factor.
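
The arithmetic is a quick sanity check; notably, scaling the old PLP LM scale of 10 by the feature-dimension ratio lands almost exactly on the LM scale 16 used in the tandem runs above:

```python
# Scale the LM weight and penalty by the feature-dimension ratio.
n_tandem, n_plp = 62, 39
factor = n_tandem / n_plp          # ~1.59
lm_scale = round(10 * factor)      # original PLP LM scale 10 -> 16
lm_penalty = -1 * factor           # original PLP penalty -1 -> ~-1.59
```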

Final test

At 20k utterances and tau=.9, 6.02% of the frames are dropped, giving 158,839 segments with an average of 3.5 frames per segment.

lm_scale was roughly tuned on the baseline, and the same value was used on the test, although tuning for the test would help because there are 5% fewer frames per word on average.

Final-test timeshrinking results on Fisher:

test <math>\tau</math> | 20k-utt WER | comments
1  | TE | PLP+MLP tandem, LM scale 20 (tuned)
.9 | TE | everything except <math>\tau</math> is the same as the baseline

Future Directions

  • Timeshrinking can be viewed as a two-mode special case of best-first Viterbi search, so a natural extension is a real best-first lattice search. Mark mentioned some attempts at this in the '80s.

Reexpanding

Helps a lot: strictly better than timeshrinking, doubling the WER improvement.
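
The page does not define the re-expansion step precisely; a plausible minimal sketch (an assumption, not the documented method) is to undo the shrinking by repeating each kept frame for the length of the segment it replaced, restoring the original time axis:

```python
import numpy as np

def reexpand(kept_frames, seg_lens):
    """Hypothetical re-expansion: repeat each timeshrunk frame by the
    length of the original segment it stands for, so the expanded
    sequence has the original number of frames."""
    return np.repeat(kept_frames, seg_lens, axis=0)
```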

Looking into it:

  • Force-align to subphones using the timeshrunk .9 training set and the re-expanded .9 dev set, to find the closest gold standard
  • Calculate the likelihood given the state for each frame, using the original and re-expanded observations
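
The second step above can be sketched as follows, for a single diagonal-Gaussian output per state (an assumption for brevity; the actual models here are mixtures):

```python
import numpy as np

def frame_loglik(obs, means, variances, alignment):
    """Per-frame log p(o_t | q_t) under diagonal Gaussians, given a
    forced-alignment state sequence. obs: (T, D); means, variances:
    (n_states, D); alignment: (T,) state indices."""
    mu = means[alignment]               # (T, D) state mean per frame
    var = variances[alignment]          # (T, D) state variance per frame
    ll = -0.5 * (np.log(2 * np.pi * var) + (obs - mu) ** 2 / var)
    return ll.sum(axis=1)               # (T,) log-likelihood per frame
```

Running this once on the original observations and once on the re-expanded ones gives the per-frame comparison the bullet describes.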

Model comparison

Variance weighted by component weight, summed across all components, mixtures, and features:

1mlp : 2.7010e+04
.9mlp : 2.6632e+04

So on average the .9mlp model appears to have slightly tighter distributions.
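
The statistic above can be computed as a single weighted sum over the mixture parameters; a minimal sketch (array shapes are assumptions about how the model parameters are laid out):

```python
import numpy as np

def total_weighted_variance(weights, variances):
    """Per-component variance weighted by mixture weight, summed over
    all mixtures, components, and feature dimensions.
    weights: (n_mixtures, n_components)
    variances: (n_mixtures, n_components, n_features)"""
    return float((weights[..., None] * variances).sum())
```

A smaller total, as with .9mlp here, indicates tighter distributions on average.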

Iterative timeshrinking

Iteration 0: timeshrinking with MLP; results are above and in the paper.

Iteration 1:

Experiments progress chart:

Experiment | Generate TS train data | Generate TS test data | Force align | Fully train | Test | Train just the last iteration | Test | Test dropping low p(o) frames
tau=.9                           | R | x | x | x | x | x | x | x
tau=.85                          | x | x | x | x | x | x | x | x
tau=.9, ignoring low p(o) frames | x | x | x | x | x | x | x | x