Timeshrinking

From SpeechWiki


Fisher experiments

Number of frames dropped on the Fisher corpus:

<math>\tau</math> | frames dropped
1  | 0%
.9 | ~5%
.6 | ~35%
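
The page does not spell out the shrinking criterion, but the pattern above (a lower <math>\tau</math> drops more frames, and <math>\tau=1</math> drops essentially none) is consistent with dropping a frame when its similarity to the previously kept frame reaches the threshold. A minimal sketch under that assumption, using normalized correlation as a hypothetical similarity measure:

```python
import numpy as np

def timeshrink(frames, tau):
    """Drop a frame when its similarity to the previously kept frame
    reaches tau. The similarity measure (normalized correlation here)
    is an assumption; the page does not state the actual criterion."""
    kept = [frames[0]]
    dropped = 0
    for f in frames[1:]:
        prev = kept[-1]
        sim = np.dot(prev, f) / (np.linalg.norm(prev) * np.linalg.norm(f) + 1e-12)
        if sim >= tau:
            dropped += 1          # merge this frame into the previous segment
        else:
            kept.append(f)
    return np.array(kept), dropped / (len(frames) - 1)
```

With this rule, a run of near-identical frames collapses into one kept frame per segment, which is what makes the fraction dropped grow as <math>\tau</math> falls.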
Timeshrinking results on Fisher:

train <math>\tau</math> | test <math>\tau</math> | dev 2000-utt WER | dev 2000-utt WER (triphone, single-Gaussian) | comments
1  | 1  | 51.6% |       | old baseline
1  | 1  | 53.7% | 78.2% | baseline rerun exactly as timeshrinking, to make sure timeshrinking is not getting an unfair advantage
1  | 1  | 54.5% |       | baseline rerun exactly as timeshrinking, LM scale 16, to double-check tuning; it should be worse, and it is
.6 | .6 | 69.3% |       |
.9 | .9 | 56.3% | 80.4% |
1  | .9 | 53.9% |       |
.9 | .9 | 57.2% | 80.7% | using the non-timeshrinking structure file for test
.9 | 1  | 54.6% |       |
1  | 1  | 55.4% | 72.8% | PLP+MLP tandem
1  | 1  | 50.6% |       | PLP+MLP tandem, LM scale 16
1  | 1  | 49.7% |       | PLP+MLP tandem, LM scale 20
1  | 1  | 50.0% |       | PLP+MLP tandem, LM scale 25
.9 | .9 | 49.9% |       | PLP+MLP tandem, LM scale 16
.9 | .9 | 49.3% |       | PLP+MLP tandem, LM scale 20

I still need to check for bugs. The threshold could be too low, or it could be something else entirely. We should probably rerun the baseline too, to make sure I didn't tune it unfairly.

Things to try

  • Test SVitchboard with the Fisher-trained model to see if we still get good results
  • Train and test on PLP+MLP, as the SVitchboard timeshrinking was done (done; improved baseline and test by 5% WER!)
  • Do a baseline train+test to see whether something changed in going from baseline to timeshrink structure files (done; helped)

LM penalty and scale

Since we now have 62 PLP+MLP features instead of 39 PLP features, we should probably change the LM scale by a factor of 62/39 ≈ 1.59. The original (not carefully tuned) PLP LM scale was 10. It may also make sense to multiply the LM penalty (-1 for PLP) by the same factor.
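
The arithmetic is a quick sanity check; notably, scaling the old PLP LM scale of 10 by the feature-dimension ratio lands almost exactly on the LM scale 16 used in the tandem runs above:

```python
# Scale the LM weight and penalty by the feature-dimension ratio.
n_tandem, n_plp = 62, 39
factor = n_tandem / n_plp          # ~1.59
lm_scale = round(10 * factor)      # original PLP LM scale 10 -> 16
lm_penalty = -1 * factor           # original PLP penalty -1 -> ~-1.59
```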

Final test

At 20k utterances and tau=.9, 6.02% of the frames are dropped, giving 158,839 segments with an average of 3.5 frames per segment.

lm_scale was roughly tuned on the baseline, and the same value was used on the test, although tuning for the test would help because there are 5% fewer frames per word on average.

Final-test timeshrinking results on Fisher:

test <math>\tau</math> | 20k-utt WER | comments
1  | TE | PLP+MLP tandem, LM scale 20 (tuned)
.9 | TE | everything except <math>\tau</math> is the same as the baseline

Future Directions

  • Timeshrinking can be viewed as a two-mode special case of best-first Viterbi search, so a natural extension is a real best-first lattice search. Mark mentioned some attempts at this in the '80s.

Reexpanding

Helps a lot: strictly better than timeshrinking, doubling the WER improvement.
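
The page does not define the re-expansion step precisely; a plausible minimal sketch (an assumption, not the documented method) is to undo the shrinking by repeating each kept frame for the length of the segment it replaced, restoring the original time axis:

```python
import numpy as np

def reexpand(kept_frames, seg_lens):
    """Hypothetical re-expansion: repeat each timeshrunk frame by the
    length of the original segment it stands for, so the expanded
    sequence has the original number of frames."""
    return np.repeat(kept_frames, seg_lens, axis=0)
```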

Looking into it:

  • Force-align to subphones using the timeshrunk .9 training set and the re-expanded .9 dev set, to find the closest gold standard
  • Calculate the likelihood given the state for each frame, using the original and re-expanded observations
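
The second step above can be sketched as follows, for a single diagonal-Gaussian output per state (an assumption for brevity; the actual models here are mixtures):

```python
import numpy as np

def frame_loglik(obs, means, variances, alignment):
    """Per-frame log p(o_t | q_t) under diagonal Gaussians, given a
    forced-alignment state sequence. obs: (T, D); means, variances:
    (n_states, D); alignment: (T,) state indices."""
    mu = means[alignment]               # (T, D) state mean per frame
    var = variances[alignment]          # (T, D) state variance per frame
    ll = -0.5 * (np.log(2 * np.pi * var) + (obs - mu) ** 2 / var)
    return ll.sum(axis=1)               # (T,) log-likelihood per frame
```

Running this once on the original observations and once on the re-expanded ones gives the per-frame comparison the bullet describes.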

Model comparison

Variance weighted by component weight, summed across all components, mixtures, and features:

1mlp : 2.7010e+04
.9mlp : 2.6632e+04

So on average the .9mlp model appears to have slightly tighter distributions.
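
The statistic above can be computed as a single weighted sum over the mixture parameters; a minimal sketch (array shapes are assumptions about how the model parameters are laid out):

```python
import numpy as np

def total_weighted_variance(weights, variances):
    """Per-component variance weighted by mixture weight, summed over
    all mixtures, components, and feature dimensions.
    weights: (n_mixtures, n_components)
    variances: (n_mixtures, n_components, n_features)"""
    return float((weights[..., None] * variances).sum())
```

A smaller total, as with .9mlp here, indicates tighter distributions on average.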

Iterative timeshrinking

Iteration 0: timeshrinking with MLP; results are above and in the paper.

Iteration 1:

Experiments progress chart:

Experiment | Generate TS train data | Generate TS test data | Force align | Fully train | Test | Train just the last iteration | Test | Test dropping low p(o) frames
tau=.9                           | R | x | x | x | x | x | x | x
tau=.85                          | x | x | x | x | x | x | x | x
tau=.9, ignoring low p(o) frames | x | x | x | x | x | x | x | x