Timeshrinking

From SpeechWiki

(Difference between revisions)

Revision as of 23:54, 11 October 2009

Fisher experiments

Number of frames dropped on the Fisher corpus
<math>\tau</math>	frames dropped
1	0%
.9	~5%
.6	~35%

I have to check for bugs. The threshold may be too low, or something else may be wrong. We should probably also rerun the baseline, just to make sure I didn't optimize it unfairly.

Things to try

Test SVitchboard with the Fisher-trained model to see if we still get good results.
Train and test on PLP+MLP, as was done for SVitchboard timeshrinking (done, improved baseline and test by 5% WER!).
Do baseline train+test to see if something changed in going from the baseline to the timeshrink structure files (done, helped).

LM penalty and scale

Since we now have 62 PLP+MLP features instead of 39 PLP features, we should probably change the LM scale by a factor of 62/39, about 1.59. The original (not carefully tuned) PLP LM scale was 10. Perhaps it would also make sense to multiply the LM penalty (-1 for PLP) by the same factor of 1.59.
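The rescaling arithmetic above can be sketched as follows. This is only an illustration of the proposed heuristic, not the tuning procedure actually used; the function name `rescale_lm_params` is hypothetical, and the inputs (scale 10, penalty -1, 39 and 62 features) are the values quoted in this section.

```python
def rescale_lm_params(lm_scale, lm_penalty, old_dim, new_dim):
    """Scale the LM scale and LM penalty by the ratio of feature dimensions.

    Hypothetical helper: it only encodes the heuristic that a larger
    feature vector calls for a proportionally larger LM weight.
    """
    factor = new_dim / old_dim
    return lm_scale * factor, lm_penalty * factor


# Values quoted in the text: PLP (39 features) to PLP+MLP (62 features).
new_scale, new_penalty = rescale_lm_params(10.0, -1.0, old_dim=39, new_dim=62)
print(f"LM scale   ~ {new_scale:.2f}")
print(f"LM penalty ~ {new_penalty:.2f}")
```

Under this heuristic the PLP+MLP LM scale comes out close to the LM scale of 16 that appears in some of the tandem runs, although the results suggest that a larger value may work better still.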

final test

For 20k utterances at tau=.9, 6.02% of the frames are dropped; there are 158839 segments, with 3.5 frames per segment.

lm_scale was roughly tuned on the baseline, and the same value was used on the test, although tuning it for the test would probably help because there are 5% fewer frames per word on average.

Timeshrinking results on Fisher
train <math>\tau</math>	test <math>\tau</math>	dev 2000 utt WER	dev 2000 utt WER on triphone single-Gaussian model	comments
1	1	51.6%		old baseline
1	1	53.7%	78.2%	baseline rerun exactly as timeshrinking to really make sure it's not getting an unfair advantage
1	1	54.5%		baseline rerun exactly as timeshrinking, LM scale 16, to double check it's tuned. Should be worse, and it is.
.6	.6	69.3%
.9	.9	56.3%	80.4%
1	.9	53.9%
.9	.9	57.2%	80.7%	using the non-timeshrinking str file for test
.9	1	54.6%
1	1	55.4%	72.8%	PLP+MLP tandem
1	1	50.6%		PLP+MLP tandem, LM scale 16
1	1	49.7%		PLP+MLP tandem, LM scale 20
1	1	50.0%		PLP+MLP tandem, LM scale 25
.9	.9	49.9%		PLP+MLP tandem, LM scale 16
.9	.9	49.3%		PLP+MLP tandem, LM scale 20
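To compare rows of the table, it can help to look at both the absolute and the relative WER change. The sketch below does this for the two PLP+MLP tandem rows at LM scale 20 (49.7% at tau=1 and 49.3% at tau=.9); the helper name `wer_change` is hypothetical, and the numbers are copied from the table.

```python
def wer_change(baseline_wer, new_wer):
    """Return (absolute, relative) WER reduction in percentage points and percent."""
    absolute = baseline_wer - new_wer
    relative = 100.0 * absolute / baseline_wer
    return absolute, relative


# PLP+MLP tandem, LM scale 20: tau=1 baseline vs. tau=.9 timeshrinking.
abs_gain, rel_gain = wer_change(49.7, 49.3)
print(f"absolute: {abs_gain:.1f} points, relative: {rel_gain:.1f}%")
```

A difference this small on 2000 dev utterances may well be within noise, so it is probably best read as "no worse than baseline" rather than as a clear gain.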

Future Directions

Timeshrinking can be viewed as a two-mode special case of best-first Viterbi search, so the next step is to make a real best-first lattice search. Mark mentioned some attempts in the 1980s to do this.
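For orientation, here is a minimal, generic sketch of best-first search over a small lattice, using a priority queue ordered by accumulated cost. It is not the decoder used in these experiments, and it does not model timeshrinking itself; the toy lattice, its node names, and the edge costs (standing in for negative log probabilities, assumed non-negative) are all invented for illustration.

```python
import heapq


def best_first_path(lattice, start, goal):
    """Return (cost, path) for the cheapest start-to-goal path, or None.

    `lattice` maps each node to a list of (next_node, edge_cost) pairs.
    Edge costs are assumed to be non-negative, so the first time the goal
    is popped from the queue its cost is optimal.
    """
    frontier = [(0.0, start, [start])]
    settled = set()
    while frontier:
        cost, node, path = heapq.heappop(frontier)  # most promising hypothesis
        if node == goal:
            return cost, path
        if node in settled:
            continue  # a cheaper route to this node was already expanded
        settled.add(node)
        for nxt, edge_cost in lattice.get(node, []):
            if nxt not in settled:
                heapq.heappush(frontier, (cost + edge_cost, nxt, path + [nxt]))
    return None


# Toy lattice (hypothetical): nodes are states, costs are -log probabilities.
toy_lattice = {
    "s": [("a", 1.0), ("b", 4.0)],
    "a": [("b", 1.5), ("e", 5.0)],
    "b": [("e", 1.0)],
}
result = best_first_path(toy_lattice, "s", "e")
print(result)
```

Unlike frame-synchronous Viterbi, this expands hypotheses in order of cost rather than in order of time, which is the property a real best-first decoder would exploit.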

Final test timeshrinking results on Fisher
<math>\tau</math>	test 20k utt WER	comments
1	TE	PLP+MLP tandem, LM scale 20 (tuned)
.9	TE	everything except <math>\tau</math> is the same as the baseline.

@@ Line 19: / Line 19: @@
 |-
 | 1 || 1 || [{{FisherPath}}/exp/timeshrink/test/unit.tri.timeshrink.1/LATEST.log 53.7%] || [{{FisherPath}}/exp/timeshrink/test/triphoneSingleGausian/unit.tri.timeshrink.1.onSingleGaussian/LATEST.log 78.2%] || baseline rerun exactly as timeshrinking to really make sure it's not getting an unfair advantage
+|-
+| 1 || 1 || [{{FisherPath}}/exp/timeshrink/test/unit.tri.timeshrink.1/LATEST.log 54.5%] || || baseline rerun exactly as timeshrinking LM_scale 16 to double check it's tuned.  Should be worse, and it is.
 |-
 | .6 || .6 || 69.3%
@@ Line 34: / Line 36: @@
 | 1 || 1 || [{{FisherPath}}/exp/timeshrink/test/unit.tri.timeshrink.1.mlp.lmAdj/LATEST.log 50.6%] ||  || PLP+MLP tandem, LM scale 16
 |-
-| .9 || .9 || [{{FisherPath}}/exp/timeshrink/test/.../LATEST.log TR] || [{{FisherPath}}/exp/timeshrink/test/triphoneSingleGausian/unit.tri.timeshrink.point9.mlp.onSingleGaussian/LATEST.log 73.1% ] || PLP+MLP tandem
+| 1 || 1 || [{{FisherPath}}/exp/timeshrink/test/unit.tri.timeshrink.1.mlp.lmSc20/LATEST.log 49.7%] ||  || PLP+MLP tandem, LM scale 20
+|-
+| 1 || 1 || [{{FisherPath}}/exp/timeshrink/test/unit.tri.timeshrink.1.mlp.lmSc25/LATEST.log 50.0%] ||  || PLP+MLP tandem, LM scale 25
+|-
+| .9 || .9 || [{{FisherPath}}/exp/timeshrink/test/triphoneSingleGausian/unit.tri.timeshrink.point9.mlp.lmSc16/LATEST.log 49.9% ] || || PLP+MLP tandem lm_scale 16
+|-
+| .9 || .9 || [{{FisherPath}}/exp/timeshrink/test/triphoneSingleGausian/unit.tri.timeshrink.point9.mlp.lmSc20/LATEST.log 49.3% ] || || PLP+MLP tandem lm_scale 20
+|-
 |-|}
@@ Line 42: / Line 52: @@
 * Test svitchboard with fisher-trained model to see if we still get good results
-* Train and test on plp+mlp, like svitchboard timeshrinking was done.
+* Train and test on plp+mlp, like svitchboard timeshrinking was done (done, improved baseline and test by 5% WER!).
 * Do baseline train+test to see if something changed in going from baseline to timeshrink structure files. (done, helped)
@@ Line 50: / Line 60: @@
 ==final test==
 20k utterances, at tau=.9, 6.02% of the frames are dropped, 158839 segments and 3.5 frames per segment.
+lm_scale was roughly tuned on the baseline, and the same one was used on the test, although tuning for the test would help because there are %5 fewer frames per word on average.
+{| class="wikitable"
+|+ Final test Timeshrinking results on fisher
+! <math>\tau</math> !! test 20k  utt WER !! comments
+|-
+| 1 || [{{FisherPath}}/exp/timeshrink/test/unit.tri.timeshrink.1.mlp.lmSc20.final/LATEST.log TE] || PLP+MLP tandem, LM scale 20 (tuned)
+|-
+| .9 || [{{FisherPath}}/exp/timeshrink/test/unit.tri.timeshrink.point9.mlp.lmSc20.final/LATEST.log TE]  || everything except <math>\tau</math> is same as baseline.
+|-|}
 ==Future Directions==
 * Can be viewed as a two-mode special case of best-first viterbi search.  So make a real best-first lattice search.  Mark mentioned some attempts in the 80'ies to do this.
 [[Category:Fisher Experiments]]
