Timeshrinking

From SpeechWiki

==Fisher experiments==
{| class="wikitable"
|+ Number of frames dropped on the Fisher corpus
! <math>\tau</math> !! frames dropped
|-
| 1 || 0%
|-
| .9 || ~5%
|-
| .6 || ~35%
|}
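For reference, the shrinking step can be sketched as below. This is a minimal illustration, not the actual implementation: it assumes a frame is merged into the current segment when its cosine similarity to the segment's last frame is at least <math>\tau</math>, and that each merged segment is replaced by its mean; the real similarity criterion may differ.

```python
import numpy as np

def timeshrink(frames, tau):
    """Merge runs of consecutive similar frames into their means.

    frames: (T, D) array of feature vectors.
    tau:    similarity threshold; a frame joins the current segment
            when its cosine similarity to the segment's last frame
            is >= tau.  Returns (shrunk_frames, n_dropped).
    """
    segments = [[frames[0]]]
    for f in frames[1:]:
        prev = segments[-1][-1]
        sim = np.dot(prev, f) / (np.linalg.norm(prev) * np.linalg.norm(f) + 1e-12)
        if sim >= tau:
            segments[-1].append(f)   # similar enough: extend the segment
        else:
            segments.append([f])     # otherwise: start a new segment
    shrunk = np.array([np.mean(s, axis=0) for s in segments])
    return shrunk, len(frames) - len(shrunk)
```

Under this reading, <math>\tau</math>=1 merges only (near-)identical consecutive frames, so essentially nothing is dropped, and lowering <math>\tau</math> drops more frames, as in the table above.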
{| class="wikitable"
|+ Timeshrinking results on Fisher
! train <math>\tau</math> !! test <math>\tau</math> !! dev 2000 utt WER !! dev 2000 utt WER on triphone single-Gaussian model !! comments
|-
| 1 || 1 || [{{FisherPath}}/exp/triphone/test2kUttOnConvGaus.noUnits/config73/LATEST.log 51.6%] || || old baseline
|-
| 1 || 1 || [{{FisherPath}}/exp/timeshrink/test/unit.tri.timeshrink.1/LATEST.log 53.7%] || [{{FisherPath}}/exp/timeshrink/test/triphoneSingleGausian/unit.tri.timeshrink.1.onSingleGaussian/LATEST.log 78.2%] || baseline rerun exactly as timeshrinking, to make sure it is not getting an unfair advantage
|-
| 1 || 1 || [{{FisherPath}}/exp/timeshrink/test/unit.tri.timeshrink.1/LATEST.log 54.5%] || || baseline rerun exactly as timeshrinking with LM scale 16, to double-check that the scale is tuned. Should be worse, and it is.
|-
| .6 || .6 || 69.3% || ||
|-
| .9 || .9 || [{{FisherPath}}/exp/timeshrink/test/unit.tri.timeshrink.point9/LATEST.log 56.3%] || [{{FisherPath}}/exp/timeshrink/test/triphoneSingleGausian/unit.tri.timeshrink.point9/LATEST.log 80.4%] ||
|-
| 1 || .9 || [{{FisherPath}}/exp/timeshrink/testOnBaseline/unit.tri.timeshrink.point9/LATEST.log 53.9%] || ||
|-
| .9 || .9 || [{{FisherPath}}/exp/timeshrink/test/unit.tri.timeshrink.point9.noTsStr/LATEST.log 57.2%] || [{{FisherPath}}/exp/timeshrink/test/triphoneSingleGausian/LATEST.log 80.7%] || using the non-timeshrinking str file for test
|-
| .9 || 1 || [{{FisherPath}}/exp/timeshrink/test/unit.tri.timeshrink.1.noTsStr/LATEST.log 54.6%] || ||
|-
| 1 || 1 || [{{FisherPath}}/exp/timeshrink/test/unit.tri.timeshrink.1.mlp/LATEST.log 55.4%] || [{{FisherPath}}/exp/timeshrink/test/triphoneSingleGausian/unit.tri.timeshrink.1.mlp.onSingleGaussian/LATEST.log 72.8%] || PLP+MLP tandem
|-
| 1 || 1 || [{{FisherPath}}/exp/timeshrink/test/unit.tri.timeshrink.1.mlp.lmAdj/LATEST.log 50.6%] || || PLP+MLP tandem, LM scale 16
|-
| 1 || 1 || [{{FisherPath}}/exp/timeshrink/test/unit.tri.timeshrink.1.mlp.lmSc20/LATEST.log 49.7%] || || PLP+MLP tandem, LM scale 20
|-
| 1 || 1 || [{{FisherPath}}/exp/timeshrink/test/unit.tri.timeshrink.1.mlp.lmSc25/LATEST.log 50.0%] || || PLP+MLP tandem, LM scale 25
|-
| .9 || .9 || [{{FisherPath}}/exp/timeshrink/test/triphoneSingleGausian/unit.tri.timeshrink.point9.mlp.lmSc16/LATEST.log 49.9%] || || PLP+MLP tandem, LM scale 16
|-
| .9 || .9 || [{{FisherPath}}/exp/timeshrink/test/triphoneSingleGausian/unit.tri.timeshrink.point9.mlp.lmSc20/LATEST.log 49.3%] || || PLP+MLP tandem, LM scale 20
|}
I have to check for bugs. It could be that the threshold is too low, or it could be something else. We should probably rerun the baseline too, just to make sure I didn't optimize it unfairly.

==Things to try==
* Test svitchboard with the fisher-trained model to see whether we still get good results.
* Train and test on PLP+MLP, as the svitchboard timeshrinking was done (done: improved both baseline and test by 5% WER!).
* Do a baseline train+test to see whether something changed in going from the baseline to the timeshrink structure files (done: helped).

==LM penalty and scale==
Since we now have 62 PLP+MLP features instead of 39 PLP features, we should probably scale the LM scale by the factor 62/39 = 1.58. The original (not carefully tuned) PLP LM scale was 10. It may also make sense to multiply the LM penalty (-1 for PLP) by the same 1.58 factor.
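As a quick check of the arithmetic (starting values and the 1.58 factor are from the text above):

```python
# Scale the LM weight by the feature-dimensionality ratio.
old_dim, new_dim = 39, 62          # PLP vs PLP+MLP feature counts
factor = new_dim / old_dim         # ~1.59
old_lm_scale = 10                  # original (not carefully tuned) PLP LM scale
old_lm_penalty = -1                # original PLP LM penalty

new_lm_scale = old_lm_scale * factor       # ~15.9, i.e. roughly the 16 tried above
new_lm_penalty = old_lm_penalty * factor   # ~-1.59
print(round(new_lm_scale, 2), round(new_lm_penalty, 2))
```

This lands close to the LM scale of 16 used in the tandem runs in the table above.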

==final test==
On 20k utterances at tau = .9, 6.02% of the frames are dropped, giving 158839 segments at 3.5 frames per segment.

lm_scale was roughly tuned on the baseline and the same value was used for the test, although tuning on the test would help, because there are on average 5% fewer frames per word.

{| class="wikitable"
|+ Final-test timeshrinking results on Fisher
! <math>\tau</math> !! test 20k utt WER !! comments
|-
| 1 || [{{FisherPath}}/exp/timeshrink/test/unit.tri.timeshrink.1.mlp.lmSc20.final/LATEST.log TE] || PLP+MLP tandem, LM scale 20 (tuned)
|-
| .9 || [{{FisherPath}}/exp/timeshrink/test/unit.tri.timeshrink.point9.mlp.lmSc20.final/LATEST.log TE] || everything except <math>\tau</math> is the same as the baseline
|}
==Future Directions==
* Can be viewed as a two-mode special case of best-first Viterbi search, so make a real best-first lattice search. Mark mentioned some attempts in the '80s to do this.
[[Category:Fisher Experiments]]

== rexpanding ==
Helps a lot: strictly better than timeshrinking, and it doubles the WER improvement.

Looking into it:
* forceAlign to subphones using the timeshrunk .9 training and the reExpanded .9 dev set, to find the closest gold standard
* calculate the likelihood given the state for each frame, using the original and the reExpanded observations
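The second check could be sketched as below. This is a hypothetical helper, not the actual tooling: it assumes one diagonal Gaussian per aligned state and scores each frame's log-likelihood under its force-aligned state, so the original and reExpanded observations can be compared frame by frame.

```python
import numpy as np

def frame_loglik(obs, states, means, variances):
    """Per-frame log-likelihood under the diagonal Gaussian of the
    frame's force-aligned state.

    obs:       (T, D) observations (original or reExpanded)
    states:    (T,)   aligned state index per frame
    means:     (S, D) per-state means
    variances: (S, D) per-state diagonal variances
    """
    mu = means[states]        # (T, D) mean of each frame's state
    var = variances[states]   # (T, D) variance of each frame's state
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (obs - mu) ** 2 / var, axis=1)
```

Comparing the scores of the original against the reExpanded observations would show where reexpansion changes the fit.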

==model comparison==
Variance weighted by component weight, summed across all components, mixtures and features:

1mlp: 2.7010e+04

.9mlp: 2.6632e+04

So on average it looks like .9mlp has slightly tighter distributions.
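The quantity above can be computed as follows (a sketch, assuming per-mixture component weights and diagonal covariances, summed exactly as described):

```python
import numpy as np

def total_weighted_variance(mixtures):
    """Sum of component-weight-weighted variances over all mixtures,
    components, and features.

    mixtures: list of (weights, variances) pairs, where weights has
              shape (C,) and variances has shape (C, D).
    """
    total = 0.0
    for weights, variances in mixtures:
        # weight each component's diagonal variances, then sum over features
        total += np.sum(weights[:, None] * variances)
    return total
```

A smaller total, as for the .9mlp model above, indicates tighter distributions on average.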

==iterative timeshrinking==
Iteration 0 is timeshrinking with MLP; its results are above and in the paper.

Iteration 1:

Experiments progress chart:
{| class="wikitable sortable"
|+ Experiment progress
|-
! Experiment
! Generate TS Train Data
! Generate TS Test Data
! Force align
! Fully train
! test
! Train just the last iteration
! test
! test dropping low p(o) frames
|-
! tau=.9
| <span style="color: Red">R</span>
| x
| x
| x
| x
| x
| x
| x
|-
! tau=.85
| x
| x
| x
| x
| x
| x
| x
| x
|-
! tau=.9 ignoring low p(o) frames
| x
| x
| x
| x
| x
| x
| x
| x
|}

Latest revision as of 06:27, 14 April 2010
