:Units Paper

Latest revision as of 23:01, 16 July 2010

Intro
Unit Selection
- Mistake instance
  Unit
  Replacement
- Multiwords
Baseline Description
- Vocab: single most frequent pronunciation from a multi-pronunciation dictionary (better than multi-pronunciation)
Results
- What to emphasize? Ideally, units+DTs will beat DTs alone for every number of components. Even if we cannot keep adding components until the improvement bottoms out, at least there will be a trend.
Conclusion
- Future work: consider context during unit selection (right now the unit is context-free: the same unit appears in every context where replacements took place).

tests to run
compPer:	units:	monophone states	Mix:	totalComp:	WER	Important
512	1		503	256k	TR
256	1		1000	256k	TR
64	1		3854	256k	49.3
64	2		2000	256k	?
32	4		2000	256k	?
32	2		4000	256k	?

alternatively

256	48	137	503	127971	53.0
128	48	137	1033	131185	50.9
32	48	137	3845	122907	51.4
64	112	615	2024	~128k	TR
16	4		2000	128k	?
16	112	615	4000	128k	?
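
The three measured rows above (the ones with a numeric totalComp) can be sanity-checked against the product compPer x Mix. The sketch below assumes, from the column names only, that compPer is the number of Gaussians allowed per mixture and Mix is the number of mixtures; the helper names `component_cap` and `fill_ratio` are hypothetical and not part of the experiment code.

```python
# Rows copied from the measured entries above: (compPer, Mix, totalComp).
rows = [
    (256, 503, 127971),
    (128, 1033, 131185),
    (32, 3845, 122907),
]


def component_cap(comp_per, mix):
    """Upper bound on Gaussians if every mixture held comp_per components.

    Assumption: compPer is a per-mixture cap and Mix counts mixtures.
    """
    return comp_per * mix


def fill_ratio(comp_per, mix, total_comp):
    """Fraction of the cap that the reported totalComp actually uses."""
    return total_comp / component_cap(comp_per, mix)


for comp_per, mix, total in rows:
    cap = component_cap(comp_per, mix)
    ratio = fill_ratio(comp_per, mix, total)
    print(f"{comp_per:>4} x {mix:>4} = {cap:>6}  reported {total:>6}  fill {ratio:.3f}")
```

In all three rows the reported totalComp stays at or below the product, which is what one would expect if some mixtures end up with fewer components than the cap.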

my best clean baseline so far:

64	48	137	1033	65900	50.3%
64	48	137	1037	65900	50.6%	Finally, multi-unit with no units gives the same WER

from here onwards, we use only one quarter of the training data to save time:

64	48	137	1061	67617	51.6%	baseline. The only difference from the 50.6% result above is that we are using 21% of Fisher instead of 80%. As a result the DTs also differ, but no other parameters were changed.
128	48	137	1061	134139	49.3%	baseline, 128 Gaussians, 2.3% absolute WER improvement over 64
256	48	137	1061	251078	47.8%	baseline, 256 Gaussians, 1.5% absolute WER improvement over 128
64	117	635	1604	102321	49.8%	word-internal units, no tying among units
128	117	635	1604	202851	47.8%	word-internal units, no tying among units
64	115	635	1963	124746	50.0%	word-internal units, fully tied units
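
Because the unit systems and the baselines have different totalComp values, one way to compare them is to read the baseline WER off at the same number of components. The sketch below interpolates the three quarter-data baseline points linearly in log(totalComp). That curve shape is an assumption made here for illustration, not something the experiments establish, and the function names are hypothetical.

```python
import math

# (totalComp, WER %) copied from the quarter-data rows above.
baseline = [(67617, 51.6), (134139, 49.3), (251078, 47.8)]
unit_systems = {
    "word-internal, untied, compPer 64": (102321, 49.8),
    "word-internal, untied, compPer 128": (202851, 47.8),
    "word-internal, fully tied, compPer 64": (124746, 50.0),
}


def baseline_wer_at(total_comp, points=baseline):
    """Interpolate baseline WER linearly in log(totalComp) (assumed shape)."""
    pts = sorted(points)
    if not pts[0][0] <= total_comp <= pts[-1][0]:
        raise ValueError("total_comp outside the measured baseline range")
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 <= total_comp <= x1:
            frac = (math.log(total_comp) - math.log(x0)) / (math.log(x1) - math.log(x0))
            return y0 + frac * (y1 - y0)


def gain_over_baseline(total_comp, wer):
    """Positive value: the unit system beats the interpolated baseline."""
    return baseline_wer_at(total_comp) - wer


for name, (total, wer) in unit_systems.items():
    print(f"{name}: {gain_over_baseline(total, wer):+.2f} WER points")
```

Under this interpolation the two untied systems come out roughly 0.4 to 0.5 WER points below the baseline curve, and the fully tied system about the same amount above it. With only three baseline points this indicates a trend at most, not a significance test.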

A number of bugs have been fixed, so the results on SST:Units Paper Debugging are hopefully no longer relevant.

Finally, at least something reasonable, and (hopefully) a 2% WER improvement in the baseline. So redoing the monophone tests:

monophone tests
trainConfig	testConfig	Descr	States	WER	WER 2k	Comments
triUnitsModel/config28	testc67triUnitsModelc28/config67	baseline but using the new testing stuff: --maxStates 0 --unitType wordInternalOnly --subUnits asBefore --growUnitSet 0	137	84.3%	86.0%
moreDTsModel/config20	trTestTrigramFixed/config19	baseline	137	84.3%		Same as above but using older code
triUnitsModel/config27	testc67TriOnBreakdownModelc27/config67	--maxStates 500 --unitType wordInternalOnly --subUnits asBefore --growUnitSet 0	627	87.2	87.6
triUnitsModel/config26	testc67TriOnBreakdownModelc26/config67	--maxStates 500 --unitType wordInternalOnly --subUnits asBefore	627	87.1 was 91.1%	TE
triUnitsModel/config25	xx	--maxStates 500 --unitType wordInternalOnly	636	xx	87.1
triUnitsModel/config24	xx	--maxStates 500	615	xx	xx
triUnitsModel/config29	xx	--maxStates 500 --unitType multiWordOnly	637	xx	88.2	There must be something wrong with this: the confusion pairs do not make sense and there are far too many deletions
triUnitsModel/config30	xx	--maxStates 500 --unitType wordInternalOnly --subUnits asBefore, initializeUnitsFromFile	627	xx	86.0
triUnitsModel/config31	xx	--maxStates 500 --contextType wordInternalOnly --subUnits asBefore --tieContext none	633	xx	85.8	finally a slight improvement.
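
When a run shows suspicious confusion pairs or too many deletions, it helps to split WER into substitutions, deletions and insertions on a few utterances by hand. The sketch below is a textbook Levenshtein alignment over whitespace-separated words. It is not the scoring tool used for these experiments (the notes do not name one), and `wer_breakdown` is a hypothetical helper.

```python
def wer_breakdown(reference, hypothesis):
    """Return (wer, substitutions, deletions, insertions) for one utterance."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # cost[i][j] = (edits, subs, dels, ins) for aligning ref[:i] with hyp[:j]
    cost = [[None] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    cost[0][0] = (0, 0, 0, 0)
    for i in range(1, len(ref) + 1):
        cost[i][0] = (i, 0, i, 0)  # every reference word deleted
    for j in range(1, len(hyp) + 1):
        cost[0][j] = (j, 0, 0, j)  # every hypothesis word inserted
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            e, s, d, n = cost[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (e, s, d, n)          # match, no cost
            else:
                diag = (e + 1, s + 1, d, n)  # substitution
            e, s, d, n = cost[i - 1][j]
            deletion = (e + 1, s, d + 1, n)
            e, s, d, n = cost[i][j - 1]
            insertion = (e + 1, s, d, n + 1)
            # Tuples compare on total edits first, so this keeps a minimum-edit path.
            cost[i][j] = min(diag, deletion, insertion)
    edits, subs, dels, ins = cost[-1][-1]
    return edits / len(ref), subs, dels, ins


wer, subs, dels, ins = wer_breakdown("the cat sat on the mat", "the cat on a mat")
print(f"WER {wer:.1%}  sub {subs}  del {dels}  ins {ins}")
```

A deletion count that dominates the other two on most utterances would point towards a decoding or unit-expansion problem more than towards acoustic confusions.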

Log files for the quarter-data baseline rows above:
64 Gaussians (51.6%): http://mickey.ifp.uiuc.edu/speech/akantor/fisher/exp/triphone/test2kUttOnConvGaus.noUnits/config73/LATEST.log
128 Gaussians (49.3%): http://mickey.ifp.uiuc.edu/speech/akantor/fisher/exp/triphone/test2kUtt/convGaus.noUnits/unit.tri.128gau/LATEST.log