:Units Paper

From SpeechWiki

(Difference between revisions)

Revision as of 04:19, 17 June 2009

Outline

Intro
Unit Selection
- Mistake instance
  Unit
  Replacement
- Multiwords
Baseline Description
- Vocab: single most frequent pronunciation from a multi-pronunciation dictionary (better than multi-pronunciation)
Results
- What to emphasize? Ideally, units+DTs will beat DTs alone for every number of components. Even if we cannot grow the number of components until the improvement levels off, there should at least be a trend.
Conclusion
- Future work: consider context during unit selection (right now the unit is context-free: the same unit is used in every context where a replacement took place).

Tests for units paper

tests to run
compPer	units	monophone states	Mix	totalComp	WER	Important
512	1		503	256k	TR
256	1		1000	256k	TR
64	1		3854	256k	49.3
64	2		2000	256k	?
32	4		2000	256k	?
32	2		4000	256k	?

alternatively

256	48	137	503	127971	53.0
128	48	137	1033	131185	50.9
32	48	137	3845	122907	51.4
64	112	615	2024	~128k	TR
16	4		2000	128k	?
16	112	615	4000	128k	?
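
The planned rows of the first "tests to run" table look consistent with totalComp ≈ compPer × units × Mix. That is only a reading of the numbers; the page does not define the columns, and the rows of the "alternatively" table do not all follow the same product. A minimal Python sketch under that assumption (the function name `total_components` is hypothetical):

```python
# Assumed reading of the first table: totalComp ~= compPer * units * Mix.
# This relationship is inferred from the planned rows, not stated on the page.

def total_components(comp_per: int, units: int, mix: int) -> int:
    """Return the component total under the assumed product reading."""
    return comp_per * units * mix

# (compPer, units, Mix) rows copied from the first table.
planned = [
    (512, 1, 503),
    (256, 1, 1000),
    (64, 1, 3854),
    (64, 2, 2000),
    (32, 4, 2000),
    (32, 2, 4000),
]

for comp_per, units, mix in planned:
    total = total_components(comp_per, units, mix)
    print(f"{comp_per}\t{units}\t{mix}\t{total}")
```

Under this reading every planned row lands close to the 256k budget, which is what makes the WER values comparable across rows.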

my best clean baseline so far:

64	48	137	1033	65900	50.3%
64	48	137	1037	65900	50.6%	Finally, multi-unit with no units gives the same WER

Debugging

A number of bugs have been fixed, and the results on the SST:Units Paper Debugging page are hopefully no longer relevant.

Monophone tests

Finally, at least something reasonable, and (hopefully) a 2% WER improvement in the baseline. So redoing the monophone tests:

monophone tests
trainConfig	testConfig	Descr	States	WER	WER 2k	Comments
triUnitsModel/config28	testc67triUnitsModelc28/config67	baseline but using the new testing stuff: --maxStates 0 --unitType wordInternalOnly --subUnits asBefore --growUnitSet 0	137	84.3%	86.0%
moreDTsModel/config20	trTestTrigramFixed/config19	baseline	137	84.3%		Same as above but using older code
triUnitsModel/config27	testc67TriOnBreakdownModelc27/config67	--maxStates 500 --unitType wordInternalOnly --subUnits asBefore --growUnitSet 0	627	87.2	87.6
triUnitsModel/config26	testc67TriOnBreakdownModelc26/config67	--maxStates 500 --unitType wordInternalOnly --subUnits asBefore	627	87.1 was 91.1%	TE
triUnitsModel/config25	xx	--maxStates 500 --unitType wordInternalOnly	636	xx	87.1
triUnitsModel/config24	xx	--maxStates 500	615	xx	xx
triUnitsModel/config29	xx	--maxStates 500 --unitType multiWordOnly	637	xx	88.2	Something must be wrong here: the confusion pairs make no sense and there are far too many deletions
triUnitsModel/config30	xx	--maxStates 500 --unitType wordInternalOnly --subUnits asBefore, initializeUnitsFromFile	627	xx	86.0
triUnitsModel/config31	xx	--maxStates 500 --contextType wordInternalOnly --subUnits asBefore --tieContext none	633	xx	85.8	finally a slight improvement.
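
The WER figures in these tables appear to be the "Percent Total Error" value from the scoring reports, i.e. (substitutions + deletions + insertions) divided by the number of reference words. A minimal Python sketch of that arithmetic, using the counts from the older "Standard monophone" report (2796 substitutions, 1340 deletions, 290 insertions, 5097 reference words); the function name `word_error_rate` is hypothetical:

```python
# WER as (S + D + I) / N, expressed in percent.
# Counts below are copied from the older "Standard monophone" report.

def word_error_rate(subs: int, dels: int, ins: int, ref_words: int) -> float:
    """Return the word error rate in percent."""
    if ref_words <= 0:
        raise ValueError("ref_words must be positive")
    return 100.0 * (subs + dels + ins) / ref_words

wer = word_error_rate(subs=2796, dels=1340, ins=290, ref_words=5097)
print(f"WER = {wer:.1f}%")                    # WER = 86.8%
print(f"Word accuracy = {100.0 - wer:.1f}%")  # Word accuracy = 13.2%
```

Because insertions count as errors, WER can approach or exceed 100% when the hypothesis has many more words than the reference, which is why the word accuracy in some of the older reports is close to zero.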

@@ Line 62: / Line 62: @@
+==Debugging==
+A number of bugs have been fixed, and the results on the [[SST:Units Paper Debugging]] page are hopefully no longer relevant.
-==The units make it worse==
+==Monophone tests==
-=== with LM_PENALTY=0 ===
- DETAILED OVERALL REPORT FOR THE SYSTEM: test/config16/test0/accuracy/out.nosil.trn
- SENTENCE RECOGNITION PERFORMANCE
- sentences                                         500
- with errors                             89.2%   ( 446)
-   with substitions                      72.4%   ( 362)
-   with deletions                        26.2%   ( 131)
-   with insertions                       74.4%   ( 372)
- WORD RECOGNITION PERFORMANCE
- Percent Total Error       =   99.1%   (5107)
- Percent Correct           =   28.6%   (1476)
- Percent Substitution      =   66.2%   (3411)
- Percent Deletions         =    5.1%   ( 265)
- Percent Insertions        =   27.8%   (1431)
- Percent Word Accuracy     =    0.9%
- Ref. words                =           (5152)
- Hyp. words                =           (6318)
- Aligned words             =           (6583)
- CONFUSION PAIRS                  Total                 (2790)
-                                 With >=  1 occurances (2790)
-=== with LM_PENALTY=-1 ===
- test2kUtt/config16Disaster/test0/accuracy/out.nosil.trn.dtl
- DETAILED OVERALL REPORT FOR THE SYSTEM: test2kUtt/config16/test0/accuracy/out.nosil.trn
- SENTENCE RECOGNITION PERFORMANCE
-  sentences                                         500
-  with errors                             88.2%   ( 441)
-   with substitions                      72.8%   ( 364)
-   with deletions                        32.8%   ( 164)
-   with insertions                       68.6%   ( 343)
- WORD RECOGNITION PERFORMANCE
- Percent Total Error       =   94.6%   (4857)
- Percent Correct           =   27.6%   (1417)
- Percent Substitution      =   65.4%   (3358)
- Percent Deletions         =    7.0%   ( 357)
- Percent Insertions        =   22.3%   (1142)
- Percent Word Accuracy     =    5.4%
- Ref. words                =           (5132)
- Hyp. words                =           (5917)
- Aligned words             =           (6274)
-=== LM_PENALTY = -2 ===
- DETAILED OVERALL REPORT FOR THE SYSTEM: test/config16/test0/accuracy/out.nosil.trn
- SENTENCE RECOGNITION PERFORMANCE
- sentences                                         500
- with errors                             87.2%   ( 436)
-   with substitions                      72.8%   ( 364)
-   with deletions                        38.8%   ( 194)
-   with insertions                       62.8%   ( 314)
- WORD RECOGNITION PERFORMANCE
- Percent Total Error       =   91.4%   (4674)
- Percent Correct           =   26.9%   (1374)
- Percent Substitution      =   64.0%   (3271)
- Percent Deletions         =    9.2%   ( 468)
- Percent Insertions        =   18.3%   ( 935)
- Percent Word Accuracy     =    8.6%
- Ref. words                =           (5113)
- Hyp. words                =           (5580)
- Aligned words             =           (6048)
- CONFUSION PAIRS                  Total                 (2699)
-                                  With >=  1 occurances (2699)
-=== LM_PENALTY = -3 ===
- DETAILED OVERALL REPORT FOR THE SYSTEM: test/config16/test0/accuracy/out.nosil.trn
- SENTENCE RECOGNITION PERFORMANCE
-  sentences                                         500
-  with errors                             87.0%   ( 435)
-    with substitions                      72.6%   ( 363)
-    with deletions                        40.6%   ( 203)
-    with insertions                       56.4%   ( 282)
- WORD RECOGNITION PERFORMANCE
- Percent Total Error       =   88.9%   (4535)
- Percent Correct           =   25.4%   (1298)
- Percent Substitution      =   63.0%   (3215)
- Percent Deletions         =   11.5%   ( 588)
- Percent Insertions        =   14.4%   ( 732)
- Percent Word Accuracy     =   11.1%
- Ref. words                =           (5101)
- Hyp. words                =           (5245)
- Aligned words             =           (5833)
- CONFUSION PAIRS                  Total                 (2671)
-                                  With >=  1 occurances (2671)
-===  LM_PENALTY = -4 ===
- DETAILED OVERALL REPORT FOR THE SYSTEM: test/config16/test0/accuracy/out.nosil.trn
- SENTENCE RECOGNITION PERFORMANCE
-  sentences                                         500
-  with errors                             86.6%   ( 433)
-    with substitions                      72.2%   ( 361)
-    with deletions                        45.8%   ( 229)
-    with insertions                       49.8%   ( 249)
- WORD RECOGNITION PERFORMANCE
- Percent Total Error       =   87.2%   (4446)
- Percent Correct           =   24.5%   (1249)
- Percent Substitution      =   60.4%   (3078)
- Percent Deletions         =   15.1%   ( 770)
- Percent Insertions        =   11.7%   ( 598)
- Percent Word Accuracy     =   12.8%
- Ref. words                =           (5097)
- Hyp. words                =           (4925)
- Aligned words             =           (5695)
- CONFUSION PAIRS                  Total                 (2590)
-                                  With >=  1 occurances (2590)
-===  LM_PENALTY = -5 ===
- DETAILED OVERALL REPORT FOR THE SYSTEM: test/config16/test0/accuracy/out.nosil.trn
- SENTENCE RECOGNITION PERFORMANCE
-  sentences                                         500
-  with errors                             86.4%   ( 432)
-    with substitions                      72.2%   ( 361)
-    with deletions                        49.2%   ( 246)
-    with insertions                       46.2%   ( 231)
- WORD RECOGNITION PERFORMANCE
- Percent Total Error       =   87.0%   (4423)
- Percent Correct           =   23.1%   (1176)
- Percent Substitution      =   58.9%   (2998)
- Percent Deletions         =   17.9%   ( 912)
- Percent Insertions        =   10.1%   ( 513)
- Percent Word Accuracy     =   13.0%
- Ref. words                =           (5086)
- Hyp. words                =           (4687)
- Aligned words             =           (5599)
- CONFUSION PAIRS                  Total                 (2533)
-                                  With >=  1 occurances (2533)
-===  LM_PENALTY = -6 ===
- DETAILED OVERALL REPORT FOR THE SYSTEM: test/config16/test0/accuracy/out.nosil.trn
- SENTENCE RECOGNITION PERFORMANCE
-  sentences                                         500
-  with errors                             87.0%   ( 435)
-    with substitions                      72.6%   ( 363)
-    with deletions                        53.4%   ( 267)
-    with insertions                       40.4%   ( 202)
- WORD RECOGNITION PERFORMANCE
- Percent Total Error       =   86.6%   (4394)
- Percent Correct           =   21.5%   (1088)
- Percent Substitution      =   56.8%   (2883)
- Percent Deletions         =   21.7%   (1101)
- Percent Insertions        =    8.1%   ( 410)
- Percent Word Accuracy     =   13.4%
- Ref. words                =           (5072)
- Hyp. words                =           (4381)
- Aligned words             =           (5482)
- CONFUSION PAIRS                  Total                 (2472)
-                                  With >=  1 occurances (2472)
-'''Clearly there is something wrong with the monophone model - a triunit gaussian model seems about even with monophone gaussian model - should be better by about 10% WER, I think.'''
-== Monophone tests ==
-Trying to track down where the error is coming from:
-Standard monophone converged once WER 86.8:
- DETAILED OVERALL REPORT FOR THE SYSTEM: trTest/config19/test0/accuracy/out.nosil.trn
- SENTENCE RECOGNITION PERFORMANCE
-  sentences                                         500
-  with errors                             83.4%   ( 417)
-    with substitions                      69.4%   ( 347)
-    with deletions                        52.8%   ( 264)
-    with insertions                       33.6%   ( 168)
- WORD RECOGNITION PERFORMANCE
- Percent Total Error       =   86.8%   (4426)
- Percent Correct           =   18.9%   ( 961)
- Percent Substitution      =   54.9%   (2796)
- Percent Deletions         =   26.3%   (1340)
- Percent Insertions        =    5.7%   ( 290)
- Percent Word Accuracy     =   13.2%
- Ref. words                =           (5097)
- Hyp. words                =           (4047)
- Aligned words             =           (5387)
-=== The Units monophone: ===
- DETAILED OVERALL REPORT FOR THE SYSTEM: monoUnitTest/config66/test0/accuracy/out.nosil.trn
- SENTENCE RECOGNITION PERFORMANCE
-  sentences                                         500
-  with errors                             86.8%   ( 434)
-    with substitions                      72.2%   ( 361)
-    with deletions                        43.6%   ( 218)
-    with insertions                       51.2%   ( 256)
- WORD RECOGNITION PERFORMANCE
- Percent Total Error       =   87.7%   (4497)
- Percent Correct           =   25.2%   (1291)
- Percent Substitution      =   61.8%   (3166)
- Percent Deletions         =   13.0%   ( 668)
- Percent Insertions        =   12.9%   ( 663)
- Percent Word Accuracy     =   12.3%
- Ref. words                =           (5125)
- Hyp. words                =           (5120)
- Aligned words             =           (5788)
- CONFUSION PAIRS                  Total                 (2569)
-                                  With >=  1 occurances (2569)
-So there are probably two problems,
-# one with monophones (more units makes things worse?!),
-# and with triunits (adding context does not make things better).
-I will dig apart monophones first.
-=== config 27 Units with  --maxStates 500 --unitType wordInternalOnly  --subUnits asBefore --growUnitSet 0 , LM_PENALTY=-1 ===
- DETAILED OVERALL REPORT FOR THE SYSTEM: testc67OnBreakdownModelc26/config67/test0/accuracy/out.nosil.trn
- SENTENCE RECOGNITION PERFORMANCE
-  sentences                                         500
-  with errors                             88.6%   ( 443)
-    with substitions                      73.8%   ( 369)
-    with deletions                        36.8%   ( 184)
-    with insertions                       56.0%   ( 280)
- WORD RECOGNITION PERFORMANCE
- Percent Total Error       =   91.1%   (4679)
- Percent Correct           =   24.5%   (1260)
- Percent Substitution      =   65.1%   (3340)
- Percent Deletions         =   10.4%   ( 534)
- Percent Insertions        =   15.7%   ( 805)
- Percent Word Accuracy     =    8.9%
- Ref. words                =           (5134)
- Hyp. words                =           (5405)
- Aligned words             =           (5939)
- CONFUSION PAIRS                  Total                 (2693)
-==Problems fixed so far==
-* All previous tests were run with the bigram model! All interesting tests should be rerun.
-* Only 1 subunit on each boundary was clustered, a disadvantage against traditional units, where the center unit was also clustered. Now up to two subunits on each boundary are untied and clustered.
-* Many unused GMs were left in the trainable params. This probably didn't affect the accuracy but slowed everything down.
-* SUB_PHONE_COUNTER_CARD was used where WORDSTATE_COUNTER_CARD should have been used. Any word with more than 15 substates was given 0 probability?! Another reason to redo all the monophone tests.
 Finally, at least something reasonable, and (hopefully) a 2% WER improvement in the baseline.
 So redoing the monophone tests: