:Units Paper

From SpeechWiki

(Difference between revisions)
Line 62: Line 62:
 +
==Debugging==
 +
A number of bugs have been fixed, and results on the [[SST:Units Paper Debugging]] are hopefully no longer relevant.
-
==The units make it worse==
+
==Monophone tests==
-
 
+
-
=== with LM_PENALTY=0 ===
+
-
DETAILED OVERALL REPORT FOR THE SYSTEM: test/config16/test0/accuracy/out.nosil.trn
+
-
+
-
SENTENCE RECOGNITION PERFORMANCE
+
-
+
-
sentences                                        500
+
-
with errors                            89.2%  ( 446)
+
-
+
-
  with substitions                      72.4%  ( 362)
+
-
  with deletions                        26.2%  ( 131)
+
-
  with insertions                      74.4%  ( 372)
+
-
+
-
+
-
WORD RECOGNITION PERFORMANCE
+
-
+
-
Percent Total Error      =  99.1%  (5107)
+
-
+
-
Percent Correct          =  28.6%  (1476)
+
-
+
-
Percent Substitution      =  66.2%  (3411)
+
-
Percent Deletions        =    5.1%  ( 265)
+
-
Percent Insertions        =  27.8%  (1431)
+
-
Percent Word Accuracy    =    0.9%
+
-
+
-
+
-
Ref. words                =          (5152)
+
-
Hyp. words                =          (6318)
+
-
Aligned words            =          (6583)
+
-
+
-
CONFUSION PAIRS                  Total                (2790)
+
-
                                With >=  1 occurances (2790)
+
-
 
+
-
 
+
-
=== with LM_PENALTY=-1 ===
+
-
test2kUtt/config16Disaster/test0/accuracy/out.nosil.trn.dtl
+
-
DETAILED OVERALL REPORT FOR THE SYSTEM: test2kUtt/config16/test0/accuracy/out.nosil.trn
+
-
+
-
SENTENCE RECOGNITION PERFORMANCE
+
-
+
-
  sentences                                        500
+
-
  with errors                            88.2%  ( 441)
+
-
+
-
  with substitions                      72.8%  ( 364)
+
-
  with deletions                        32.8%  ( 164)
+
-
  with insertions                      68.6%  ( 343)
+
-
+
-
+
-
WORD RECOGNITION PERFORMANCE
+
-
+
-
Percent Total Error      =  94.6%  (4857)
+
-
+
-
Percent Correct          =  27.6%  (1417)
+
-
+
-
Percent Substitution      =  65.4%  (3358)
+
-
Percent Deletions        =    7.0%  ( 357)
+
-
Percent Insertions        =  22.3%  (1142)
+
-
Percent Word Accuracy    =    5.4%
+
-
+
-
+
-
Ref. words                =          (5132)
+
-
Hyp. words                =          (5917)
+
-
Aligned words            =          (6274)
+
-
+
-
 
+
-
 
+
-
 
+
-
=== LM_PENALTY = -2 ===
+
-
DETAILED OVERALL REPORT FOR THE SYSTEM: test/config16/test0/accuracy/out.nosil.trn
+
-
+
-
SENTENCE RECOGNITION PERFORMANCE
+
-
+
-
sentences                                        500
+
-
with errors                            87.2%  ( 436)
+
-
+
-
  with substitions                      72.8%  ( 364)
+
-
  with deletions                        38.8%  ( 194)
+
-
  with insertions                      62.8%  ( 314)
+
-
+
-
+
-
WORD RECOGNITION PERFORMANCE
+
-
+
-
Percent Total Error      =  91.4%  (4674)
+
-
+
-
Percent Correct          =  26.9%  (1374)
+
-
+
-
Percent Substitution      =  64.0%  (3271)
+
-
Percent Deletions        =    9.2%  ( 468)
+
-
Percent Insertions        =  18.3%  ( 935)
+
-
Percent Word Accuracy    =    8.6%
+
-
+
-
+
-
Ref. words                =          (5113)
+
-
Hyp. words                =          (5580)
+
-
Aligned words            =          (6048)
+
-
+
-
CONFUSION PAIRS                  Total                (2699)
+
-
                                  With >=  1 occurances (2699)
+
-
+
-
=== LM_PENALTY = -3 ===
+
-
DETAILED OVERALL REPORT FOR THE SYSTEM: test/config16/test0/accuracy/out.nosil.trn
+
-
+
-
SENTENCE RECOGNITION PERFORMANCE
+
-
+
-
  sentences                                        500
+
-
  with errors                            87.0%  ( 435)
+
-
+
-
    with substitions                      72.6%  ( 363)
+
-
    with deletions                        40.6%  ( 203)
+
-
    with insertions                      56.4%  ( 282)
+
-
+
-
+
-
WORD RECOGNITION PERFORMANCE
+
-
+
-
Percent Total Error      =  88.9%  (4535)
+
-
+
-
Percent Correct          =  25.4%  (1298)
+
-
+
-
Percent Substitution      =  63.0%  (3215)
+
-
Percent Deletions        =  11.5%  ( 588)
+
-
Percent Insertions        =  14.4%  ( 732)
+
-
Percent Word Accuracy    =  11.1%
+
-
+
-
+
-
Ref. words                =          (5101)
+
-
Hyp. words                =          (5245)
+
-
Aligned words            =          (5833)
+
-
+
-
CONFUSION PAIRS                  Total                (2671)
+
-
                                  With >=  1 occurances (2671)
+
-
+
-
===  LM_PENALTY = -4 ===
+
-
DETAILED OVERALL REPORT FOR THE SYSTEM: test/config16/test0/accuracy/out.nosil.trn
+
-
+
-
SENTENCE RECOGNITION PERFORMANCE
+
-
+
-
  sentences                                        500
+
-
  with errors                            86.6%  ( 433)
+
-
+
-
    with substitions                      72.2%  ( 361)
+
-
    with deletions                        45.8%  ( 229)
+
-
    with insertions                      49.8%  ( 249)
+
-
+
-
+
-
WORD RECOGNITION PERFORMANCE
+
-
+
-
Percent Total Error      =  87.2%  (4446)
+
-
+
-
Percent Correct          =  24.5%  (1249)
+
-
+
-
Percent Substitution      =  60.4%  (3078)
+
-
Percent Deletions        =  15.1%  ( 770)
+
-
Percent Insertions        =  11.7%  ( 598)
+
-
Percent Word Accuracy    =  12.8%
+
-
+
-
+
-
Ref. words                =          (5097)
+
-
Hyp. words                =          (4925)
+
-
Aligned words            =          (5695)
+
-
+
-
CONFUSION PAIRS                  Total                (2590)
+
-
                                  With >=  1 occurances (2590)
+
-
+
-
+
-
===  LM_PENALTY = -5 ===
+
-
DETAILED OVERALL REPORT FOR THE SYSTEM: test/config16/test0/accuracy/out.nosil.trn
+
-
+
-
SENTENCE RECOGNITION PERFORMANCE
+
-
+
-
  sentences                                        500
+
-
  with errors                            86.4%  ( 432)
+
-
+
-
    with substitions                      72.2%  ( 361)
+
-
    with deletions                        49.2%  ( 246)
+
-
    with insertions                      46.2%  ( 231)
+
-
+
-
+
-
WORD RECOGNITION PERFORMANCE
+
-
+
-
Percent Total Error      =  87.0%  (4423)
+
-
+
-
Percent Correct          =  23.1%  (1176)
+
-
+
-
Percent Substitution      =  58.9%  (2998)
+
-
Percent Deletions        =  17.9%  ( 912)
+
-
Percent Insertions        =  10.1%  ( 513)
+
-
Percent Word Accuracy    =  13.0%
+
-
+
-
+
-
Ref. words                =          (5086)
+
-
Hyp. words                =          (4687)
+
-
Aligned words            =          (5599)
+
-
+
-
CONFUSION PAIRS                  Total                (2533)
+
-
                                  With >=  1 occurances (2533)
+
-
+
-
===  LM_PENALTY = -6 ===
+
-
 
+
-
DETAILED OVERALL REPORT FOR THE SYSTEM: test/config16/test0/accuracy/out.nosil.trn
+
-
+
-
SENTENCE RECOGNITION PERFORMANCE
+
-
+
-
  sentences                                        500
+
-
  with errors                            87.0%  ( 435)
+
-
+
-
    with substitions                      72.6%  ( 363)
+
-
    with deletions                        53.4%  ( 267)
+
-
    with insertions                      40.4%  ( 202)
+
-
+
-
+
-
WORD RECOGNITION PERFORMANCE
+
-
+
-
Percent Total Error      =  86.6%  (4394)
+
-
+
-
Percent Correct          =  21.5%  (1088)
+
-
+
-
Percent Substitution      =  56.8%  (2883)
+
-
Percent Deletions        =  21.7%  (1101)
+
-
Percent Insertions        =    8.1%  ( 410)
+
-
Percent Word Accuracy    =  13.4%
+
-
+
-
+
-
Ref. words                =          (5072)
+
-
Hyp. words                =          (4381)
+
-
Aligned words            =          (5482)
+
-
+
-
CONFUSION PAIRS                  Total                (2472)
+
-
                                  With >=  1 occurances (2472)
+
-
+
-
 
+
-
 
+
-
'''Clearly there is something wrong with the monophone model: a triunit Gaussian model seems about even with the monophone Gaussian model, but it should be better by about 10% WER, I think.'''
+
-
 
+
-
== Monophone tests ==
+
-
Trying to track down where the error is coming from:
+
-
 
+
-
Standard monophone converged once WER 86.8:
+
-
DETAILED OVERALL REPORT FOR THE SYSTEM: trTest/config19/test0/accuracy/out.nosil.trn
+
-
+
-
SENTENCE RECOGNITION PERFORMANCE
+
-
+
-
  sentences                                        500
+
-
  with errors                            83.4%  ( 417)
+
-
+
-
    with substitions                      69.4%  ( 347)
+
-
    with deletions                        52.8%  ( 264)
+
-
    with insertions                      33.6%  ( 168)
+
-
+
-
+
-
WORD RECOGNITION PERFORMANCE
+
-
+
-
Percent Total Error      =  86.8%  (4426)
+
-
+
-
Percent Correct          =  18.9%  ( 961)
+
-
+
-
Percent Substitution      =  54.9%  (2796)
+
-
Percent Deletions        =  26.3%  (1340)
+
-
Percent Insertions        =    5.7%  ( 290)
+
-
Percent Word Accuracy    =  13.2%
+
-
+
-
+
-
Ref. words                =          (5097)
+
-
Hyp. words                =          (4047)
+
-
Aligned words            =          (5387)
+
-
 
+
-
 
+
-
=== The Units monophone: ===
+
-
DETAILED OVERALL REPORT FOR THE SYSTEM: monoUnitTest/config66/test0/accuracy/out.nosil.trn
+
-
+
-
SENTENCE RECOGNITION PERFORMANCE
+
-
+
-
  sentences                                        500
+
-
  with errors                            86.8%  ( 434)
+
-
+
-
    with substitions                      72.2%  ( 361)
+
-
    with deletions                        43.6%  ( 218)
+
-
    with insertions                      51.2%  ( 256)
+
-
+
-
+
-
WORD RECOGNITION PERFORMANCE
+
-
+
-
Percent Total Error      =  87.7%  (4497)
+
-
+
-
Percent Correct          =  25.2%  (1291)
+
-
+
-
Percent Substitution      =  61.8%  (3166)
+
-
Percent Deletions        =  13.0%  ( 668)
+
-
Percent Insertions        =  12.9%  ( 663)
+
-
Percent Word Accuracy    =  12.3%
+
-
+
-
+
-
Ref. words                =          (5125)
+
-
Hyp. words                =          (5120)
+
-
Aligned words            =          (5788)
+
-
+
-
CONFUSION PAIRS                  Total                (2569)
+
-
                                  With >=  1 occurances (2569)
+
-
+
-
So there are probably two problems,
+
-
# one with monophones (more units makes things worse?!),
+
-
# and with triunits (adding context does not make things better).
+
-
I will dig apart monophones first.
+
-
 
+
-
=== config 27 Units with  --maxStates 500 --unitType wordInternalOnly  --subUnits asBefore --growUnitSet 0 , LM_PENALTY=-1 ===
+
-
DETAILED OVERALL REPORT FOR THE SYSTEM: testc67OnBreakdownModelc26/config67/test0/accuracy/out.nosil.trn
+
-
+
-
SENTENCE RECOGNITION PERFORMANCE
+
-
+
-
  sentences                                        500
+
-
  with errors                            88.6%  ( 443)
+
-
+
-
    with substitions                      73.8%  ( 369)
+
-
    with deletions                        36.8%  ( 184)
+
-
    with insertions                      56.0%  ( 280)
+
-
+
-
+
-
WORD RECOGNITION PERFORMANCE
+
-
+
-
Percent Total Error      =  91.1%  (4679)
+
-
+
-
Percent Correct          =  24.5%  (1260)
+
-
+
-
Percent Substitution      =  65.1%  (3340)
+
-
Percent Deletions        =  10.4%  ( 534)
+
-
Percent Insertions        =  15.7%  ( 805)
+
-
Percent Word Accuracy    =    8.9%
+
-
+
-
+
-
Ref. words                =          (5134)
+
-
Hyp. words                =          (5405)
+
-
Aligned words            =          (5939)
+
-
+
-
CONFUSION PAIRS                  Total                (2693)
+
-
                                 
+
-
 
+
-
==Problems fixed so far==
+
-
 
+
-
* All previous tests were with the bigram model!!!  All interesting tests should be rerun.
+
-
* Only 1 subunit on each boundary was clustered, a disadvantage against traditional units where the center unit was also clustered. Now up to two subunits on each boundary are untied and clustered.
+
-
* Many unused GMs were left in the trainable params; this probably didn't affect the accuracy but slowed everything down.
+
-
* SUB_PHONE_COUNTER_CARD was used where WORDSTATE_COUNTER_CARD should have been used.  Any word with more than 15 substates was given 0 probability?!  Another reason to redo all the monophone tests.
+
-
 
+
Finally, at least something reasonable:  and (hopefully) a 2% WER improvement in the baseline
So redoing monophone tests:

Revision as of 04:19, 17 June 2009

Outline

  1. Intro
  2. Unit Selection
    • Mistake instance
      Unit
      Replacement
    • Multiwords
  3. Baseline Description
    • Vocab: single most frequent pronunciation from a multi-pronunciation dictionary (better than multi-pronunciation)
  4. Results
    • what to emphasize? Ideally, units+DTs will beat just DTs for every number of components. Even if we cannot grow the components until improvement bottoms out, at least there will be a trend.
  5. Conclusion
    • Future work: consider context during unit selection (right now the unit is context-free: the same unit appears in every context where a replacement took place).

Tests for units paper

tests to run
compPer: units: monophone states Mix: totalComp: WER Test WER Important
512 1 503 256k TR
256 1 1000 256k TR
64 1 3854 256k 49.3
64 2 2000 256k  ?
32 4 2000 256k  ?
32 2 4000 256k  ?
alternatively
256 48 137 503 127971 53.0
128 48 137 1033 131185 50.9
32 48 137 3845 122907 51.4
64 112 615 2024 ~128k TR
16 4 2000 128k  ?
16 112 615 4000 128k  ?
my best clean baseline so far:
64 48 137 1033 65900 50.3%
64 48 137 1037 65900 50.6% Finally, multi-unit with no units gives the same WER


Debugging

A number of bugs have been fixed, and results on the SST:Units Paper Debugging are hopefully no longer relevant.

Monophone tests

Finally, at least something reasonable, and (hopefully) a 2% WER improvement in the baseline. So redoing the monophone tests:

monophone tests
trainConfig testConfig Descr States WER WER 2k Comments
triUnitsModel/config28 testc67triUnitsModelc28/config67 baseline but using the new testing stuff: --maxStates 0 --unitType wordInternalOnly --subUnits asBefore --growUnitSet 0 137 84.3% 86.0%
moreDTsModel/config20 trTestTrigramFixed/config19 baseline 137 84.3% Same as above but using older code
triUnitsModel/config27 testc67TriOnBreakdownModelc27/config67 --maxStates 500 --unitType wordInternalOnly --subUnits asBefore --growUnitSet 0 627 87.2 87.6
triUnitsModel/config26 testc67TriOnBreakdownModelc26/config67 --maxStates 500 --unitType wordInternalOnly --subUnits asBefore 627 87.1 was 91.1% TE
triUnitsModel/config25 xx --maxStates 500 --unitType wordInternalOnly 636 xx 87.1
triUnitsModel/config24 xx --maxStates 500 615 xx xx
triUnitsModel/config29 xx --maxStates 500 --unitType multiWordOnly 637 xx 88.2 There must be something wrong with this - the confusion pairs are not making sense and there are way too many deletions
triUnitsModel/config30 xx --maxStates 500 --unitType wordInternalOnly --subUnits asBefore, initializeUnitsFromFile 627 xx 86.0
triUnitsModel/config31 xx --maxStates 500 --contextType wordInternalOnly --subUnits asBefore --tieContext none 633 xx 85.8 finally a slight improvement.
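
For reading the WER columns above, it may help to recall how the sclite summaries quoted in the older revision relate their fields: Percent Total Error is (substitutions + deletions + insertions) / reference words, and Word Accuracy is 100 minus that. The sketch below recomputes those figures for the LM_PENALTY sweep from the counts printed in those reports. The class and function names (`Counts`, `best_penalty`) are illustrative assumptions, not part of the test scripts used for these experiments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Word-level counts as printed in an sclite summary (hypothetical container)."""
    sub: int   # substitutions
    dele: int  # deletions
    ins: int   # insertions
    ref: int   # reference words

    @property
    def errors(self) -> int:
        return self.sub + self.dele + self.ins

    @property
    def total_error_pct(self) -> float:
        # "Percent Total Error" in the report; this is the WER.
        return 100.0 * self.errors / self.ref

    @property
    def correct_pct(self) -> float:
        # "Percent Correct": reference words that were neither substituted nor deleted.
        return 100.0 * (self.ref - self.sub - self.dele) / self.ref

    @property
    def word_accuracy_pct(self) -> float:
        # "Percent Word Accuracy" = 100 - Total Error; insertions count against it.
        return 100.0 - self.total_error_pct


# Counts copied from the config16 reports, keyed by LM_PENALTY.
SWEEP = {
    0: Counts(3411, 265, 1431, 5152),
    -1: Counts(3358, 357, 1142, 5132),
    -2: Counts(3271, 468, 935, 5113),
    -3: Counts(3215, 588, 732, 5101),
    -4: Counts(3078, 770, 598, 5097),
    -5: Counts(2998, 912, 513, 5086),
    -6: Counts(2883, 1101, 410, 5072),
}


def best_penalty(sweep: dict) -> int:
    """Return the LM_PENALTY value with the lowest total error."""
    return min(sweep, key=lambda p: sweep[p].total_error_pct)


if __name__ == "__main__":
    for penalty, c in SWEEP.items():
        print(f"LM_PENALTY={penalty:>2}  WER={c.total_error_pct:5.1f}%  "
              f"del={100.0 * c.dele / c.ref:4.1f}%  ins={100.0 * c.ins / c.ref:4.1f}%")
    print("lowest WER at LM_PENALTY =", best_penalty(SWEEP))
```

Over this sweep, a more negative penalty trades insertions for deletions, and the total error is still falling slowly at -6, so the sweep by itself does not locate a minimum.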