:Units Paper

From SpeechWiki

[[Category:Fisher Experiments]]

Revision as of 22:21, 11 April 2009

Outline

  1. Intro
  2. Unit Selection
    • Mistake instance
      Unit
      Replacement
    • Multiwords
  3. Baseline Description
    • Vocab: single most frequent pronunciation from a multi-pronunciation dictionary (better than multi-pronunciation)
  4. Results
    • What to emphasize? Ideally, units+DTs will beat DTs alone for every number of components. Even if we cannot grow the number of components until the improvement bottoms out, there should at least be a visible trend.
  5. Conclusion
    • Future work: consider context during unit selection (right now the unit is context-free: the same unit appears in every context where a replacement took place).

Tests for units paper

tests to run
compPer: units: monophone states Mix: totalComp: WER Test WER Important
512 1 503 256k TR
256 1 1000 256k TR
64 1 3854 256k 49.3
64 2 2000 256k  ?
32 4 2000 256k  ?
32 2 4000 256k  ?
alternatively
256 48 137 503 127971 53.0
128 48 137 1033 131185 50.9
32 48 137 3845 122907 51.4
64 112 615 2024 ~128k TR
16 4 2000 128k  ?
16 112 615 4000 128k  ?
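
A quick sanity check on the component budget for the three rows that already have a WER. This is only a sketch, and it assumes a reading of the flattened table that the page does not spell out: that the first number is components per mixture, the fourth is the number of mixtures, and the fifth is the total component count. The function names are made up for illustration.

```python
# Rows copied from the "alternatively" group above, under the ASSUMED column
# reading (components per mixture, number of mixtures, total components listed).
ROWS = [
    (256, 503, 127971),
    (128, 1033, 131185),
    (32, 3845, 122907),
]


def budget(comp_per_mixture, mixtures):
    """Upper bound on total Gaussians if every mixture is fully populated."""
    return comp_per_mixture * mixtures


def shortfall(comp_per_mixture, mixtures, listed_total):
    """How many components the listed total falls short of the full budget."""
    return budget(comp_per_mixture, mixtures) - listed_total


if __name__ == "__main__":
    for comp_per, mixtures, listed in ROWS:
        full = budget(comp_per, mixtures)
        gap = shortfall(comp_per, mixtures, listed)
        print(f"{comp_per:4d} x {mixtures:5d} = {full:7d}  listed {listed:7d}  gap {gap:5d}")
```

Under that reading, each listed total sits slightly below the product, which would be consistent with a few mixtures ending up with fewer components than the nominal count. That explanation is a guess, not something recorded here.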

The units make it worse

with LM_PENALTY=0

DETAILED OVERALL REPORT FOR THE SYSTEM: test/config16/test0/accuracy/out.nosil.trn

SENTENCE RECOGNITION PERFORMANCE

sentences                                         500
with errors                             89.2%   ( 446)

  with substitions                      72.4%   ( 362)
  with deletions                        26.2%   ( 131)
  with insertions                       74.4%   ( 372)


WORD RECOGNITION PERFORMANCE

Percent Total Error       =   99.1%   (5107)

Percent Correct           =   28.6%   (1476)

Percent Substitution      =   66.2%   (3411)
Percent Deletions         =    5.1%   ( 265)
Percent Insertions        =   27.8%   (1431)
Percent Word Accuracy     =    0.9%


Ref. words                =           (5152)
Hyp. words                =           (6318)
Aligned words             =           (6583)

CONFUSION PAIRS                  Total                 (2790)
                                With >=  1 occurances (2790)
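
For reference, the percentages in these reports can be re-derived from the raw counts in parentheses. The sketch below assumes the usual scoring convention, which the numbers above are consistent with: every percentage is taken over the reference word count, and word accuracy is 100% minus total error. The function name is made up for illustration.

```python
def word_scores(correct, subs, dels, ins):
    """Recompute the word-level summary from raw alignment counts."""
    ref_words = correct + subs + dels          # every reference word is C, S, or D
    hyp_words = correct + subs + ins           # every hypothesis word is C, S, or I
    aligned_words = correct + subs + dels + ins
    errors = subs + dels + ins

    def pct(n):
        # All percentages are relative to the number of reference words.
        return round(100.0 * n / ref_words, 1)

    return {
        "ref_words": ref_words,
        "hyp_words": hyp_words,
        "aligned_words": aligned_words,
        "total_error": pct(errors),
        "correct": pct(correct),
        "substitution": pct(subs),
        "deletions": pct(dels),
        "insertions": pct(ins),
        "word_accuracy": pct(ref_words - errors),
    }


if __name__ == "__main__":
    # Counts from the LM_PENALTY=0 report above.
    for name, value in word_scores(correct=1476, subs=3411, dels=265, ins=1431).items():
        print(f"{name:15s} {value}")
```

Because insertions count as errors but not as reference words, total error can approach or exceed 100% even when more than a quarter of the words are correct, which is what happens in this run.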


with LM_PENALTY=-1

test2kUtt/config16Disaster/test0/accuracy/out.nosil.trn.dtl 
DETAILED OVERALL REPORT FOR THE SYSTEM: test2kUtt/config16/test0/accuracy/out.nosil.trn

SENTENCE RECOGNITION PERFORMANCE

 sentences                                         500
 with errors                             88.2%   ( 441)

  with substitions                      72.8%   ( 364)
  with deletions                        32.8%   ( 164)
  with insertions                       68.6%   ( 343)


WORD RECOGNITION PERFORMANCE

Percent Total Error       =   94.6%   (4857)

Percent Correct           =   27.6%   (1417)

Percent Substitution      =   65.4%   (3358)
Percent Deletions         =    7.0%   ( 357)
Percent Insertions        =   22.3%   (1142)
Percent Word Accuracy     =    5.4%


Ref. words                =           (5132)
Hyp. words                =           (5917)
Aligned words             =           (6274)


LM_PENALTY = -2

DETAILED OVERALL REPORT FOR THE SYSTEM: test/config16/test0/accuracy/out.nosil.trn

SENTENCE RECOGNITION PERFORMANCE

sentences                                         500
with errors                             87.2%   ( 436)

  with substitions                      72.8%   ( 364)
  with deletions                        38.8%   ( 194)
  with insertions                       62.8%   ( 314)


WORD RECOGNITION PERFORMANCE

Percent Total Error       =   91.4%   (4674)

Percent Correct           =   26.9%   (1374)

Percent Substitution      =   64.0%   (3271)
Percent Deletions         =    9.2%   ( 468)
Percent Insertions        =   18.3%   ( 935)
Percent Word Accuracy     =    8.6%


Ref. words                =           (5113)
Hyp. words                =           (5580)
Aligned words             =           (6048)

CONFUSION PAIRS                  Total                 (2699)
                                 With >=  1 occurances (2699)

LM_PENALTY = -3

DETAILED OVERALL REPORT FOR THE SYSTEM: test/config16/test0/accuracy/out.nosil.trn

SENTENCE RECOGNITION PERFORMANCE

 sentences                                         500
 with errors                             87.0%   ( 435)

   with substitions                      72.6%   ( 363)
   with deletions                        40.6%   ( 203)
   with insertions                       56.4%   ( 282)


WORD RECOGNITION PERFORMANCE

Percent Total Error       =   88.9%   (4535)

Percent Correct           =   25.4%   (1298)

Percent Substitution      =   63.0%   (3215)
Percent Deletions         =   11.5%   ( 588)
Percent Insertions        =   14.4%   ( 732)
Percent Word Accuracy     =   11.1%


Ref. words                =           (5101)
Hyp. words                =           (5245)
Aligned words             =           (5833)

CONFUSION PAIRS                  Total                 (2671)
                                 With >=  1 occurances (2671)
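
The four LM_PENALTY runs above are easier to compare side by side. A small sketch that pulls the counts from the reports and prints total error plus the insertion-to-deletion ratio for each setting; the data layout and function names are only for illustration.

```python
# (LM_PENALTY, ref words, substitutions, deletions, insertions),
# copied from the four reports above.
RUNS = [
    (0, 5152, 3411, 265, 1431),
    (-1, 5132, 3358, 357, 1142),
    (-2, 5113, 3271, 468, 935),
    (-3, 5101, 3215, 588, 732),
]


def summarize(runs):
    """Return one summary row per run: penalty, total error %, insertions per deletion."""
    rows = []
    for penalty, ref, subs, dels, ins in runs:
        rows.append({
            "penalty": penalty,
            "error_pct": round(100.0 * (subs + dels + ins) / ref, 1),
            "ins_per_del": round(ins / dels, 2),
        })
    return rows


if __name__ == "__main__":
    for row in summarize(RUNS):
        print(f"LM_PENALTY={row['penalty']:3d}  "
              f"error={row['error_pct']:5.1f}%  ins/del={row['ins_per_del']:.2f}")
```

Total error falls with each step toward -3, and insertions still outnumber deletions at -3. The monophone run further down has the opposite profile (1340 deletions against 290 insertions), so the two systems may not be at comparable operating points.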

Clearly something is wrong with the monophone model: the triunit Gaussian model (88.9% total error at LM_PENALTY = -3) is about even with the monophone Gaussian model (86.8%, below), when it should be better by about 10% WER, I think.

Monophone tests

Trying to track down where the error is coming from:

Standard monophone, converged once, WER 86.8%:

DETAILED OVERALL REPORT FOR THE SYSTEM: trTest/config19/test0/accuracy/out.nosil.trn

SENTENCE RECOGNITION PERFORMANCE

 sentences                                         500
 with errors                             83.4%   ( 417)

   with substitions                      69.4%   ( 347)
   with deletions                        52.8%   ( 264)
   with insertions                       33.6%   ( 168)


WORD RECOGNITION PERFORMANCE

Percent Total Error       =   86.8%   (4426)

Percent Correct           =   18.9%   ( 961)

Percent Substitution      =   54.9%   (2796)
Percent Deletions         =   26.3%   (1340)
Percent Insertions        =    5.7%   ( 290)
Percent Word Accuracy     =   13.2%


Ref. words                =           (5097)
Hyp. words                =           (4047)
Aligned words             =           (5387)