Units Paper Debugging

Revision as of 19:25, 8 September 2009 by Arthur (Talk | contribs)

The units make it worse

LM_PENALTY = 0

DETAILED OVERALL REPORT FOR THE SYSTEM: test/config16/test0/accuracy/out.nosil.trn

SENTENCE RECOGNITION PERFORMANCE

sentences                                         500
with errors                             89.2%   ( 446)

  with substitions                      72.4%   ( 362)
  with deletions                        26.2%   ( 131)
  with insertions                       74.4%   ( 372)


WORD RECOGNITION PERFORMANCE

Percent Total Error       =   99.1%   (5107)

Percent Correct           =   28.6%   (1476)

Percent Substitution      =   66.2%   (3411)
Percent Deletions         =    5.1%   ( 265)
Percent Insertions        =   27.8%   (1431)
Percent Word Accuracy     =    0.9%


Ref. words                =           (5152)
Hyp. words                =           (6318)
Aligned words             =           (6583)

CONFUSION PAIRS                  Total                 (2790)
                                With >=  1 occurances (2790)
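
The word-level percentages in these reports can be rechecked from the raw counts. The sketch below assumes the usual scoring convention: total error is (substitutions + deletions + insertions) divided by the number of reference words, and word accuracy is 100 minus that. The function name is made up for illustration; the counts are copied from the LM_PENALTY=0 report above.

```python
def sclite_word_rates(ref_words, substitutions, deletions, insertions):
    """Recompute the word-level percentages from the raw counts in a report.

    Illustrative helper (not part of any scoring tool); assumes the usual
    convention that all rates are relative to the number of reference words.
    """
    errors = substitutions + deletions + insertions
    correct = ref_words - substitutions - deletions
    total_error = 100.0 * errors / ref_words
    return {
        "correct": 100.0 * correct / ref_words,
        "total_error": total_error,
        "accuracy": 100.0 - total_error,
    }


# Counts from the LM_PENALTY=0 report: 5152 reference words,
# 3411 substitutions, 265 deletions, 1431 insertions.
rates = sclite_word_rates(5152, 3411, 265, 1431)
print({name: round(value, 1) for name, value in rates.items()})
# → {'correct': 28.6, 'total_error': 99.1, 'accuracy': 0.9}
```

Because insertions count against the reference length, a system can have 28.6% of the words correct and still score under 1% accuracy, which is what happens here.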

LM_PENALTY = -1

test2kUtt/config16Disaster/test0/accuracy/out.nosil.trn.dtl 
DETAILED OVERALL REPORT FOR THE SYSTEM: test2kUtt/config16/test0/accuracy/out.nosil.trn

SENTENCE RECOGNITION PERFORMANCE

 sentences                                         500
 with errors                             88.2%   ( 441)

  with substitions                      72.8%   ( 364)
  with deletions                        32.8%   ( 164)
  with insertions                       68.6%   ( 343)


WORD RECOGNITION PERFORMANCE

Percent Total Error       =   94.6%   (4857)

Percent Correct           =   27.6%   (1417)

Percent Substitution      =   65.4%   (3358)
Percent Deletions         =    7.0%   ( 357)
Percent Insertions        =   22.3%   (1142)
Percent Word Accuracy     =    5.4%


Ref. words                =           (5132)
Hyp. words                =           (5917)
Aligned words             =           (6274)

LM_PENALTY = -2

DETAILED OVERALL REPORT FOR THE SYSTEM: test/config16/test0/accuracy/out.nosil.trn

SENTENCE RECOGNITION PERFORMANCE

sentences                                         500
with errors                             87.2%   ( 436)

  with substitions                      72.8%   ( 364)
  with deletions                        38.8%   ( 194)
  with insertions                       62.8%   ( 314)


WORD RECOGNITION PERFORMANCE

Percent Total Error       =   91.4%   (4674)

Percent Correct           =   26.9%   (1374)

Percent Substitution      =   64.0%   (3271)
Percent Deletions         =    9.2%   ( 468)
Percent Insertions        =   18.3%   ( 935)
Percent Word Accuracy     =    8.6%


Ref. words                =           (5113)
Hyp. words                =           (5580)
Aligned words             =           (6048)

CONFUSION PAIRS                  Total                 (2699)
                                 With >=  1 occurances (2699)

LM_PENALTY = -3

DETAILED OVERALL REPORT FOR THE SYSTEM: test/config16/test0/accuracy/out.nosil.trn

SENTENCE RECOGNITION PERFORMANCE

 sentences                                         500
 with errors                             87.0%   ( 435)

   with substitions                      72.6%   ( 363)
   with deletions                        40.6%   ( 203)
   with insertions                       56.4%   ( 282)


WORD RECOGNITION PERFORMANCE

Percent Total Error       =   88.9%   (4535)

Percent Correct           =   25.4%   (1298)

Percent Substitution      =   63.0%   (3215)
Percent Deletions         =   11.5%   ( 588)
Percent Insertions        =   14.4%   ( 732)
Percent Word Accuracy     =   11.1%


Ref. words                =           (5101)
Hyp. words                =           (5245)
Aligned words             =           (5833)

CONFUSION PAIRS                  Total                 (2671)
                                 With >=  1 occurances (2671)

LM_PENALTY = -4

DETAILED OVERALL REPORT FOR THE SYSTEM: test/config16/test0/accuracy/out.nosil.trn

SENTENCE RECOGNITION PERFORMANCE

 sentences                                         500
 with errors                             86.6%   ( 433)

   with substitions                      72.2%   ( 361)
   with deletions                        45.8%   ( 229)
   with insertions                       49.8%   ( 249)


WORD RECOGNITION PERFORMANCE

Percent Total Error       =   87.2%   (4446)

Percent Correct           =   24.5%   (1249)

Percent Substitution      =   60.4%   (3078)
Percent Deletions         =   15.1%   ( 770)
Percent Insertions        =   11.7%   ( 598)
Percent Word Accuracy     =   12.8%


Ref. words                =           (5097)
Hyp. words                =           (4925)
Aligned words             =           (5695)

CONFUSION PAIRS                  Total                 (2590)
                                 With >=  1 occurances (2590)

LM_PENALTY = -5

DETAILED OVERALL REPORT FOR THE SYSTEM: test/config16/test0/accuracy/out.nosil.trn

SENTENCE RECOGNITION PERFORMANCE

 sentences                                         500
 with errors                             86.4%   ( 432)

   with substitions                      72.2%   ( 361)
   with deletions                        49.2%   ( 246)
   with insertions                       46.2%   ( 231)


WORD RECOGNITION PERFORMANCE

Percent Total Error       =   87.0%   (4423)

Percent Correct           =   23.1%   (1176)

Percent Substitution      =   58.9%   (2998)
Percent Deletions         =   17.9%   ( 912)
Percent Insertions        =   10.1%   ( 513)
Percent Word Accuracy     =   13.0%


Ref. words                =           (5086)
Hyp. words                =           (4687)
Aligned words             =           (5599)

CONFUSION PAIRS                  Total                 (2533)
                                 With >=  1 occurances (2533)

LM_PENALTY = -6

DETAILED OVERALL REPORT FOR THE SYSTEM: test/config16/test0/accuracy/out.nosil.trn

SENTENCE RECOGNITION PERFORMANCE

 sentences                                         500
 with errors                             87.0%   ( 435)

   with substitions                      72.6%   ( 363)
   with deletions                        53.4%   ( 267)
   with insertions                       40.4%   ( 202)


WORD RECOGNITION PERFORMANCE

Percent Total Error       =   86.6%   (4394)

Percent Correct           =   21.5%   (1088)

Percent Substitution      =   56.8%   (2883)
Percent Deletions         =   21.7%   (1101)
Percent Insertions        =    8.1%   ( 410)
Percent Word Accuracy     =   13.4%


Ref. words                =           (5072)
Hyp. words                =           (4381)
Aligned words             =           (5482)

CONFUSION PAIRS                  Total                 (2472)
                                 With >=  1 occurances (2472)
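
To compare the LM_PENALTY runs side by side, the sketch below only restates the total error, deletion and insertion percentages from the seven reports above and picks the setting with the lowest total error. Nothing in it is new data; the variable names are illustrative.

```python
# (LM_PENALTY, total error %, deletions %, insertions %), copied from the reports above.
sweep = [
    (0, 99.1, 5.1, 27.8),
    (-1, 94.6, 7.0, 22.3),
    (-2, 91.4, 9.2, 18.3),
    (-3, 88.9, 11.5, 14.4),
    (-4, 87.2, 15.1, 11.7),
    (-5, 87.0, 17.9, 10.1),
    (-6, 86.6, 21.7, 8.1),
]

print(f"{'LM_PENALTY':>10} {'error%':>7} {'del%':>6} {'ins%':>6}")
for penalty, error, deletions, insertions in sweep:
    print(f"{penalty:>10} {error:>7.1f} {deletions:>6.1f} {insertions:>6.1f}")

# Setting with the lowest total error among the values tried.
best = min(sweep, key=lambda row: row[1])
print("lowest total error:", best[1], "at LM_PENALTY =", best[0])
```

Each step trades insertions for deletions, and the total error is still falling (slowly) at -6, so the true minimum may lie beyond the range tried here.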

Clearly there is something wrong with the monophone model: a triunit Gaussian model seems about even with a monophone Gaussian model, but I think it should be better by about 10% WER.

Monophone tests

Trying to track down where the error is coming from:

Standard monophone, converged once, WER 86.8:

DETAILED OVERALL REPORT FOR THE SYSTEM: trTest/config19/test0/accuracy/out.nosil.trn

SENTENCE RECOGNITION PERFORMANCE

 sentences                                         500
 with errors                             83.4%   ( 417)

   with substitions                      69.4%   ( 347)
   with deletions                        52.8%   ( 264)
   with insertions                       33.6%   ( 168)


WORD RECOGNITION PERFORMANCE

Percent Total Error       =   86.8%   (4426)

Percent Correct           =   18.9%   ( 961)

Percent Substitution      =   54.9%   (2796)
Percent Deletions         =   26.3%   (1340)
Percent Insertions        =    5.7%   ( 290)
Percent Word Accuracy     =   13.2%


Ref. words                =           (5097)
Hyp. words                =           (4047)
Aligned words             =           (5387)

The Units monophone:

DETAILED OVERALL REPORT FOR THE SYSTEM: monoUnitTest/config66/test0/accuracy/out.nosil.trn

SENTENCE RECOGNITION PERFORMANCE

 sentences                                         500
 with errors                             86.8%   ( 434)

   with substitions                      72.2%   ( 361)
   with deletions                        43.6%   ( 218)
   with insertions                       51.2%   ( 256)


WORD RECOGNITION PERFORMANCE

Percent Total Error       =   87.7%   (4497)

Percent Correct           =   25.2%   (1291)

Percent Substitution      =   61.8%   (3166)
Percent Deletions         =   13.0%   ( 668)
Percent Insertions        =   12.9%   ( 663)
Percent Word Accuracy     =   12.3%


Ref. words                =           (5125)
Hyp. words                =           (5120)
Aligned words             =           (5788)

CONFUSION PAIRS                  Total                 (2569)
                                 With >=  1 occurances (2569)

So there are probably two problems:

one with monophones (more units make things worse?!),
and one with triunits (adding context does not make things better).

I will dig into the monophones first.

config 27 Units with --maxStates 500 --unitType wordInternalOnly --subUnits asBefore --growUnitSet 0, LM_PENALTY=-1

DETAILED OVERALL REPORT FOR THE SYSTEM: testc67OnBreakdownModelc26/config67/test0/accuracy/out.nosil.trn

SENTENCE RECOGNITION PERFORMANCE

 sentences                                         500
 with errors                             88.6%   ( 443)

   with substitions                      73.8%   ( 369)
   with deletions                        36.8%   ( 184)
   with insertions                       56.0%   ( 280)


WORD RECOGNITION PERFORMANCE

Percent Total Error       =   91.1%   (4679)

Percent Correct           =   24.5%   (1260)

Percent Substitution      =   65.1%   (3340)
Percent Deletions         =   10.4%   ( 534)
Percent Insertions        =   15.7%   ( 805)
Percent Word Accuracy     =    8.9%


Ref. words                =           (5134)
Hyp. words                =           (5405)
Aligned words             =           (5939)

CONFUSION PAIRS                  Total                 (2693)

Problems fixed so far

All previous tests were run with the bigram model! All interesting tests should be rerun.
Only 1 subunit on each boundary was clustered, a disadvantage against traditional units, where the center unit was also clustered. Now up to two subunits on each boundary are untied and clustered.
Many unused GMs were left in the trainable params. This probably didn't affect the accuracy but slowed everything down.
SUB_PHONE_COUNTER_CARD was used where WORDSTATE_COUNTER_CARD should have been used. Any word with more than 15 substates was given 0 probability?! Another reason to redo all the monophone tests.
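
A minimal sketch of why the wrong counter cardinality zeroes out long words. It assumes a counter variable with cardinality C can take only C distinct values, so a word needing more substate positions than that has no valid path. The numeric values and the function name are hypothetical, chosen only to reproduce the "more than 15 substates" symptom; the real constants are not given on this page.

```python
# Hypothetical values; the real cardinalities are not stated on this page.
SUB_PHONE_COUNTER_CARD = 15   # too small for counting states within a word
WORDSTATE_COUNTER_CARD = 64   # assumed large enough for the longest word


def word_is_reachable(num_substates, counter_card):
    """Return True if every substate position of the word fits in the counter.

    A counter with cardinality `counter_card` can distinguish only
    `counter_card` positions; a longer word cannot be traversed at all,
    which amounts to probability 0 for that word.
    """
    return num_substates <= counter_card


for n in (12, 15, 16, 30):
    print(n,
          word_is_reachable(n, SUB_PHONE_COUNTER_CARD),
          word_is_reachable(n, WORDSTATE_COUNTER_CARD))
```

Under these assumptions, every word longer than 15 substates drops out of the search with the wrong constant, so it can only ever show up as a deletion or a substitution.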

Problems that show up only in tri-units

Right-context during testing was using the training left-context DT. This was wrong since the vocab changed.