Units Paper Debugging

From SpeechWiki

Revision as of 19:25, 8 September 2009 by Arthur (Talk | contribs)

The units make it worse

with LM_PENALTY=0

DETAILED OVERALL REPORT FOR THE SYSTEM: test/config16/test0/accuracy/out.nosil.trn

SENTENCE RECOGNITION PERFORMANCE

sentences                                         500
with errors                             89.2%   ( 446)

  with substitions                      72.4%   ( 362)
  with deletions                        26.2%   ( 131)
  with insertions                       74.4%   ( 372)


WORD RECOGNITION PERFORMANCE

Percent Total Error       =   99.1%   (5107)

Percent Correct           =   28.6%   (1476)

Percent Substitution      =   66.2%   (3411)
Percent Deletions         =    5.1%   ( 265)
Percent Insertions        =   27.8%   (1431)
Percent Word Accuracy     =    0.9%


Ref. words                =           (5152)
Hyp. words                =           (6318)
Aligned words             =           (6583)

CONFUSION PAIRS                  Total                 (2790)
                                With >=  1 occurances (2790)


with LM_PENALTY=-1

test2kUtt/config16Disaster/test0/accuracy/out.nosil.trn.dtl 
DETAILED OVERALL REPORT FOR THE SYSTEM: test2kUtt/config16/test0/accuracy/out.nosil.trn

SENTENCE RECOGNITION PERFORMANCE

 sentences                                         500
 with errors                             88.2%   ( 441)

  with substitions                      72.8%   ( 364)
  with deletions                        32.8%   ( 164)
  with insertions                       68.6%   ( 343)


WORD RECOGNITION PERFORMANCE

Percent Total Error       =   94.6%   (4857)

Percent Correct           =   27.6%   (1417)

Percent Substitution      =   65.4%   (3358)
Percent Deletions         =    7.0%   ( 357)
Percent Insertions        =   22.3%   (1142)
Percent Word Accuracy     =    5.4%


Ref. words                =           (5132)
Hyp. words                =           (5917)
Aligned words             =           (6274)


LM_PENALTY = -2

DETAILED OVERALL REPORT FOR THE SYSTEM: test/config16/test0/accuracy/out.nosil.trn

SENTENCE RECOGNITION PERFORMANCE

sentences                                         500
with errors                             87.2%   ( 436)

  with substitions                      72.8%   ( 364)
  with deletions                        38.8%   ( 194)
  with insertions                       62.8%   ( 314)


WORD RECOGNITION PERFORMANCE

Percent Total Error       =   91.4%   (4674)

Percent Correct           =   26.9%   (1374)

Percent Substitution      =   64.0%   (3271)
Percent Deletions         =    9.2%   ( 468)
Percent Insertions        =   18.3%   ( 935)
Percent Word Accuracy     =    8.6%


Ref. words                =           (5113)
Hyp. words                =           (5580)
Aligned words             =           (6048)

CONFUSION PAIRS                  Total                 (2699)
                                 With >=  1 occurances (2699)

LM_PENALTY = -3

DETAILED OVERALL REPORT FOR THE SYSTEM: test/config16/test0/accuracy/out.nosil.trn

SENTENCE RECOGNITION PERFORMANCE

 sentences                                         500
 with errors                             87.0%   ( 435)

   with substitions                      72.6%   ( 363)
   with deletions                        40.6%   ( 203)
   with insertions                       56.4%   ( 282)


WORD RECOGNITION PERFORMANCE

Percent Total Error       =   88.9%   (4535)

Percent Correct           =   25.4%   (1298)

Percent Substitution      =   63.0%   (3215)
Percent Deletions         =   11.5%   ( 588)
Percent Insertions        =   14.4%   ( 732)
Percent Word Accuracy     =   11.1%


Ref. words                =           (5101)
Hyp. words                =           (5245)
Aligned words             =           (5833)

CONFUSION PAIRS                  Total                 (2671)
                                 With >=  1 occurances (2671)

LM_PENALTY = -4

DETAILED OVERALL REPORT FOR THE SYSTEM: test/config16/test0/accuracy/out.nosil.trn

SENTENCE RECOGNITION PERFORMANCE

 sentences                                         500
 with errors                             86.6%   ( 433)

   with substitions                      72.2%   ( 361)
   with deletions                        45.8%   ( 229)
   with insertions                       49.8%   ( 249)


WORD RECOGNITION PERFORMANCE

Percent Total Error       =   87.2%   (4446)

Percent Correct           =   24.5%   (1249)

Percent Substitution      =   60.4%   (3078)
Percent Deletions         =   15.1%   ( 770)
Percent Insertions        =   11.7%   ( 598)
Percent Word Accuracy     =   12.8%


Ref. words                =           (5097)
Hyp. words                =           (4925)
Aligned words             =           (5695)

CONFUSION PAIRS                  Total                 (2590)
                                 With >=  1 occurances (2590)


LM_PENALTY = -5

DETAILED OVERALL REPORT FOR THE SYSTEM: test/config16/test0/accuracy/out.nosil.trn

SENTENCE RECOGNITION PERFORMANCE

 sentences                                         500
 with errors                             86.4%   ( 432)

   with substitions                      72.2%   ( 361)
   with deletions                        49.2%   ( 246)
   with insertions                       46.2%   ( 231)


WORD RECOGNITION PERFORMANCE

Percent Total Error       =   87.0%   (4423)

Percent Correct           =   23.1%   (1176)

Percent Substitution      =   58.9%   (2998)
Percent Deletions         =   17.9%   ( 912)
Percent Insertions        =   10.1%   ( 513)
Percent Word Accuracy     =   13.0%


Ref. words                =           (5086)
Hyp. words                =           (4687)
Aligned words             =           (5599)

CONFUSION PAIRS                  Total                 (2533)
                                 With >=  1 occurances (2533)

LM_PENALTY = -6

DETAILED OVERALL REPORT FOR THE SYSTEM: test/config16/test0/accuracy/out.nosil.trn

SENTENCE RECOGNITION PERFORMANCE

 sentences                                         500
 with errors                             87.0%   ( 435)

   with substitions                      72.6%   ( 363)
   with deletions                        53.4%   ( 267)
   with insertions                       40.4%   ( 202)


WORD RECOGNITION PERFORMANCE

Percent Total Error       =   86.6%   (4394)

Percent Correct           =   21.5%   (1088)

Percent Substitution      =   56.8%   (2883)
Percent Deletions         =   21.7%   (1101)
Percent Insertions        =    8.1%   ( 410)
Percent Word Accuracy     =   13.4%


Ref. words                =           (5072)
Hyp. words                =           (4381)
Aligned words             =           (5482)

CONFUSION PAIRS                  Total                 (2472)
                                 With >=  1 occurances (2472)


Clearly something is wrong with the monophone model: a triunit Gaussian model performs about the same as the monophone Gaussian model, when it should be better by about 10% WER, I think.
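The LM_PENALTY sweep above can be summarized directly from the reported counts. The sketch below recomputes the total error as (substitutions + deletions + insertions) / reference words for each setting and picks the setting with the lowest value. The counts are copied from the reports; the names `SWEEP`, `wer`, and `best_penalty` are illustrative and not part of the original scripts.

```python
# Counts copied from the sclite reports above.
# LM_PENALTY -> (substitutions, deletions, insertions, reference words)
SWEEP = {
    0: (3411, 265, 1431, 5152),
    -1: (3358, 357, 1142, 5132),
    -2: (3271, 468, 935, 5113),
    -3: (3215, 588, 732, 5101),
    -4: (3078, 770, 598, 5097),
    -5: (2998, 912, 513, 5086),
    -6: (2883, 1101, 410, 5072),
}


def wer(sub, dele, ins, ref):
    """Total word error in percent: (S + D + I) / N * 100."""
    return 100.0 * (sub + dele + ins) / ref


# Print one summary line per setting, from penalty 0 down to -6.
for penalty in sorted(SWEEP, reverse=True):
    s, d, i, n = SWEEP[penalty]
    print(f"LM_PENALTY={penalty:>2}  WER={wer(s, d, i, n):5.1f}%  ins={i:4d}  del={d:4d}")

# Setting with the lowest total error within this sweep.
best_penalty = min(SWEEP, key=lambda p: wer(*SWEEP[p]))
print("lowest WER at LM_PENALTY =", best_penalty)
```

Within this sweep the total error keeps falling as the penalty goes from 0 to -6, while insertions drop and deletions rise, so the penalty mostly trades one error type for the other.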

Monophone tests

Trying to track down where the error is coming from:

Standard monophone, converged once, WER 86.8%:

DETAILED OVERALL REPORT FOR THE SYSTEM: trTest/config19/test0/accuracy/out.nosil.trn

SENTENCE RECOGNITION PERFORMANCE

 sentences                                         500
 with errors                             83.4%   ( 417)

   with substitions                      69.4%   ( 347)
   with deletions                        52.8%   ( 264)
   with insertions                       33.6%   ( 168)


WORD RECOGNITION PERFORMANCE

Percent Total Error       =   86.8%   (4426)

Percent Correct           =   18.9%   ( 961)

Percent Substitution      =   54.9%   (2796)
Percent Deletions         =   26.3%   (1340)
Percent Insertions        =    5.7%   ( 290)
Percent Word Accuracy     =   13.2%


Ref. words                =           (5097)
Hyp. words                =           (4047)
Aligned words             =           (5387)


The Units monophone:

DETAILED OVERALL REPORT FOR THE SYSTEM: monoUnitTest/config66/test0/accuracy/out.nosil.trn

SENTENCE RECOGNITION PERFORMANCE

 sentences                                         500
 with errors                             86.8%   ( 434)

   with substitions                      72.2%   ( 361)
   with deletions                        43.6%   ( 218)
   with insertions                       51.2%   ( 256)


WORD RECOGNITION PERFORMANCE

Percent Total Error       =   87.7%   (4497)

Percent Correct           =   25.2%   (1291)

Percent Substitution      =   61.8%   (3166)
Percent Deletions         =   13.0%   ( 668)
Percent Insertions        =   12.9%   ( 663)
Percent Word Accuracy     =   12.3%


Ref. words                =           (5125)
Hyp. words                =           (5120)
Aligned words             =           (5788)

CONFUSION PAIRS                  Total                 (2569)
                                 With >=  1 occurances (2569)
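The two monophone reports above have nearly the same total error but a very different error mix. As a sanity check, the sketch below verifies the identities that the word counts in each report satisfy (Ref = Correct + Sub + Del, Hyp = Ref - Del + Ins, Aligned = Ref + Ins) and compares how far the hypothesis length drifts from the reference. The `Report` class and its field names are hypothetical helpers for illustration; only the numbers come from the reports.

```python
from dataclasses import dataclass


@dataclass
class Report:
    """Word-level counts copied from one sclite report (illustrative helper)."""
    ref: int
    hyp: int
    aligned: int
    correct: int
    sub: int
    dele: int
    ins: int

    def consistent(self):
        # Identities that hold for the counts in the reports above.
        return (
            self.ref == self.correct + self.sub + self.dele
            and self.hyp == self.ref - self.dele + self.ins
            and self.aligned == self.ref + self.ins
        )

    def length_drift(self):
        # Negative: the hypothesis has fewer words than the reference.
        return self.hyp - self.ref


standard = Report(ref=5097, hyp=4047, aligned=5387,
                  correct=961, sub=2796, dele=1340, ins=290)
units = Report(ref=5125, hyp=5120, aligned=5788,
               correct=1291, sub=3166, dele=668, ins=663)

for name, rep in (("standard monophone", standard), ("Units monophone", units)):
    print(name, "consistent:", rep.consistent(), "hyp - ref:", rep.length_drift())
```

With these counts the standard monophone hypothesis is much shorter than the reference (deletions dominate), while the Units monophone output is almost the same length as the reference, with insertions and deletions roughly balanced.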

So there are probably two problems:

  1. one with monophones (more units make things worse?!), and
  2. one with triunits (adding context does not make things better).

I will dig apart monophones first.

config 27: Units with --maxStates 500 --unitType wordInternalOnly --subUnits asBefore --growUnitSet 0, LM_PENALTY=-1

DETAILED OVERALL REPORT FOR THE SYSTEM: testc67OnBreakdownModelc26/config67/test0/accuracy/out.nosil.trn

SENTENCE RECOGNITION PERFORMANCE

 sentences                                         500
 with errors                             88.6%   ( 443)

   with substitions                      73.8%   ( 369)
   with deletions                        36.8%   ( 184)
   with insertions                       56.0%   ( 280)


WORD RECOGNITION PERFORMANCE

Percent Total Error       =   91.1%   (4679)

Percent Correct           =   24.5%   (1260)

Percent Substitution      =   65.1%   (3340)
Percent Deletions         =   10.4%   ( 534)
Percent Insertions        =   15.7%   ( 805)
Percent Word Accuracy     =    8.9%


Ref. words                =           (5134)
Hyp. words                =           (5405)
Aligned words             =           (5939)

CONFUSION PAIRS                  Total                 (2693)
                                 

Problems fixed so far

  • All previous tests were run with the bigram model! All interesting tests should be rerun.
  • Only 1 subunit on each boundary was clustered, a disadvantage against traditional units, where the center unit was also clustered. Now up to two subunits on each boundary are untied and clustered.
  • Many unused GMs were left in the trainable params. This probably didn't affect the accuracy but slowed everything down.
  • SUB_PHONE_COUNTER_CARD was used where WORDSTATE_COUNTER_CARD should have been used, so any word with more than 15 substates was given 0 probability?! Another reason to redo all the monophone tests.

Problems that show up only in tri-units

  • The right-context during testing was using the training left-context DT; this was wrong since the vocab changed.