From SpeechWiki
Outline
- Intro
- Unit Selection
- Mistake instance
- Unit
- Replacement
- Multiwords
- Baseline Description
- Vocabulary: the single most frequent pronunciation of each word, taken from a multi-pronunciation dictionary (this worked better than keeping multiple pronunciations)
- Results
- What to emphasize? Ideally, units+DTs will beat DTs alone at every number of components. Even if we cannot grow the number of components until the improvement levels off, there should at least be a visible trend.
- Conclusion
- Future work: consider context during unit selection. Currently units are context-free: the same unit is used in every context where a replacement took place.
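The baseline vocabulary keeps only the single most frequent pronunciation of each word. A minimal sketch of that selection, assuming pronunciation frequencies are available as counts per (word, pronunciation) pair, for example from forced alignments of the training data. The input format, the function name `most_frequent_pronunciations` and the toy phone strings are hypothetical, chosen only for illustration.

```python
def most_frequent_pronunciations(counts):
    """Keep one pronunciation per word: the one with the highest count.

    counts maps (word, pronunciation) -> frequency. This input format is an
    assumption (e.g. counts gathered from forced alignments of training data).
    """
    best = {}  # word -> (count, pronunciation)
    # Iterating in sorted key order makes tie-breaking deterministic:
    # the strict ">" keeps the lexicographically smaller pronunciation on ties.
    for (word, pron), n in sorted(counts.items()):
        if word not in best or n > best[word][0]:
            best[word] = (n, pron)
    return {word: pron for word, (_, pron) in best.items()}


# Hypothetical counts for a toy multi-pronunciation dictionary.
counts = {
    ("the", "dh ah"): 120,
    ("the", "dh iy"): 45,
    ("either", "iy dh er"): 7,
    ("either", "ay dh er"): 7,
    ("cat", "k ae t"): 3,
}
print(most_frequent_pronunciations(counts))
# {'cat': 'k ae t', 'either': 'ay dh er', 'the': 'dh ah'}
```

The tie-break rule is arbitrary; it only guarantees that the same counts always yield the same single-pronunciation lexicon.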
Tests for the units paper
Tests to run
compPer | units | monophone states | Mix | totalComp | WER | Test WER | Important
512 | 1 | | 503 | 256k | TR | |
256 | 1 | | 1000 | 256k | TR | |
64 | 1 | | 3854 | 256k | 49.3 | |
64 | 2 | | 2000 | 256k | ? | |
32 | 4 | | 2000 | 256k | ? | |
32 | 2 | | 4000 | 256k | ? | |
Alternatively:
256 | 48 | 137 | 503 | 127971 | 53.0 | |
128 | 48 | 137 | 1033 | 131185 | 50.9 | |
32 | 48 | 137 | 3845 | 122907 | 51.4 | |
64 | 112 | 615 | 2024 | ~128k | TR | |
16 | 4 | | 2000 | 128k | ? | |
16 | 112 | 615 | 4000 | 128k | ? | |
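In the first block of the table, totalComp appears to be the nominal product compPer × units × Mix, so every configuration spends roughly the same budget of about 256k components. That formula is an assumption inferred from the numbers, not stated in the notes (the rows after "Alternatively" list measured counts, and their units column appears to mean something else). The sketch below only checks that arithmetic; `nominal_total`, `FIRST_BLOCK` and reading "256k" as 256,000 are illustrative assumptions.

```python
# Rows from the first block of the table: (compPer, units, Mix).
FIRST_BLOCK = [
    (512, 1, 503),
    (256, 1, 1000),
    (64, 1, 3854),
    (64, 2, 2000),
    (32, 4, 2000),
    (32, 2, 4000),
]
BUDGET = 256_000  # "256k", read here as 256,000 components (assumption)


def nominal_total(comp_per, units, mix):
    """Nominal component count for one configuration (inferred formula)."""
    return comp_per * units * mix


for comp_per, units, mix in FIRST_BLOCK:
    total = nominal_total(comp_per, units, mix)
    print(f"{comp_per:4d} x {units} x {mix:5d} = {total:7d} "
          f"({total / BUDGET:.1%} of budget)")
```

All six rows land within 4% of 256,000, which fits the outline's goal of comparing units+DTs against DTs alone at a matched number of components.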
The units make it worse.
With LM_PENALTY=0:
DETAILED OVERALL REPORT FOR THE SYSTEM: test/config16/test0/accuracy/out.nosil.trn
SENTENCE RECOGNITION PERFORMANCE
sentences 500
with errors 89.2% ( 446)
with substitions 72.4% ( 362)
with deletions 26.2% ( 131)
with insertions 74.4% ( 372)
WORD RECOGNITION PERFORMANCE
Percent Total Error = 99.1% (5107)
Percent Correct = 28.6% (1476)
Percent Substitution = 66.2% (3411)
Percent Deletions = 5.1% ( 265)
Percent Insertions = 27.8% (1431)
Percent Word Accuracy = 0.9%
Ref. words = (5152)
Hyp. words = (6318)
Aligned words = (6583)
CONFUSION PAIRS Total (2790)
With >= 1 occurances (2790)
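The word-level percentages in these sclite reports follow from the raw counts. With N reference words and C, S, D, I correct, substituted, deleted and inserted words: C + S + D = N, Percent Total Error is (S + D + I) / N, Word Accuracy is 100 minus Total Error, Hyp. words is C + S + I, and Aligned words is C + S + D + I. A small sketch recomputing these for the LM_PENALTY=0 run; the `WordCounts` class is illustrative only and not part of sclite.

```python
from dataclasses import dataclass


@dataclass
class WordCounts:
    correct: int
    substitutions: int
    deletions: int
    insertions: int

    @property
    def ref_words(self) -> int:
        # Every reference word is either matched, substituted or deleted.
        return self.correct + self.substitutions + self.deletions

    @property
    def hyp_words(self) -> int:
        # Every hypothesis word is either matched, substituted or inserted.
        return self.correct + self.substitutions + self.insertions

    @property
    def aligned_words(self) -> int:
        # Alignment slots: all reference words plus the inserted words.
        return self.ref_words + self.insertions

    def pct(self, n: int) -> float:
        # sclite reports percentages relative to the reference word count.
        return 100.0 * n / self.ref_words

    @property
    def total_error(self) -> float:
        return self.pct(self.substitutions + self.deletions + self.insertions)

    @property
    def word_accuracy(self) -> float:
        return 100.0 - self.total_error


lm0 = WordCounts(correct=1476, substitutions=3411, deletions=265, insertions=1431)
print(lm0.ref_words, lm0.hyp_words, lm0.aligned_words)  # 5152 6318 6583
print(f"{lm0.total_error:.1f} {lm0.word_accuracy:.1f}")  # 99.1 0.9
```

Because insertions count as errors but not toward N, Total Error can approach or exceed 100%, which is why Word Accuracy here is only 0.9% even though 28.6% of reference words are correct.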
With LM_PENALTY=-1 (report file: test2kUtt/config16Disaster/test0/accuracy/out.nosil.trn.dtl):
DETAILED OVERALL REPORT FOR THE SYSTEM: test2kUtt/config16/test0/accuracy/out.nosil.trn
SENTENCE RECOGNITION PERFORMANCE
sentences 500
with errors 88.2% ( 441)
with substitions 72.8% ( 364)
with deletions 32.8% ( 164)
with insertions 68.6% ( 343)
WORD RECOGNITION PERFORMANCE
Percent Total Error = 94.6% (4857)
Percent Correct = 27.6% (1417)
Percent Substitution = 65.4% (3358)
Percent Deletions = 7.0% ( 357)
Percent Insertions = 22.3% (1142)
Percent Word Accuracy = 5.4%
Ref. words = (5132)
Hyp. words = (5917)
Aligned words = (6274)
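Comparing the two reports by error type shows how the error mix shifts between LM_PENALTY=0 and LM_PENALTY=-1. The two runs come from different test directories and have slightly different reference word counts (5152 vs. 5132), so the sketch compares per-run rates rather than raw counts. The `RUNS` layout and `error_rates` helper are illustrative only.

```python
# Raw counts copied from the two reports: (substitutions, deletions, insertions, ref words).
RUNS = {
    "LM_PENALTY=0": (3411, 265, 1431, 5152),
    "LM_PENALTY=-1": (3358, 357, 1142, 5132),
}


def error_rates(subs, dels, ins, ref_words):
    """Per-type error rates in percent of reference words, plus the total."""
    def pct(n):
        return 100.0 * n / ref_words

    return {
        "sub": pct(subs),
        "del": pct(dels),
        "ins": pct(ins),
        "total": pct(subs + dels + ins),
    }


before = error_rates(*RUNS["LM_PENALTY=0"])
after = error_rates(*RUNS["LM_PENALTY=-1"])
for kind in ("sub", "del", "ins", "total"):
    delta = after[kind] - before[kind]
    print(f"{kind:>5}: {before[kind]:5.1f} -> {after[kind]:5.1f}  ({delta:+.1f} points)")
```

Insertions drop by about 5.5 points while deletions rise by about 1.8, for a net reduction of about 4.5 points in total error; even so, total error stays near 95%.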