Units Paper
From SpeechWiki
Revision as of 19:40, 8 April 2009
Outline
- Intro
- Unit Selection
  - Mistake instance
  - Unit
  - Replacement
  - Multiwords
- Baseline Description
  - Vocab: the single most frequent pronunciation of each word, taken from a multi-pronunciation dictionary (this works better than keeping multiple pronunciations)
- Results
  - What to emphasize? Ideally, units+DTs will beat DTs alone at every number of components. Even if we cannot keep growing the components until the improvement bottoms out, there should at least be a visible trend.
- Conclusion
  - Future work: consider context during unit selection. Right now a unit is context-free: the same unit is used in every context where a replacement took place.
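The baseline vocabulary rule (keep only each word's most frequent pronunciation) can be sketched as below. This assumes the frequencies come from (word, pronunciation) pairs, for example a forced alignment of the training data against the multi-pronunciation dictionary; the page does not say where the counts come from, and `single_pron_lexicon` is a hypothetical helper name.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Pron = Tuple[str, ...]  # one pronunciation as a sequence of phone symbols


def single_pron_lexicon(
    lexicon: Dict[str, List[Pron]],
    aligned: Iterable[Tuple[str, Pron]],
) -> Dict[str, Pron]:
    """Reduce a multi-pronunciation lexicon to one pronunciation per word.

    `aligned` is assumed to hold observed (word, pronunciation) pairs,
    e.g. from a forced alignment; this source of counts is hypothetical.
    """
    counts = Counter(aligned)  # frequency of each (word, pronunciation) pair
    reduced: Dict[str, Pron] = {}
    for word, prons in lexicon.items():
        # Unseen variants count as 0. max() returns the first maximal item,
        # so ties (including words never observed) fall back to the
        # earliest-listed dictionary variant.
        reduced[word] = max(prons, key=lambda p: counts[(word, p)])
    return reduced
```

For instance, if "the" is aligned twice as `("dh", "iy")` and once as `("dh", "ah")`, the reduced lexicon keeps `("dh", "iy")`. Restricting the choice to the dictionary's own variants means the baseline never introduces a pronunciation the dictionary lacks.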
Tests for units paper
compPer | units | monophone states | Mix | totalComp | WER | Test WER | Important |
---|---|---|---|---|---|---|---|
512 | 1 | 503 | 256k | TR | |||
256 | 1 | 1000 | 256k | TR | |||
64 | 1 | 3854 | 256k | 49.3 | |||
64 | 2 | 2000 | 256k | ? | |||
32 | 4 | 2000 | 256k | ? | |||
32 | 2 | 4000 | 256k | ? | |||
alternatively | | | | | | | |
256 | 48 | 137 | 503 | 127971 | 53.0 | ||
128 | 48 | 137 | 1033 | 131185 | 50.9 | ||
32 | 48 | 137 | 3845 | 122907 | 51.4 | ||
64 | 112 | 615 | 2024 | ~128k | TR | ||
16 | 4 | 2000 | 128k | ? | |||
16 | 112 | 615 | 4000 | 128k | ? |
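In the "alternatively" rows, totalComp appears to track compPer × Mix. This reading is an inference from the numbers, not something the page states. The short check below uses only the three rows that report an exact totalComp; `ROWS` and `component_budget` are hypothetical names.

```python
# (compPer, Mix, reported totalComp) copied from the "alternatively" rows
ROWS = [
    (256, 503, 127971),
    (128, 1033, 131185),
    (32, 3845, 122907),
]


def component_budget(comp_per: int, mix: int) -> int:
    """Gaussian count if every mixture had exactly comp_per components (assumed reading)."""
    return comp_per * mix


for comp_per, mix, reported in ROWS:
    bound = component_budget(comp_per, mix)
    print(f"{comp_per:>4} x {mix:>5} = {bound:>7}  reported {reported:>7}  "
          f"ratio {reported / bound:.3f}")
```

Each reported total is slightly below the product. That would fit compPer being a per-mixture maximum rather than an exact count, although this is also an assumption.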
For Mark tomorrow
- What story do the experiments above tell? Do we need a clean replacement: 2000 leaf nodes from DTs alone, and also 2000 leaf nodes from DT+units?
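One way to frame this clean comparison is to pair finished runs by leaf count and check whether DT+units has the lower WER at every size. The sketch below makes several assumptions for illustration: the `Run` record, the system labels "DT" and "DT+units", and treating rows marked "TR" or "?" as unfinished (`wer=None`).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Run:
    system: str           # hypothetical label: "DT" or "DT+units"
    leaves: int           # number of DT leaf nodes (tied states)
    total_comp: int       # total Gaussian components in the model
    wer: Optional[float]  # None while the run is unfinished ("TR" or "?")


def paired_by_leaves(runs: List[Run]) -> Dict[int, Dict[str, float]]:
    """Group finished runs by leaf count so each comparison holds leaves fixed."""
    table: Dict[int, Dict[str, float]] = {}
    for r in runs:
        if r.wer is not None:
            table.setdefault(r.leaves, {})[r.system] = r.wer
    return table


def units_win_everywhere(runs: List[Run]) -> bool:
    """True if DT+units beats DT at every leaf count where both runs finished.

    Returns False when no matched pair exists yet, so an empty table
    is never reported as a win.
    """
    pairs = [v for v in paired_by_leaves(runs).values()
             if "DT" in v and "DT+units" in v]
    return bool(pairs) and all(v["DT+units"] < v["DT"] for v in pairs)
```

Keying on `total_comp` instead of `leaves` would test the Results goal (units+DTs beating DTs at every number of components). That version needs the totals to match exactly or to be binned, because the reported totals are only approximately equal.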