Units Paper

From SpeechWiki

Revision as of 19:40, 8 April 2009

Outline

  1. Intro
  2. Unit Selection
    • Mistake instance
    • Unit
    • Replacement
    • Multiwords
  3. Baseline Description
    • Vocab: single most frequent pronunciation from a multi-pronunciation dictionary (better than multi-pronunciation)
  4. Results
    • What to emphasize? Ideally, units+DTs will beat DTs alone for every number of components. Even if we cannot grow the number of components until the improvement bottoms out, there should at least be a trend.
  5. Conclusion
    • Future work: consider context during unit selection (right now the unit is context-free: the same unit appears in all contexts where replacements took place).

Tests for units paper

tests to run
compPer: units: monophone states Mix: totalComp: WER Test WER Important
512 1 503 256k TR
256 1 1000 256k TR
64 1 3854 256k 49.3
64 2 2000 256k  ?
32 4 2000 256k  ?
32 2 4000 256k  ?
alternatively
256 48 137 503 127971 53.0
128 48 137 1033 131185 50.9
32 48 137 3845 122907 51.4
64 112 615 2024 ~128k TR
16 4 2000 128k  ?
16 112 615 4000 128k  ?
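A quick sanity check on the component budgets, as a sketch only. It assumes that in the three fully specified rows of the "alternatively" block the first value is compPer, the fourth is Mix, and the fifth is totalComp, and that totalComp should be roughly compPer × Mix. That column reading is a guess from the numbers, not something the table states, and the name budget_check is made up for this sketch.

```python
# Rows copied from the "alternatively" block that list all six values.
# ASSUMPTION: the tuple fields are (compPer, Mix, totalComp, WER); this
# column reading is inferred from the numbers, not stated in the table.
ROWS = [
    (256, 503, 127971, 53.0),
    (128, 1033, 131185, 50.9),
    (32, 3845, 122907, 51.4),
]


def budget_check(rows):
    """Return (comp_per, mix, product, listed_total, relative_gap) per row."""
    out = []
    for comp_per, mix, listed_total, _wer in rows:
        # Hypothetical relation being tested: totalComp ~ compPer * Mix.
        product = comp_per * mix
        gap = abs(product - listed_total) / listed_total
        out.append((comp_per, mix, product, listed_total, gap))
    return out


if __name__ == "__main__":
    for comp_per, mix, product, listed, gap in budget_check(ROWS):
        print(f"{comp_per:>4} x {mix:>5} = {product:>7}  listed {listed:>7}  gap {gap:.2%}")
```

Under that reading, the product stays within 1% of the listed total for each of the three rows. The rows with fewer values are left out because it is not clear which columns they fill.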


For Mark tomorrow

  • What story do the above experiments tell? Do we need a clean replacement: 2000 leaf nodes through DTs alone, and also 2000 leaf nodes through DTs+units?