:Units Paper

(Difference between revisions)

Revision as of 19:40, 8 April 2009

Intro
Unit Selection
- Mistake instance
  Unit
  Replacement
- Multiwords
Baseline Description
- Vocab: single most frequent pronunciation from a multi-pronunciation dictionary (better than multi-pronunciation)
Results
- what to emphasize? Ideally, units+DTs will beat just DTs for every number of components. Even if we cannot grow the components until improvement bottoms out, at least there will be a trend.
Conclusion
- Future work: consider context during unit selection (right now the unit is context-free - the same unit appearing in all contexts where replacements took place).

tests to run
compPer:  units:  monophone states  Mix:  totalComp:  WER
512       1                         503   256k        TR
256       1                         1000  256k        TR
64        1                         3854  256k        49.3
64        2                         2000  256k        ?
32        4                         2000  256k        ?
32        2                         4000  256k        ?
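
In the rows above, totalComp appears to be roughly compPer x units x Mix. That relationship is inferred from the listed numbers and is not stated anywhere in these notes, so treat the sketch below as a sanity check on the arithmetic only. The function name total_components is hypothetical.

```python
# Rows from the table above as (compPer, units, Mix).
# ASSUMPTION: totalComp ~= compPer * units * Mix, inferred from the numbers.
ROWS = [
    (512, 1, 503),
    (256, 1, 1000),
    (64, 1, 3854),
    (64, 2, 2000),
    (32, 4, 2000),
    (32, 2, 4000),
]


def total_components(comp_per, units, mix):
    """Hypothetical helper: component budget under the assumed product rule."""
    return comp_per * units * mix


TOTALS = [total_components(*row) for row in ROWS]

for row, total in zip(ROWS, TOTALS):
    # Print in thousands for comparison with the "256k" column.
    print(row, total, f"{total / 1000:.1f}k")
```

Under this assumption every row lands within a few percent of the 256k budget, so the configurations would be comparable at a fixed total number of components.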

alternatively

256       48      137               503   127971      53.0
128       48      137               1033  131185      50.9
32        48      137               3845  122907      51.4
64        112     615               2024  ~128k       TR
16        4                         2000  128k        ?
16        112     615               4000  128k        ?

What is the story to tell with the above experiments? Do we need a clean replacement: 2000 leaf nodes through just DTs and also 2000 leaf nodes through DT+units?

@@ Line 5: / Line 5: @@
 # Unit Selection
 #*; Mistake instance
-#*; Unit definition
+#*; Unit
-#*; Replacement definition
+#*; Replacement
 #* Multiwords
 # Baseline Description
 #* Vocab: single most frequent pronunciation from a multi-pronunciation dictionary (better than multi-pronunciation)
 # Results
+#* what to emphasize?  Ideally, units+DTs will beat just DTs for every number of components.  Even if we cannot grow the components until improvement bottoms out, at least there will be a trend.
 # Conclusion
-#* consider context during unit selection (right now the unit is context-free - the same unit appearing in all contexts where replacements took place).
+#* Future work: consider context during unit selection (right now the unit is context-free - the same unit appearing in all contexts where replacements took place).
 == Tests for units paper ==