Unit Selection
From SpeechWiki
Revision as of 00:47, 10 February 2009
Error driven unit selection
Some statistics
Quantity | Count |
---|---|
Utterances | 2000 |
Phonemes (including EOW and | 95938 |
Frames | 756009 |
This seems roughly consistent with http://myweb.tiscali.co.uk/wordscape/wordlist/phonfreq.html.
The number of triphone decision tree (DT) leaf nodes per phone is essentially uncorrelated with any of the per-phone counts {total frames, total phones, error phones, error frames}. Since the units are chosen to maximally cover the mistakes in the corpus, this might suggest that deepening the triphone DTs is not the same thing as deriving these error-based units.
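A check like the one described above could be sketched as follows. This is only an illustration: it assumes the correlation meant is the Pearson coefficient, and the function names, the dictionary layout, and the toy numbers in the usage lines are hypothetical placeholders, not values from the experiment.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def correlate_leaves(leaf_counts, stats):
    """Correlate per-phone DT leaf counts with each per-phone statistic.

    leaf_counts: dict mapping phone -> number of DT leaf nodes
    stats: dict mapping statistic name -> dict mapping phone -> value
    """
    phones = sorted(leaf_counts)
    leaves = [leaf_counts[p] for p in phones]
    return {
        name: pearson(leaves, [per_phone[p] for p in phones])
        for name, per_phone in stats.items()
    }


# Toy placeholder values, purely to show the call shape.
toy_leaves = {"aa": 10, "b": 20, "s": 30}
toy_stats = {"total_frames": {"aa": 1, "b": 2, "s": 3}}
toy_result = correlate_leaves(toy_leaves, toy_stats)
```

With real data, coefficients near zero for all four statistics would correspond to the observation in the text.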
Computational issues
At 5.5 minutes per utterance, decoding the entire training corpus (1,700,000 utterances) would take around 206 days even with the cluster otherwise empty. That is too long, so we make do with around 50,000 utterances.
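The arithmetic behind this estimate can be sketched as below. The number of parallel jobs on the cluster is not stated here, so the sketch infers it from the 206-day figure; that inferred value, and the function names, are assumptions made for illustration only.

```python
MINUTES_PER_DAY = 24 * 60


def cpu_days(n_utterances, minutes_per_utterance=5.5):
    """Total decoding time in days if utterances are decoded one at a time."""
    return n_utterances * minutes_per_utterance / MINUTES_PER_DAY


def wall_clock_days(n_utterances, n_parallel_jobs, minutes_per_utterance=5.5):
    """Wall-clock days assuming perfect scaling over n_parallel_jobs."""
    return cpu_days(n_utterances, minutes_per_utterance) / n_parallel_jobs


# Full corpus: roughly 6,500 days of sequential decoding.
full_corpus_cpu_days = cpu_days(1_700_000)

# Assumption: the 206-day figure reflects perfect parallel scaling,
# which implies somewhere between 31 and 32 concurrent jobs.
implied_jobs = full_corpus_cpu_days / 206

# The 50,000-utterance subset at the same parallelism: about 6 days.
subset_days = wall_clock_days(50_000, implied_jobs)
```

Under these assumptions the 50,000-utterance subset takes about six days of wall-clock time, compared with 206 for the full corpus.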