Unit Selection

From SpeechWiki


Revision as of 00:47, 10 February 2009

Error driven unit selection

Some statistics

Size of phoneme corpus
Utterances	2000
Phonemes (including EOW and words)	95938
Frames	756009

The probability of each phone occurring in the corpus, and the probability of each phone being correct in the corpus

This seems roughly consistent with http://myweb.tiscali.co.uk/wordscape/wordlist/phonfreq.html.
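As a rough illustration of how these two per-phone probabilities could be computed, here is a minimal sketch. It assumes the decoder output has already been aligned to the reference transcription as (reference phone, recognized phone) pairs; that input format, the function name `phone_probabilities`, and the toy data are all hypothetical. Because "the probability of a phone that is correct" can be read either as a joint or as a conditional probability, the sketch reports both.

```python
from collections import Counter


def phone_probabilities(aligned_pairs):
    """Per-phone occurrence and correctness probabilities.

    aligned_pairs: iterable of (reference_phone, recognized_phone) tuples
    (an assumed input format, not something specified in these notes).
    """
    pairs = list(aligned_pairs)
    total = len(pairs)
    ref_counts = Counter(ref for ref, _ in pairs)
    correct_counts = Counter(ref for ref, hyp in pairs if ref == hyp)

    stats = {}
    for phone, count in ref_counts.items():
        stats[phone] = {
            # P(phone): share of all reference phones that are this phone
            "p_phone": count / total,
            # P(phone, correct): share of all phones that are this phone AND recognized correctly
            "p_phone_and_correct": correct_counts[phone] / total,
            # P(correct | phone): per-phone accuracy
            "p_correct_given_phone": correct_counts[phone] / count,
        }
    return stats


# Toy alignment, purely illustrative (not data from the corpus).
toy_pairs = [("ah", "ah"), ("ah", "eh"), ("t", "t"), ("t", "t")]
toy_stats = phone_probabilities(toy_pairs)
```

Plotting `p_phone` next to `p_phone_and_correct` for every phone would give a chart of the kind described above.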

The Confusion matrix

The confusion matrix ignoring EOW and various non-speech events
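A confusion matrix of this kind can be sketched as below, again assuming hypothetical (reference phone, recognized phone) alignment pairs. The labels in `IGNORED` are placeholders: the notes only say that EOW and "various non-speech events" are skipped, so the actual label set is an assumption.

```python
from collections import defaultdict

# Placeholder labels for the symbols to skip. Only "EOW" is named in the notes;
# the non-speech labels here are assumed for illustration.
IGNORED = frozenset({"EOW", "SIL", "NOISE", "LAUGH"})


def confusion_matrix(aligned_pairs, ignored=IGNORED):
    """Count how often each reference phone was recognized as each phone.

    Pairs in which either side is an ignored symbol are dropped entirely.
    Returns {reference_phone: {recognized_phone: count}}.
    """
    matrix = defaultdict(lambda: defaultdict(int))
    for ref, hyp in aligned_pairs:
        if ref in ignored or hyp in ignored:
            continue
        matrix[ref][hyp] += 1
    # Convert to plain dicts so the result is easy to inspect and compare.
    return {ref: dict(row) for ref, row in matrix.items()}


# Toy alignment, purely illustrative (not data from the corpus).
toy_pairs = [("ah", "ah"), ("ah", "eh"), ("EOW", "EOW"), ("t", "t"), ("SIL", "t")]
toy_matrix = confusion_matrix(toy_pairs)
```

Row sums give the per-phone totals, and the off-diagonal entries give the per-phone error counts.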

The number of triphone decision tree leaf nodes per phone is essentially uncorrelated with any of the per-phone statistics {total frames, total phones, error phones, error frames}. Since the units are chosen to maximally cover the mistakes in the corpus, this might suggest that deepening the triphone decision trees (DTs) is not the same thing as coming up with these error-based units.
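One way such a check could be run is a Pearson correlation between the per-phone leaf count and each per-phone statistic. The notes do not say which correlation measure was used, so Pearson is an assumption here, as are the function names, the dictionary layout, and the toy numbers.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def correlate_leaf_counts(leaf_counts, per_phone_stats):
    """Correlate DT leaf counts with each per-phone statistic.

    leaf_counts:     {phone: number of triphone DT leaf nodes}
    per_phone_stats: {statistic name: {phone: value}}
    """
    phones = sorted(leaf_counts)
    xs = [leaf_counts[p] for p in phones]
    return {
        name: pearson(xs, [values[p] for p in phones])
        for name, values in per_phone_stats.items()
    }


# Toy numbers, purely illustrative (not measurements from the corpus).
toy_leaves = {"ah": 10, "t": 20, "s": 30}
toy_stats = {
    "total_frames": {"ah": 100, "t": 200, "s": 300},
    "error_phones": {"ah": 3, "t": 2, "s": 1},
}
toy_corr = correlate_leaf_counts(toy_leaves, toy_stats)
```

Coefficients near zero for all four statistics would correspond to the "essentially not correlated" observation.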

Computational issues

At 5.5 minutes per utterance, decoding the entire training corpus (1,700,000 utterances) would take around 206 days on an otherwise empty cluster. That is too long, so we make do with around 50,000 utterances.
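The arithmetic behind this estimate can be checked with a short sketch. The degree of parallelism is not stated in these notes; the value computed below is only what the 206-day figure implies, and the wall-clock estimate for the subset assumes the same parallelism.

```python
MINUTES_PER_UTTERANCE = 5.5      # from the notes
MINUTES_PER_DAY = 24 * 60


def cpu_days(num_utterances):
    """Total single-job decoding time in days for the given number of utterances."""
    return num_utterances * MINUTES_PER_UTTERANCE / MINUTES_PER_DAY


# Full training corpus: about 6,493 days of single-job compute.
full_cpu_days = cpu_days(1_700_000)

# The quoted 206 wall-clock days therefore imply roughly 31.5 jobs running
# concurrently. This is an inference, not a figure given in the notes.
implied_parallel_jobs = full_cpu_days / 206

# The 50,000-utterance subset at the same implied parallelism: about 6 days.
subset_wall_days = cpu_days(50_000) / implied_parallel_jobs

print(f"full corpus: {full_cpu_days:.0f} CPU-days")
print(f"implied parallel jobs: {implied_parallel_jobs:.1f}")
print(f"50,000-utterance subset: {subset_wall_days:.1f} wall-clock days")
```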

Syllable Units

Category: Fisher Experiments
