Unit Selection
From SpeechWiki
Latest revision as of 17:41, 17 February 2009
Error driven unit selection
Some statistics
Utterances | 2000 |
---|---|
Phonemes (including EOW and <s> </s> words) | 95938 |
Frames | 756009 |
Mistakes (contiguous strings of incorrect phones, with left/right presence of EOW as the only context) | 3253 |
Number of phonemes covered by mistakes in the corpus (coverage) | 27097 |
Coverage by mistakes of length 1 | 4621 |
Number of mistakes with at least 5 occurrences | 247 |
Coverage by mistakes with at least 5 occurrences | 9244 |
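To put the counts in proportion, here is a small sketch that turns them into ratios. The numbers are copied from the table; the variable names are made up for illustration, and reading "mistakes" as distinct mistake types (rather than tokens) is an assumption.

```python
# Counts copied from the statistics table above.
PHONEMES = 95938            # phonemes in the 2000-utterance corpus
FRAMES = 756009             # frames in the corpus
MISTAKES = 3253             # mistakes (assumed: distinct mistake types)
COVERAGE = 27097            # phonemes covered by any mistake
COVERAGE_LEN1 = 4621        # phonemes covered by mistakes of length 1
FREQUENT_MISTAKES = 247     # mistakes with at least 5 occurrences
COVERAGE_FREQUENT = 9244    # phonemes covered by those frequent mistakes

ratios = {
    "frames_per_phoneme": FRAMES / PHONEMES,
    "phonemes_covered_by_mistakes": COVERAGE / PHONEMES,
    "coverage_from_length_1": COVERAGE_LEN1 / COVERAGE,
    "mistakes_with_5_plus": FREQUENT_MISTAKES / MISTAKES,
    "coverage_from_5_plus": COVERAGE_FREQUENT / COVERAGE,
}

for name, value in ratios.items():
    print(f"{name}: {value:.3f}")
```

Under that reading, roughly 28% of the phonemes fall inside a mistake, and the roughly 8% of mistakes that occur at least 5 times account for about 34% of that coverage.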
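[Figure: PhoneCorrectProb.png. The probability of a particular phone occurring in the corpus, and the probability of a particular phone that is correct in the corpus.]
The phone probabilities seem roughly consistent with http://myweb.tiscali.co.uk/wordscape/wordlist/phonfreq.html.
[Figure: PhoneConfusionMatrix.png. The confusion matrix.]
[Figure: PhoneConfusionMatrixTop40.png. The confusion matrix ignoring EOW and various non-speech events.]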
The number of triphone decision tree (DT) leaf nodes per phone is essentially uncorrelated with any of {total frames, total phones, error phones, error frames} per phone. Since the units are chosen to maximally cover the mistakes in the corpus, this might suggest that deepening the triphone DTs is not the same thing as coming up with these error-based units.
Computational issues
At 5.5 minutes per utterance, decoding the entire training corpus (1,700,000 utterances) would take around 206 days even on an otherwise empty cluster. That's too long, so we make do with around 50,000 utterances.
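The arithmetic behind this estimate can be sketched as follows. The cluster size is not stated on this page, so the parallelism below is only what the quoted 206 days would imply, and the estimate for the 50,000-utterance subset assumes the same parallelism; all names are hypothetical.

```python
MINUTES_PER_UTTERANCE = 5.5
FULL_CORPUS = 1_700_000     # utterances in the full training corpus
SUBSET = 50_000             # utterances actually decoded
QUOTED_DAYS = 206           # wall-clock estimate quoted above

def serial_days(n_utterances: int) -> float:
    """Days needed to decode n_utterances one after another on one job slot."""
    return n_utterances * MINUTES_PER_UTTERANCE / (60 * 24)

full_serial = serial_days(FULL_CORPUS)
# Assumption: the 206-day figure comes from dividing the serial time
# evenly over the cluster, so this is the implied number of parallel jobs.
implied_parallelism = full_serial / QUOTED_DAYS
# Assumption: the subset runs with the same parallelism.
subset_days = serial_days(SUBSET) / implied_parallelism

print(f"serial time, full corpus: {full_serial:.0f} days")
print(f"implied parallel jobs:    {implied_parallelism:.1f}")
print(f"subset wall-clock time:   {subset_days:.1f} days")
```

Under these assumptions the full corpus is about 6,500 serial days of decoding, the quoted figure corresponds to roughly 30 jobs running in parallel, and the subset would take on the order of 6 days.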
Unit creation
- subword
  - grow context, from most general to most specific. Growth stops when
    1. the number of tokens drops below some threshold
    2. precision rises above some threshold
- multiword
  - at a minimum, find all containing wrong multiwords, subject to a minimum token count
  - possibly do the same context trick as for subword
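The subword context growth with its two stopping rules might look roughly like the sketch below. It is an illustration only: the data layout, the function name `grow_context`, the default thresholds, and the reading of "precision" as the fraction of matching tokens that are mistakes are all assumptions, not taken from the experiments.

```python
from typing import List, Optional, Tuple

# Hypothetical layout: (left_context, center, right_context, is_mistake),
# with each context a tuple of phones ordered nearest-first.
Example = Tuple[Tuple[str, ...], str, Tuple[str, ...], bool]

def grow_context(
    examples: List[Example],
    center: str,
    left: Tuple[str, ...],
    right: Tuple[str, ...],
    min_tokens: int = 5,
    min_precision: float = 0.8,
) -> Optional[Tuple[int, int, float]]:
    """Widen the context one phone per side at a time.

    Returns (width, token_count, precision) for the last context that
    still had at least min_tokens tokens, or None if even the
    context-free unit is too rare.
    """
    best = None
    for width in range(max(len(left), len(right)) + 1):
        matched = [
            e for e in examples
            if e[1] == center
            and e[0][:width] == left[:width]
            and e[2][:width] == right[:width]
        ]
        if len(matched) < min_tokens:
            break  # rule 1: too few tokens, keep the more general unit
        precision = sum(e[3] for e in matched) / len(matched)
        best = (width, len(matched), precision)
        if precision > min_precision:
            break  # rule 2: the unit is already precise enough
    return best

# Toy data: "t" is always wrong between a_b and always right between c_d.
toy = [(("a",), "t", ("b",), True)] * 6 + [(("c",), "t", ("d",), False)] * 6
print(grow_context(toy, "t", ("a",), ("b",)))
```

On the toy data the context-free unit has precision 0.5, so the context is widened once, after which all 6 matching tokens are mistakes and growth stops.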
The trick for choosing what to replace: build DTs on each context phoneme.

    while leaf nodes have entropy > 0:
        for each leaf node L with entropy > 0:
            choose context phoneme A
            S = the subset of examples having attribute A
            build a shallow DT on AF of A (with a large minimum of examples in the leaf nodes), using S as data set

This procedure does not work.
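For reference, the "entropy > 0" test on a leaf node can be sketched as below, assuming it means the Shannon entropy of the labels of the examples that reach the leaf. That reading, and the helper name `leaf_entropy`, are assumptions; the page does not say how entropy was computed.

```python
import math
from collections import Counter
from typing import Hashable, Sequence

def leaf_entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of the label distribution in one leaf."""
    counts = Counter(labels)
    if len(counts) <= 1:
        return 0.0  # empty or pure leaf: nothing left to split
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

print(leaf_entropy(["ok", "ok", "ok"]))       # pure leaf
print(leaf_entropy(["ok", "wrong"]))          # evenly mixed leaf
```

A leaf is pure, and the loop would stop splitting it, exactly when this value is 0.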
    =======
    M5 replacement improvement: 7032.400000 245 units
    M0 replacement improvement: 4699.350000 3223 units

    >2 improvement
    M5 replacement improvement: 7096.050000 235 units
    M0 replacement improvement: 3143.750000 655 units

    =======
    no context:
    M5 replacement improvement: 6800.850000 204 units
    M0 replacement improvement: 4820.000000 2918 units

    >2 improvement
    M5 replacement improvement: 6963 192 units
    M0 replacement improvement: units