Fisher Corpus
From SpeechWiki
Revision as of 21:51, 27 March 2008
The Fisher corpus is still relatively new and rough. This page is meant to help people quickly build a basic speech recognizer with it.
Vocabulary
Statistic | Value | % of word tokens |
---|---|---|
total utterances | 2223159 | |
total uncertain words or phrases enclosed in (( )) (e.g. (( NO WAY )) ) | 283935 | |
total word tokens in corpus | 21905137 | 100% |
total non-speech markers enclosed in [] (e.g. [LAUGH]) | 559629 | 2.555% |
total partial words (starting or ending in -) | 154130 | 0.7036% |

Statistic | Value | % of unique words |
---|---|---|
total unique words | 64924 | 100% |
unique words occurring once in the corpus | 23192 | 35.72% |
unique words occurring once or twice in the corpus | 31272 | 48.17% |
corpus coverage if vocab does not include words occurring once in the corpus | 99.894% | |
corpus coverage if vocab does not include words occurring once or twice in the corpus | 99.857% | |
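
Statistics of this kind can be recomputed from the transcripts with a short script. The following is a minimal sketch, not the script that produced the numbers above. It assumes each utterance is already available as a plain string (timestamps and speaker labels stripped), that words are separated by whitespace, and that non-speech markers and partial words count as word tokens. The function name `vocab_stats` and the dictionary keys are hypothetical.

```python
import re
from collections import Counter

# Matches one uncertain span such as "(( NO WAY ))" (non-greedy).
UNCERTAIN = re.compile(r"\(\(.*?\)\)")


def vocab_stats(utterances):
    """Compute basic vocabulary statistics over an iterable of utterance strings."""
    counts = Counter()
    n_utterances = 0
    n_uncertain = 0
    for utt in utterances:
        n_utterances += 1
        n_uncertain += len(UNCERTAIN.findall(utt))
        # Drop the (( )) delimiters but keep the words inside them.
        text = utt.replace("((", " ").replace("))", " ")
        counts.update(text.split())

    tokens = sum(counts.values())
    # Token counts for markers like [LAUGH] and partial words like TH- or -ING.
    nonspeech = sum(c for w, c in counts.items()
                    if w.startswith("[") and w.endswith("]"))
    partial = sum(c for w, c in counts.items()
                  if w.startswith("-") or w.endswith("-"))
    # Type counts: how many distinct words are rare.
    once = sum(1 for c in counts.values() if c == 1)
    once_or_twice = sum(1 for c in counts.values() if c <= 2)
    # Token counts lost when rare words are dropped from the vocabulary.
    lost_once = once
    lost_once_or_twice = sum(c for c in counts.values() if c <= 2)

    return {
        "utterances": n_utterances,
        "uncertain_spans": n_uncertain,
        "tokens": tokens,
        "nonspeech_tokens": nonspeech,
        "partial_tokens": partial,
        "unique_words": len(counts),
        "unique_once": once,
        "unique_once_or_twice": once_or_twice,
        "coverage_without_once": 1 - lost_once / tokens if tokens else 0.0,
        "coverage_without_once_or_twice":
            1 - lost_once_or_twice / tokens if tokens else 0.0,
    }


if __name__ == "__main__":
    sample = ["(( NO WAY )) I KNOW", "[LAUGH] I TH- THINK SO", "NO"]
    for key, value in vocab_stats(sample).items():
        print(key, value)
```

Coverage here is measured in tokens, so a word that occurs twice removes two tokens when it is dropped. With the counts listed above, dropping the 23192 singletons removes 23192 of 21905137 tokens, which matches the 99.894% in the table. For the once-or-twice case, counting tokens this way would give roughly 99.82% from the listed counts, slightly below the 99.857% in the table, so that figure may have been computed differently.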