Fisher Corpus

From SpeechWiki


Revision as of 21:51, 27 March 2008

The Fisher corpus is still relatively new and rough. This page is meant to help people quickly build a basic speech recognizer with it.

Vocabulary

Some corpus statistics

Statistic                                                                 Count      % of word tokens
total utterances                                                          2223159
total uncertain words or phrases enclosed in (( )) (e.g. (( NO WAY )))    283935
total word tokens in corpus                                               21905137   100%
total non-speech markers enclosed in [] (e.g. [LAUGH])                    559629     2.555%
total partial words (starting or ending in -)                             154130     0.7036%
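
Counts of this kind can be gathered in a single pass over the transcript text. The following is a minimal sketch, not the script used to produce the table: the function name `corpus_stats` is hypothetical, it assumes each utterance has already been reduced to its text (timestamps and speaker labels removed), and it assumes that words inside (( )) and non-speech markers such as [LAUGH] both count as word tokens.

```python
import re
from collections import Counter

# Matches one uncertain span such as "(( NO WAY ))" (non-greedy, single line).
UNCERTAIN = re.compile(r"\(\(.*?\)\)")

def corpus_stats(utterances):
    """Count utterances, tokens, and marked-up tokens (hypothetical helper).

    `utterances` is an iterable of utterance texts with timestamps and
    speaker labels already stripped (an assumption of this sketch).
    """
    stats = Counter()
    vocab = Counter()
    for text in utterances:
        stats["utterances"] += 1
        stats["uncertain_spans"] += len(UNCERTAIN.findall(text))
        # Drop the (( )) delimiters but keep the words inside them.
        cleaned = text.replace("((", " ").replace("))", " ")
        for token in cleaned.split():
            stats["tokens"] += 1
            vocab[token] += 1
            if token.startswith("[") and token.endswith("]"):
                stats["non_speech"] += 1   # e.g. [LAUGH]
            elif token.startswith("-") or token.endswith("-"):
                stats["partial"] += 1      # e.g. TH-
    return stats, vocab

stats, vocab = corpus_stats(["HELLO (( NO WAY )) [LAUGH]", "I TH- THINK SO"])
```

The returned `vocab` counter holds per-word frequencies, so the number of unique words is `len(vocab)` and the words occurring once are those with a count of 1.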
Some vocab statistics

Statistic                                                                               Count    Percent
total unique words                                                                      64924    100%
unique words occurring once in the corpus                                               23192    35.72%
unique words occurring once or twice in the corpus                                      31272    48.17%
corpus coverage if vocab does not include words occurring once in the corpus                     99.894%
corpus coverage if vocab does not include words occurring once or twice in the corpus            99.857%
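
The percentages above follow from the raw counts, and the arithmetic can be checked with a short sketch. The names `percent` and `coverage` are hypothetical helpers, and coverage is taken here to mean the share of word tokens that remain in-vocabulary. Under that reading, the listed 99.857% is reproduced by subtracting one token per dropped word. If each word occurring twice is instead counted as two dropped tokens, the same counts give about 99.820%.

```python
# Counts taken from the tables on this page.
TOTAL_TOKENS = 21905137
TOTAL_TYPES = 64924
ONCE = 23192            # unique words occurring once
ONCE_OR_TWICE = 31272   # unique words occurring once or twice

def percent(part, whole):
    """Share of `part` in `whole`, as a percentage."""
    return 100.0 * part / whole

def coverage(dropped_tokens, total_tokens=TOTAL_TOKENS):
    """Percentage of tokens still covered after dropping `dropped_tokens`."""
    return 100.0 * (1 - dropped_tokens / total_tokens)

twice = ONCE_OR_TWICE - ONCE  # unique words occurring exactly twice

print(f"{percent(ONCE, TOTAL_TYPES):.2f}%")           # share of singleton words
print(f"{percent(ONCE_OR_TWICE, TOTAL_TYPES):.2f}%")  # share of words seen <= 2 times
print(f"{coverage(ONCE):.3f}%")                       # drop words seen once
print(f"{coverage(ONCE_OR_TWICE):.3f}%")              # one token per dropped word
print(f"{coverage(ONCE + 2 * twice):.3f}%")           # two tokens per twice-seen word
```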

Language Model
