Fisher Corpus

From SpeechWiki

Revision as of 00:52, 26 March 2008 by Arthur

The Fisher corpus is still relatively new and rough. This page is intended to help people quickly build a basic speech recognizer with it.

Vocabulary

Some Corpus Statistics
total utterances 2223159
total word tokens in corpus 21905137 100%
total non-speech markers enclosed in [] (e.g. [LAUGH]) 559629 2.555%
total partial words (starting or ending in -) 154130 0.7036%
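The corpus counts above can be reproduced roughly as follows. This is a minimal sketch, not the script used to produce this page. It assumes the transcripts have already been reduced to one utterance string each (timestamps and speaker labels stripped), that tokens are whitespace-separated, and, as the percentages suggest, that markers and partial words are counted among the word tokens. The function name `corpus_stats` is hypothetical.

```python
def corpus_stats(utterances):
    """Count utterances, word tokens, non-speech markers and partial words.

    Assumes one transcript string per utterance, whitespace-tokenized.
    Bracketed tokens such as [LAUGH] and partial words such as "so-" are
    still counted as word tokens, so their percentages are relative to the
    full token count.
    """
    n_utts = n_tokens = n_markers = n_partials = 0
    for utt in utterances:
        n_utts += 1
        for tok in utt.split():
            n_tokens += 1
            if tok.startswith("[") and tok.endswith("]"):
                n_markers += 1  # non-speech marker, e.g. [LAUGH]
            elif tok.startswith("-") or tok.endswith("-"):
                n_partials += 1  # partial word, e.g. "so-" or "-cause"

    def pct(n):
        # Percentage of all word tokens; guard against an empty corpus.
        return 100.0 * n / n_tokens if n_tokens else 0.0

    return {
        "utterances": n_utts,
        "tokens": n_tokens,
        "markers": n_markers,
        "marker_pct": pct(n_markers),
        "partials": n_partials,
        "partial_pct": pct(n_partials),
    }
```

Checking for markers before partial words means a bracketed token is never also counted as a partial word.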
Some Vocabulary Statistics
total unique words 64924 100%
unique words occurring only once in the corpus 23192 35.72%
unique words occurring only once or twice in the corpus 31272 48.17%
token coverage of the corpus if the vocabulary excludes words occurring only once 99.894%
token coverage of the corpus if the vocabulary excludes words occurring only once or twice 99.820%
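The vocabulary figures, including the coverage rows, follow directly from per-word counts. Below is a minimal sketch using Python's `collections.Counter`, assuming a flat list of word tokens; the function names `vocab_stats` and `coverage` are hypothetical, and whether markers and partial words belong in the vocabulary is a separate choice this page does not settle.

```python
from collections import Counter


def vocab_stats(counts: Counter) -> dict:
    """Number of word types and of rare types (seen once, seen once or twice)."""
    return {
        "types": len(counts),
        "once": sum(1 for c in counts.values() if c == 1),
        "once_or_twice": sum(1 for c in counts.values() if c <= 2),
    }


def coverage(counts: Counter, min_count: int) -> float:
    """Fraction of running tokens covered by a vocabulary of words seen >= min_count times.

    A word left out of the vocabulary that occurs k times leaves k tokens
    uncovered, so dropping the words seen once or twice removes n1 + 2*n2
    tokens, where n1 and n2 are the numbers of words seen exactly once and
    exactly twice.
    """
    total = sum(counts.values())
    covered = sum(c for c in counts.values() if c >= min_count)
    return covered / total if total else 0.0
```

Typical use is `counts = Counter(tokens)`, then `coverage(counts, 2)` for a vocabulary without singletons and `coverage(counts, 3)` for one without words seen once or twice.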

Language Model
