Fisher Corpus
From SpeechWiki
(Difference between revisions)
Line 1: | Line 1: | ||
- | + | This page links to the various things I've done with the Fisher corpus. It may be helpful for quickly building a basic speech recognizer. | |
- | =Train/Devel/Test | + | |
- | + | ||
+ | =Train/Devel/Test partitions= | ||
+ | For all the models and experiments, the entire Fisher corpus is split 80/10/10 percent into Train/Devel/Test partitions as follows | ||
The utterance id file is in | The utterance id file is in | ||
- | filelists/uttIds.txt | + | [http://mickey.ifp.uiuc.edu/speech/akantor/fisher/filelists/uttIds.txt uttIds.txt] |
And the splits are as follows: | And the splits are as follows: | ||
Line 25: | Line 27: | ||
|} | |} | ||
- | |||
- | |||
- | |||
- | |||
- | = | + | The [[experiment infrastructure]] needs its own page. |
- | [[Fisher Front End]] | + | |
+ | =The experiments= | ||
+ | |||
+ | The phonetic and mixed-unit [[Fisher Dictionaries| dictionaries]], the [[Fisher Language Model | language model]]s and the [[Fisher Front End | front end]] used in my pronunciation experiments all have their own pages. | ||
+ | |||
+ | The [[Fisher Baseline Experiments]] and [[Mixed Unit Experiments]]. |
Revision as of 18:04, 2 October 2008
This page links to the various things I've done with the Fisher corpus. It may be helpful for quickly building a basic speech recognizer.
Train/Devel/Test partitions
For all the models and experiments, the entire Fisher corpus is split 80/10/10 percent into Train/Devel/Test partitions.
The utterance IDs are listed in uttIds.txt, and the splits are as follows:
Set | Conversation Sides | Lines in uttIds.txt |
---|---|---|
Training | 00001A to 09360B | 1 to 1775831 |
Devel | 09361A to 10530B | 1775832 to 1991965 |
Test | 10531A to 11699B | 1991965 to 2223159 |
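The table can be turned into a small lookup, which may be handy when scripting against the partitions. This is only a minimal sketch, not the code used in the experiments: the names `PARTITIONS`, `partition_of` and `partition_fractions` are made up for illustration, and it works from the conversation-side ranges rather than the line numbers, since the line format of uttIds.txt is not described on this page.

```python
# Inclusive conversation-number ranges copied from the table above.
# The dictionary and function names are illustrative assumptions.
PARTITIONS = {
    "Training": (1, 9360),
    "Devel": (9361, 10530),
    "Test": (10531, 11699),
}


def partition_of(side_id: str) -> str:
    """Map a conversation side ID such as '09361A' to its partition name."""
    number, side = int(side_id[:-1]), side_id[-1]
    if side not in ("A", "B"):
        raise ValueError(f"unexpected side label in {side_id!r}")
    for name, (first, last) in PARTITIONS.items():
        if first <= number <= last:
            return name
    raise ValueError(f"conversation {number} is outside the corpus range")


def partition_fractions() -> dict:
    """Share of conversations per partition, as a check on the 80/10/10 split."""
    sizes = {name: last - first + 1 for name, (first, last) in PARTITIONS.items()}
    total = sum(sizes.values())
    return {name: size / total for name, size in sizes.items()}
```

For example, `partition_of("10530B")` falls in the Devel range, and `partition_fractions()` gives the share of conversations in each set, which should come out close to 0.8, 0.1 and 0.1.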
The experiment infrastructure needs its own page.
The experiments
The phonetic and mixed-unit dictionaries, the language models and the front end used in my pronunciation experiments all have their own pages.