Fisher Corpus

From SpeechWiki

(Difference between revisions)
Line 1: Line 1:
-
The fisher corpus is still relatively new and rough, and this page is to help people quickly build a basic speech recognizer with it.
+
This page links to the various things I've done with the Fisher corpus. It may be helpful for quickly building a basic speech recognizer.
-
=Train/Devel/Test partition=
+
 
-
I've split the entire Fisher corpus into 80/10/10 percent for Train/Devel/Test partitions
+
 
 +
=Train/Devel/Test partitions=
 +
For all the models and experiments, the entire Fisher corpus is split 80/10/10 percent into Train/Devel/Test partitions as follows.
The utterance id file is in  
-
filelists/uttIds.txt
+
[http://mickey.ifp.uiuc.edu/speech/akantor/fisher/filelists/uttIds.txt uttIds.txt]
And the splits are as follows:
Line 25: Line 27:
|}
|}
-
=Dictionaries=
 
-
[[Fisher Dictionaries]]
 
-
=Language Model=
 
-
There is a lot to say about the [[Fisher Language Model]]s so they get their own page.
 
-
=Front End=
+
The [[experiment infrastructure]] needs its own page.
-
[[Fisher Front End]]
+
 
 +
=The experiments=
 +
 
 +
The phonetic and mixed-unit [[Fisher Dictionaries| dictionaries]], the [[Fisher Language Model | language model]]s and the [[Fisher Front End | front end]] used in my pronunciation experiments all have their own pages.
 +
 
 +
The experiments themselves are described on the [[Fisher Baseline Experiments]] and [[Mixed Unit Experiments]] pages.

Revision as of 18:04, 2 October 2008

This page links to the various things I've done with the Fisher corpus. It may be helpful for quickly building a basic speech recognizer.


Train/Devel/Test partitions

For all the models and experiments, the entire Fisher corpus is split 80/10/10 percent into Train/Devel/Test partitions.

The utterance ID file is uttIds.txt, and the splits are as follows:

Set        Conversation Sides    Lines in uttIds.txt
Training   00001A to 09360B      1 to 1775831
Devel      09361A to 10530B      1775832 to 1991965
Test       10531A to 11699B      1991965 to 2223159
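
The conversation-side ranges in the table above can be turned into a small lookup. The following is only a minimal sketch: it assumes a conversation side ID is five digits followed by A or B, as the table suggests, and the names `PARTITIONS` and `partition_for` are made up for illustration. It does not read uttIds.txt, whose line format is not described here.

```python
import re

# Conversation-number ranges copied from the table above (inclusive bounds).
PARTITIONS = [
    ("Training", 1, 9360),
    ("Devel", 9361, 10530),
    ("Test", 10531, 11699),
]

# Assumed ID shape: five digits followed by the side letter A or B.
SIDE_ID = re.compile(r"(\d{5})([AB])")


def partition_for(side_id: str) -> str:
    """Return the partition name for a conversation side ID such as '09361A'."""
    match = SIDE_ID.fullmatch(side_id)
    if match is None:
        raise ValueError(f"not a conversation side ID: {side_id!r}")
    conversation = int(match.group(1))
    for name, first, last in PARTITIONS:
        if first <= conversation <= last:
            return name
    raise ValueError(f"conversation {conversation} is outside 00001 to 11699")
```

For example, `partition_for("09361A")` falls in the Devel range, and both sides of a conversation always land in the same partition because only the conversation number is compared.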


The experiment infrastructure needs its own page.

The experiments

The phonetic and mixed-unit dictionaries, the language models and the front end used in my pronunciation experiments all have their own pages.

The experiments themselves are described on the Fisher Baseline Experiments and Mixed Unit Experiments pages.
