Fisher Corpus

From SpeechWiki


Latest revision as of 20:07, 23 September 2009

This page links to the various things I've done with the Fisher corpus. It may be helpful for quickly building a basic speech recognizer.


Train/Devel/Test partitions

For all the models and experiments, the entire Fisher corpus is split 80/10/10 percent into Train/Devel/Test partitions as follows.

The utterance ID file is scliteUttIds.txt (http://mickey.ifp.uiuc.edu/speech/akantor/fisher/filelists/scliteUttIds.txt). The utterance IDs are in the 'swb' format that sclite understands, so sclite can report accuracy statistics per conversation side.

There is also another utterance ID file, uttIds.txt (http://mickey.ifp.uiuc.edu/speech/akantor/fisher/filelists/uttIds.txt), which is used by some tools (I think only by resegment.pl). It is kept only for compatibility and should not be used.

Additionally, the scp files and the corresponding transcription files that point to the word-aligned data cover only a subset of the utterances, since some utterances could not be word-aligned. Their line numbers therefore differ, and their partitions are listed separately in the last column of the table below.
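Because the word-aligned files cover different line numbers than scliteUttIds.txt, each file has to be split with its own ranges. A minimal Python sketch of that step, assuming the files are plain one-utterance-per-line text and that the ranges are 1-based and inclusive, as in the splits table on this page. The names split_by_line_ranges and WORD_ALIGNED_RANGES are hypothetical and not part of any existing tool.

```python
from typing import Dict, Iterable, List, Tuple

# 1-based, inclusive line ranges for wordAlignedTranscriptions.txt and
# subphoneAlignedTranscriptions.txt, copied from the splits table.
WORD_ALIGNED_RANGES: Dict[str, Tuple[int, int]] = {
    "train": (1, 1775773),
    "devel": (1775774, 1991904),
    "test": (1991905, 2223080),
}


def split_by_line_ranges(
    lines: Iterable[str], ranges: Dict[str, Tuple[int, int]]
) -> Dict[str, List[str]]:
    """Group lines by the partition whose (first, last) range contains
    their 1-based line number. Lines outside every range are dropped;
    ranges are assumed not to overlap."""
    parts: Dict[str, List[str]] = {name: [] for name in ranges}
    for lineno, line in enumerate(lines, start=1):
        for name, (first, last) in ranges.items():
            if first <= lineno <= last:
                parts[name].append(line)
                break
    return parts


if __name__ == "__main__":
    # Toy input; for the real data one would pass an open file handle for
    # wordAlignedTranscriptions.txt together with WORD_ALIGNED_RANGES.
    toy = ["utt1", "utt2", "utt3", "utt4", "utt5"]
    print(split_by_line_ranges(toy, {"train": (1, 3), "devel": (4, 4), "test": (5, 5)}))
```

The first quarter of Training lies inside the full Training range, so it would need a separate call rather than an extra entry in the same dictionary.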

The splits are as follows:

Set | Conversation sides | Lines in scliteUttIds.txt | Lines in wordAlignedTranscriptions.txt and subphoneAlignedTranscriptions.txt
Training | 00001A to 09360B | 1 to 1775831 | 1 to 1775773
First quarter of Training | 00001A to 02340B | 1 to 465067 | 1 to 465031
Devel | 09361A to 10530B | 1775832 to 1991965 | 1775774 to 1991904
Test | 10531A to 11699B | 1991965 to 2223159 | 1991905 to 2223080
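Since the partitions are contiguous ranges of conversation numbers, a conversation side can be assigned to its set from its label alone. A sketch, assuming labels are a five-digit conversation number followed by the side letter A or B, as in the table; partition_of_side and in_first_quarter_of_training are hypothetical helper names. The main block also checks that the conversation counts give the stated 80/10/10 split.

```python
import re

# Last conversation number in each set, from the partition table.
TRAIN_LAST = 9360
DEVEL_LAST = 10530
TEST_LAST = 11699
FIRST_QUARTER_LAST = 2340  # "first quarter of Training": 00001A to 02340B

# Assumed label format: five digits plus side letter, e.g. "09361A".
_SIDE_RE = re.compile(r"(\d{5})([AB])")


def _conversation_number(side: str) -> int:
    """Parse a label like "09361A" into its conversation number (9361)."""
    m = _SIDE_RE.fullmatch(side)
    if m is None:
        raise ValueError(f"not a conversation-side label: {side!r}")
    conv = int(m.group(1))
    if not 1 <= conv <= TEST_LAST:
        raise ValueError(f"conversation {conv} is outside 1..{TEST_LAST}")
    return conv


def partition_of_side(side: str) -> str:
    """Return "train", "devel" or "test" for a conversation-side label."""
    conv = _conversation_number(side)
    if conv <= TRAIN_LAST:
        return "train"
    if conv <= DEVEL_LAST:
        return "devel"
    return "test"


def in_first_quarter_of_training(side: str) -> bool:
    """True for sides 00001A to 02340B."""
    return _conversation_number(side) <= FIRST_QUARTER_LAST


if __name__ == "__main__":
    # Check the 80/10/10 proportions from the conversation counts.
    sizes = {
        "train": TRAIN_LAST,
        "devel": DEVEL_LAST - TRAIN_LAST,
        "test": TEST_LAST - DEVEL_LAST,
    }
    for name, n in sizes.items():
        print(f"{name}: {n} conversations, {100 * n / TEST_LAST:.1f}%")
```

Running it prints 80.0%, 10.0% and 10.0% for the three sets (9360, 1170 and 1169 conversations out of 11699). Both sides of a conversation always land in the same set, because only the conversation number is used.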


Observation file format

Each frame in the pfiles has the following columns:

Columns | Category | Data description
0-38 | PLPs | 13 PLPs, 13 delta PLPs and 13 delta-delta PLPs
39 | Word boundaries determined through forced alignment | Word ID
40 | Word boundaries determined through forced alignment | Word transition (0 or 1 valued)
41 | Timeshrinking | Segment start
42 | Timeshrinking | Segment duration
43 | Timeshrinking | Representative frame
44-66 | MLPs | PCA_to_95_percent_variance(log(MLP activations))
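For reference, the column layout above can be written as index constants, together with a generic PCA that keeps the smallest number of components reaching 95% of the variance of the log MLP activations. This is a sketch under assumptions: the frames are already loaded into a NumPy array of shape (n_frames, 67) in table order (reading the pfile itself is not shown), and the page does not say how the original PCA was fitted, so the mean-centering, the small eps added before the log, and the names split_frames and pca_to_95_percent_variance are illustrative choices. Columns 44-66 are 23 columns, so the projection used for these pfiles kept 23 components.

```python
import numpy as np

N_COLUMNS = 67  # columns 0..66 in the table above

# Column positions. Python slice ends are exclusive, so the table's
# columns 0-38 become slice(0, 39) and 44-66 become slice(44, 67).
PLP = slice(0, 39)          # 13 PLPs + 13 deltas + 13 delta-deltas
WORD_ID = 39                # word ID from forced alignment
WORD_TRANSITION = 40        # 0/1 word transition flag
SEGMENT_START = 41          # timeshrinking: segment start
SEGMENT_DURATION = 42       # timeshrinking: segment duration
REPRESENTATIVE_FRAME = 43   # timeshrinking: representative frame
MLP_PCA = slice(44, 67)     # PCA of log MLP activations (23 dims)


def split_frames(frames: np.ndarray) -> dict:
    """Split an (n_frames, 67) array into named per-frame fields."""
    if frames.ndim != 2 or frames.shape[1] != N_COLUMNS:
        raise ValueError(f"expected shape (n, {N_COLUMNS}), got {frames.shape}")
    return {
        "plp": frames[:, PLP],
        "word_id": frames[:, WORD_ID].astype(np.int64),
        "word_transition": frames[:, WORD_TRANSITION] != 0,
        "segment_start": frames[:, SEGMENT_START],
        "segment_duration": frames[:, SEGMENT_DURATION],
        "representative_frame": frames[:, REPRESENTATIVE_FRAME],
        "mlp_pca": frames[:, MLP_PCA],
    }


def pca_to_95_percent_variance(activations: np.ndarray, eps: float = 1e-10):
    """Project log(activations) onto the fewest principal components whose
    eigenvalues account for at least 95% of the total variance.

    Returns (projected, mean, basis) so the same transform can be applied
    to new data as (np.log(new + eps) - mean) @ basis.
    """
    x = np.log(activations + eps)            # eps guards against log(0)
    mean = x.mean(axis=0)
    centered = x - mean
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]        # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    # First index where the cumulative share reaches 95%, as a count.
    k = min(int(np.searchsorted(cumulative, 0.95)) + 1, len(eigvals))
    basis = eigvecs[:, :k]
    return centered @ basis, mean, basis
```

Returning the mean and basis keeps the fitted transform reusable, so Devel and Test activations can be projected with the statistics estimated on Training rather than refitted per set.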


The experiment infrastructure needs its own page.

The experiments

The goal of these experiments is to explore the utility of using mixed units (phones, syllables and whole words) for large-vocabulary speech recognition. These experiments are performed on the Fisher Corpus.

The phonetic and mixed-unit dictionaries, the language models and the front end used in my pronunciation experiments all have their own pages.

The experiments themselves are described on the Fisher Baseline Experiments and Mixed Unit Experiments pages.
