Revision as of 02:08, 4 February 2009

This page links to the various things I've done with the Fisher corpus. It may be helpful for quickly building a basic speech recognizer.

Train/Devel/Test partitions

For all the models and experiments, the entire Fisher corpus was split into Train/Devel/Test partitions of 80/10/10 percent, as described below.

The utterance IDs are in the file scliteUttIds.txt. They are in the 'swb' format that sclite understands, so sclite can report accuracy statistics per conversation side.

There is also a second utterance ID file, uttIds.txt, which is used by some tools (only by resegment.pl, I think). It is there only for compatibility and should not be used.

Additionally, the scp files and corresponding transcriptions file pointing to the Word Aligned data are a subset of all the utterances (since some of the utterances could not be word aligned). They are partitioned differently, and are specified below.

The splits are as follows:

Set	Conversation Sides	Lines in scliteUttIds.txt	Lines in wordAlignedTranscriptions.txt and in subphoneAlignedTranscriptions.txt	Frames in subphoneAlignedTranscriptions.txt
Training	00001A to 09360B	1 to 1775831	1 to 1775773	1 to 598529978
Devel	09361A to 10530B	1775832 to 1991965	1775774 to 1991904	598529979 to 674800908
Test	10531A to 11699B	1991965 to 2223159	1991905 to 2223080	674800909 to 742529486
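
As a rough illustration of the table, the sketch below maps a conversation side ID to its partition and checks that the conversation counts come out close to 80/10/10. The function names are made up for this example, and it assumes that a side ID is a zero-padded conversation number followed by A or B, as in the "Conversation Sides" column; it is not part of the actual experiment tools.

```python
# Inclusive conversation-number boundaries, copied from the table above.
PARTITIONS = {
    "Training": (1, 9360),
    "Devel": (9361, 10530),
    "Test": (10531, 11699),
}


def partition_of_side(side_id: str) -> str:
    """Return the partition name for a conversation side such as '09361A'."""
    # Assumed ID layout: zero-padded conversation number, then channel A or B.
    number, channel = int(side_id[:-1]), side_id[-1]
    if channel not in ("A", "B"):
        raise ValueError(f"unexpected channel in {side_id!r}")
    for name, (first, last) in PARTITIONS.items():
        if first <= number <= last:
            return name
    raise ValueError(f"{side_id!r} is outside 00001A to 11699B")


def partition_fractions() -> dict:
    """Return the fraction of all conversations that falls in each partition."""
    sizes = {name: last - first + 1 for name, (first, last) in PARTITIONS.items()}
    total = sum(sizes.values())
    return {name: size / total for name, size in sizes.items()}


if __name__ == "__main__":
    print(partition_of_side("09361A"))
    print(partition_fractions())
```

Keying the lookup on the conversation number rather than on line numbers means the same function works for scliteUttIds.txt and for the word-aligned files, whose line ranges differ.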

The experiment infrastructure needs its own page.

The experiments

The goal of these experiments is to explore the utility of using mixed units (phones, syllables and whole words) for large vocabulary speech recognition. These experiments are performed on the Fisher Corpus.

The phonetic and mixed-unit dictionaries, the language models and the front end used in my pronunciation experiments all have their own pages.

The experiments themselves are described on the Fisher Baseline Experiments and Mixed Unit Experiments pages.

