Projects

Here are some projects that [[SST People]] are working on.  For another view, see our [http://www.isle.uiuc.edu/pubs Publications].

===SST Group Meetings===

* [[SST Group Meetings]]

===Phonetics, Phonology, Semantics===

; Prosody and Phonology in Automatic Speech Recognition (Landmark-Based Speech Recognition)
: [[landmarks09F| Group Meeting Schedules and Slides]]
: [http://www.isle.uiuc.edu/research/landmarks.html Landmark-Based Speech Recognition]
: [http://www.isle.uiuc.edu/research/prosody_of_disfluency.html Prosody of Disfluency]
; Very Large Corpus ASR/ Mixed-Units ASR
: [[:Category:Fisher_Experiments|Large Vocabulary speech recognition using mixed units on fisher corpus]]
; [[articulatory_feature_transcription|Articulatory Feature Transcription]]
: [[Transcription_Guidelines|Transcription Guidelines]]
: [[Phone-to-Feature_Mapping|Phone-to-Feature Mapping]]
: [[Meeting_Summaries|Meeting Summaries]]
: [[Resources|Resources]]

===Group Dynamics and Discourse===

; GroupScope --- Dynamics of Medium-Sized Groups
: [[GroupScope]]

===Language Acquisition, Language Contact, Variability, and Disability===

; Multi-Dialect Speech Recognition and Machine Translation for Qatari Broadcast TV
: [[Multi Dialect Arabic]]

; Cross-Language Transfer Learning
: [[Linguistic Diversity References]]
: [http://hlt.i2r.a-star.edu.sg/starchallenge Star Challenge competition]

; Dynamics of Second Language Fluency
: [http://serrano.ai.uiuc.edu/CRI/ Group Meeting Schedules and Slides]
: [http://www.isle.uiuc.edu/research/fluency.html Description]
: [[Dynamics of Second Language Fluency Data Description|Data Description]]
; Universal Access
: [[dysarthria09|Group Meeting Schedules and Slides]]
: [http://www.isle.uiuc.edu/ua/index.html Description]
: [http://www.isle.uiuc.edu/UASpeech UASpeech Database]

===Multimodal Fusion, Speech and Non-Speech===

; Audiovisual Event Detection and Visualization
: [[compaudition09| Group Meeting Schedules and Slides]]
: [[acoustic_events_papers| Papers]]
: [[Visualization Experiments]]

; Mobile Platform Acoustic-Frequency Environmental Tomography (was Dereverberation)
: [[compaudition09| Group Meeting Schedules]]
: [[Dereverberation Project| Project Status and Working Notes]]
; Audiovisual Speech Recognition
: [http://www.isle.uiuc.edu/research/audiovisual.html Description]
: [http://www.isle.uiuc.edu/AVICAR/ AVICAR Database]
; Smaragdis collaboration
: [[Image:Smaragdis-130218.jpg]]
: [[Image:Smaragdis-130311.jpg]]

Pseudocode spec for the sound input class (and also output later, but not read-and-write):

<pre>
class input_t{

    // Definition of stream characteristics
    class specs_t{
        size_t channels;
        double sample_rate;
        enum sample_format;
    };

    //
    // Constructors
    //

    input_t( ??? stream, bool in_or_out, size_t ch, double sr, enum frm)
    {
        switch( stream){
            case "file":    // use ffmpeg
            case "socket":  // use homebrew code?
            case "url":     // use VLC?
            case "adc":     // use PortAudio
            case "dac":     // use PortAudio
        }
    }

    input_t( ??? stream, input_t example);          // copy stream attributes
    input_t( ??? stream, input_t::specs_t example); // copy stream attributes

    // Assignment/copy operators

    //
    // Destructor
    //

    ~input_t();  // bookkeeping with closing file/net/etc.

    //
    // Utilities
    //

    double sample_rate();
    size_t channels();
    enum sample_format();
    bool eof();
    bool();

    //
    // Seeking
    //

    seek( size_t s);  // move to sample frame s
    seek( double t);  // move to second t

    //
    // Reading
    // output should be channels by sample frames
    //

    array<T> &read( size_t n, size_t offset, int channel_mask);  // sample frames
    array<T> &read( double n, double offset, int channel_mask);  // seconds

    //
    // Writing
    //

    write( array<T> &x, size_t offset, int channel_mask);  // sample frames
    write( array<T> &x, double offset, int channel_mask);  // seconds

    write_add( array<T> &x, size_t offset, int channel_mask);  // sample frames
    write_add( array<T> &x, double offset, int channel_mask);  // seconds
};
</pre>
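Since standard C++ cannot switch on a string, the constructor's stream-type dispatch will probably end up as a comparison chain or a small lookup table. Below is a minimal sketch of that dispatch under made-up names (backend_t and pick_backend are placeholders, and the backend notes are only comments, not actual ffmpeg / VLC / PortAudio calls):

<pre>
#include <stdexcept>
#include <string>

// Hypothetical backend tag; the real input_t would hold a handle per backend.
enum class backend_t { file, socket, url, adc, dac };

// Sketch of the constructor's stream-type dispatch.
backend_t pick_backend( const std::string &stream)
{
    if( stream == "file")   return backend_t::file;    // use ffmpeg
    if( stream == "socket") return backend_t::socket;  // homebrew code?
    if( stream == "url")    return backend_t::url;     // use VLC?
    if( stream == "adc")    return backend_t::adc;     // use PortAudio
    if( stream == "dac")    return backend_t::dac;     // use PortAudio
    throw std::invalid_argument( "unknown stream type: " + stream);
}

int main()
{
    return pick_backend( "file") == backend_t::file ? 0 : 1;
}
</pre>

A std::unordered_map<std::string, backend_t> would work just as well; with only five stream types the comparison chain is simpler.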

We are going for a blocking interface instead of cumbersome callbacks for now. The stream parameters when reading can be used to perform on-the-fly resampling and channel remapping. I'm attaching the board doodling in case I missed something.
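To make the channel_mask idea concrete, here is a small sketch (not part of the spec) that pulls the masked channels out of an interleaved buffer and lays them out channels by sample frames, the layout read() is supposed to return:

<pre>
#include <cstddef>
#include <vector>

// Extract the channels selected by 'mask' (bit i = channel i) from an
// interleaved buffer and return them as channels-by-sample-frames.
std::vector<std::vector<float>> remap_channels(
    const std::vector<float> &interleaved, size_t channels, int mask)
{
    size_t frames = interleaved.size() / channels;
    std::vector<std::vector<float>> out;
    for( size_t c = 0; c < channels; c++){
        if( !( mask & ( 1 << c)))
            continue;
        std::vector<float> chan( frames);
        for( size_t t = 0; t < frames; t++)
            chan[t] = interleaved[t*channels + c];
        out.push_back( chan);
    }
    return out;
}

int main()
{
    // Two interleaved stereo frames: L0 R0 L1 R1; keep only the right channel.
    std::vector<float> x = { 0.1f, 0.2f, 0.3f, 0.4f};
    auto y = remap_channels( x, 2, 0x2);  // y[0] == { 0.2f, 0.4f}
    return ( y.size() == 1 && y[0].size() == 2) ? 0 : 1;
}
</pre>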

We are currently working on getting the code to work for the simple case:

<pre>
main()
{
    input_t in( ...);

    while( in){
        x = in.read( ...);
        y = feature( x);
        plot( y);
    }
}
</pre>
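As a placeholder while input_t and the feature object are being written, here is a self-contained stand-in for this loop. It assumes a raw, mono, 16-bit little-endian PCM file (an assumption for the sketch, not what input_t will do), uses block RMS energy as the stand-in feature, and prints one value per block instead of plotting:

<pre>
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <vector>

// Stand-in feature: RMS energy of one block of samples.
static double feature( const std::vector<int16_t> &x)
{
    double acc = 0.0;
    for( int16_t s : x)
        acc += double( s) * double( s);
    return x.empty() ? 0.0 : std::sqrt( acc / x.size());
}

int main( int argc, char **argv)
{
    if( argc < 2){
        std::fprintf( stderr, "usage: %s file.pcm\n", argv[0]);
        return 1;
    }

    // Assumed format: raw mono 16-bit little-endian PCM (not the real input_t).
    std::FILE *in = std::fopen( argv[1], "rb");
    if( !in)
        return 1;

    const size_t block = 1024;                 // sample frames per read
    std::vector<int16_t> x( block);
    size_t n;
    while( ( n = std::fread( x.data(), sizeof( int16_t), block, in)) > 0){
        x.resize( n);
        std::printf( "%f\n", feature( x));     // "plot" by printing one value per block
        x.resize( block);
    }

    std::fclose( in);
    return 0;
}
</pre>

Once input_t and the feature object exist, this collapses back to the four-line loop above.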

I'm working on the feature object, and Camille is working on the input object.

==See also==
* [http://www.isle.illinois.edu/sst/pubs/ SST publications]
* [http://www.isle.illinois.edu/sst/ SST group web page]
* [[Special:Upload]]
