Visualization Experiments

Visualization experiments include at least the following components:

Visualization Interface Number 1 --- Timeline Audio ("timeliner")

Lightweight, for laptops or handhelds. Similar to, and compared against, Audacity.

Visualization Interface Number 2 --- Milliphone

A command center interface, designed for the Cube.

 Qualifying round is a tutorial.  Expect 1 or 2 out of 6 recruited subjects to fail this.
 Baseline uses Audacity-style viz, i.e. peak amplitude + spectrogram.
 Fancier uses DSP viz (no neural net).
 Fanciest uses a neural net.
Feature Computation --- Signal Features

These features rapidly give an analyst information about the signal, e.g., spectrograms.
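
As a concrete example of such a feature, the sketch below computes log-magnitude spectrogram columns from raw samples. The 512-sample window, 256-sample hop, and naive DFT are illustrative choices, not timeliner's actual parameters; a real FFT library would replace the inner loop.

 # Minimal sketch: one spectrogram column per analysis frame.
 # `samples` is an Array of Floats at 16 kHz.
 FRAME = 512   # samples per analysis window (32 ms at 16 kHz)
 HOP   = 256   # samples between successive columns
 
 def spectrogram_column(frame)
   n = frame.size
   # Hann window to reduce spectral leakage.
   windowed = frame.each_with_index.map { |x, i| x * 0.5 * (1 - Math.cos(2 * Math::PI * i / (n - 1))) }
   # Naive DFT magnitude for bins 0 .. n/2.
   (0..n / 2).map do |k|
     re = im = 0.0
     windowed.each_with_index do |x, t|
       angle = 2 * Math::PI * k * t / n
       re += x * Math.cos(angle)
       im -= x * Math.sin(angle)
     end
     Math.log10(Math.hypot(re, im) + 1e-10)   # log magnitude, floored to avoid -Infinity
   end
 end
 
 def spectrogram(samples)
   columns = []
   (0..samples.size - FRAME).step(HOP) { |i| columns << spectrogram_column(samples[i, FRAME]) }
   columns   # each column is an Array of log magnitudes, lowest bin first
 end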

Feature Computation --- Classification Features

These features measure how well a given classification label is matched by the signal at a given point in time (confidence score). Labels may be defined before or during a session.
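
A minimal sketch of how such per-frame confidence scores might be turned into intervals an analyst can jump to; the 0.5 threshold and the names here are illustrative, not part of the project's design.

 # scores: Array of Floats, one confidence per feature frame, for one label.
 # hop_s: frame hop in seconds.  Returns [begin, end] pairs in seconds.
 def intervals_above(scores, hop_s, threshold = 0.5)
   intervals = []
   start = nil
   scores.each_with_index do |score, i|
     if score >= threshold
       start ||= i                                 # interval opens at the first hot frame
     elsif start
       intervals << [start * hop_s, i * hop_s]
       start = nil
     end
   end
   intervals << [start * hop_s, scores.size * hop_s] if start   # still open at the end
   intervals
 end
 
 # intervals_above([0.1, 0.8, 0.9, 0.2], 0.01)   # one interval, roughly 0.01 s to 0.03 s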

Dramatis personae

 Mark Hasegawa-Johnson
 Camille Goudeseune
 Grads: Sarah Borys, Lae-Hoon Kim, Zhen Li, Kai-Hsiang Lin, Xi Zhou, Xiaodan Zhuang
 Undergrads: David Cohen

Tasks

Camille: keep developing timeliner

 done: pan, zoom via mouse scrollwheel (see the zoom sketch after this list)
 done: GUI using ruby-opengl, for all OSes: Heron, Ibex, Leopard, XP, Vista
 done: inline C generates texturemaps
 later: scrollwheel-GLUT workaround for Windows
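
A minimal sketch of the zoom-about-cursor arithmetic, assuming the view is an interval [t_left, t_right] in seconds and the cursor position arrives as a fraction of the window width; the 1.25 zoom step is an illustrative constant, not timeliner's.

 class View
   attr_reader :t_left, :t_right
 
   def initialize(t_left, t_right)
     @t_left, @t_right = t_left, t_right
   end
 
   # Scrollwheel zoom that keeps the instant under the cursor fixed.
   # wheel_ticks > 0 zooms in; frac is the cursor's horizontal fraction (0..1).
   def zoom(wheel_ticks, frac)
     scale    = 1.25 ** -wheel_ticks
     pivot    = @t_left + frac * (@t_right - @t_left)
     @t_left  = pivot - (pivot - @t_left)  * scale
     @t_right = pivot + (@t_right - pivot) * scale
   end
 
   # Pan by dt seconds (e.g. from a mouse drag).
   def pan(dt)
     @t_left  += dt
     @t_right += dt
   end
 end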

Camille: keep developing milliphone, hand off to grad students

All: run timeliner

 load a recorded sound
 load precomputed features to display
 select and play intervals

Grads: choose features, code feature generators

David: measure and model computation speed of feature generators
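
One way this might look: time the generator on inputs of several durations and fit a straight line (seconds of compute per second of audio). `run_generator` is a placeholder, not an existing command.

 # Wall-clock one run of a feature generator on `seconds` of audio.
 def time_run(seconds)
   t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
   run_generator(seconds)          # placeholder for whatever is being benchmarked
   Process.clock_gettime(Process::CLOCK_MONOTONIC) - t0
 end
 
 # Least-squares fit of elapsed = a + b * duration over several measurements.
 def fit(durations, elapsed)
   n  = durations.size.to_f
   mx = durations.sum / n
   my = elapsed.sum / n
   b  = durations.zip(elapsed).sum { |x, y| (x - mx) * (y - my) } /
        durations.sum { |x| (x - mx)**2 }
   [my - b * mx, b]                # b is compute seconds per audio second
 end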

Camille: map features to HSV
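
A minimal sketch of one possible mapping, assuming each feature value is first normalized to 0..1; running hue from blue (low) to red (high) is an illustrative choice, not the mapping the project settled on.

 # Map a feature value v in 0..1 to an [r, g, b] triple via HSV.
 def feature_to_rgb(v, saturation = 1.0, brightness = 1.0)
   hue = (1.0 - v.clamp(0.0, 1.0)) * 240.0   # 0 => blue (240 deg), 1 => red (0 deg)
   hsv_to_rgb(hue, saturation, brightness)
 end
 
 # Standard HSV-to-RGB conversion; h in degrees, s and v in 0..1.
 def hsv_to_rgb(h, s, v)
   c = v * s
   x = c * (1 - ((h / 60.0) % 2 - 1).abs)
   m = v - c
   r, g, b =
     case h
     when 0...60    then [c, x, 0]
     when 60...120  then [x, c, 0]
     when 120...180 then [0, c, x]
     when 180...240 then [0, x, c]
     when 240...300 then [x, 0, c]
     else                [c, 0, x]
     end
   [r + m, g + m, b + m]
 end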

Grads: design and pilot-study experiments

Zhen, Kai-Hsiang: recruit analyst-subjects, schedule experiments

Zhen, Kai-Hsiang: September, run 5-subject experiment.

Camille or Mark: 2010 Dec 9-10, present at FODAVA annual review, Georgia Tech.

Notes

How to combine features?

Feature generators read a recorded sound and write a feature file.

 Camille runs Sarah's script feat.pl, to read a 16 kHz amicorpus .wav and write .fb and .mfcc files.
 Format: http://labrosa.ee.columbia.edu/doc/HTKBook21/node58.html#SECTION03271000000000000000
 http://htk.eng.cam.ac.uk/
 Sequence of feature vectors (see the reader sketch below).
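
A minimal sketch of reading such a file, assuming the default big-endian HTK byte order and 4-byte float features; it only parses the 12-byte header and the vectors, and should be checked against the HTKBook page linked above.

 # Read an HTK parameter file (e.g. the .fb or .mfcc output of feat.pl).
 def read_htk_features(path)
   File.open(path, 'rb') do |f|
     n_samples, samp_period = f.read(8).unpack('NN')   # 32-bit big-endian ints
     samp_size, parm_kind   = f.read(4).unpack('nn')   # 16-bit big-endian shorts
     floats_per_vector = samp_size / 4                 # assumes 4-byte float features
     vectors = Array.new(n_samples) { f.read(samp_size).unpack("g#{floats_per_vector}") }
     { samp_period_100ns: samp_period, parm_kind: parm_kind, vectors: vectors }
   end
 end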

Later: streaming, not batch.

Experiment tasks:

 find instances of a class of sound events
 find anomalous sounds (open-ended, vague)

Recorded sounds

 AMI meeting room transcribed
 fieldrecorder/090216
 fieldrecorder aircraft + webcam for ground truth
   play freqsweep through genelec into fieldrecorder.
   ignore clock drift.
   Keep data files small enough for our tools.
 toy
   A Ruby script plus short audio source files generates a long target file (see the sketch below).  Tweak the script while tweaking the apps.
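
A minimal sketch of such a generator, assuming headerless 16-bit mono PCM at 16 kHz; the directory name, event count, and gap durations are placeholders. It also prints where each snippet landed, as ground truth for scoring.

 RATE = 16_000
 SNIPPETS = Dir.glob('snippets/*.raw')           # short 16-bit mono PCM sources
 
 File.open('target.raw', 'wb') do |out|
   t = 0.0
   600.times do                                  # 600 events spliced into the target
     gap_s = 0.5 + rand * 2.0                    # 0.5 .. 2.5 s of silence
     out.write("\x00" * (2 * (gap_s * RATE).to_i))
     t += gap_s
 
     snippet = SNIPPETS.sample
     data = File.binread(snippet)
     out.write(data)
     dur_s = data.bytesize / (2.0 * RATE)
     puts format('%.2f %.2f %s', t, t + dur_s, File.basename(snippet))   # ground truth
     t += dur_s
   end
 end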

Realtime server (later)

 record audio
 circular buffer, a few months long
 compute features at multiple scales
   fast approximate algorithms for caching features (see the sketch after this list).
 stream all this to a googlemaps-ish server
 when client scrolls (pans) or zooms, it requests fresh data from server
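
A minimal sketch of one such approximation, assuming a scalar feature per frame: cache per-window maxima at successively coarser levels (each level halves the resolution), then answer a zoomed-out request from the coarsest level that still has enough values. The names and the choice of max as the reducer are illustrative.

 # Build coarser levels: level k holds the max over 2**k consecutive frames.
 def build_levels(frames)
   levels = [frames]
   levels << levels.last.each_slice(2).map(&:max) while levels.last.size > 1
   levels
 end
 
 # Serve frames first..last of the raw signal with at least `pixels` values,
 # from the coarsest level that still satisfies that: fast but approximate.
 def query(levels, first, last, pixels)
   span  = last - first + 1
   level = 0
   level += 1 while level + 1 < levels.size && (span >> (level + 1)) >= pixels
   levels[level][(first >> level)..(last >> level)]
 end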

Logistics

Timeliner: in BI 2253 or the 2nd floor printing room? PC with Ubuntu 8.10 and 4 GB RAM.

Camille will provide headphones, a mouse with a scrollwheel, extra RAM, and a hard disk for Ubuntu.
