Visualization Experiments
Visualization experiments include at least the following components:
- Visualization Interface Number 1 --- Timeline Audio ("timeliner")
Lightweight, for laptops or handhelds. Similar to, and compared against, Audacity.
- Visualization Interface Number 2 --- Milliphone
A command center interface, designed for the Cube.
The qualifying round is a tutorial; expect 1 or 2 of the 6 recruited subjects to fail it. The baseline condition uses Audacity-style visualization, i.e., peak amplitude plus spectrogram. The fancier condition uses DSP visualization (no neural net); the fanciest uses a neural net.
- Feature Computation --- Signal Features
These features rapidly give an analyst information about the signal, e.g., spectrograms.
- Feature Computation --- Classification Features
These features measure how well a given classification label is matched by the signal at a given point in time (confidence score). Labels may be defined before or during a session.
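A minimal sketch of computing such a confidence score, assuming per-label diagonal-Gaussian models; the labels, feature dimensions, and parameters below are invented for illustration, not taken from the experiment:

 # Per-frame confidence: how well each label's model matches the
 # feature vector at one instant.  Labels and Gaussian parameters
 # are made up; real models would be trained from data.
 LABELS = {
   "speech"  => { mean: [2.0, 0.5], var: [1.0, 0.2] },
   "silence" => { mean: [0.1, 0.1], var: [0.3, 0.1] },
 }
 
 def log_gauss(x, mean, var)
   x.zip(mean, var).sum do |xi, mi, vi|
     -0.5 * (Math.log(2 * Math::PI * vi) + (xi - mi)**2 / vi)
   end
 end
 
 def confidences(feature_vector)
   logl = LABELS.transform_values { |m| log_gauss(feature_vector, m[:mean], m[:var]) }
   mx = logl.values.max
   z = logl.values.sum { |l| Math.exp(l - mx) }      # stable softmax
   logl.transform_values { |l| Math.exp(l - mx) / z }
 end
 
 p confidences([1.8, 0.4])   # "speech" wins here, with confidence ~0.99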
Dramatis personae
Mark Hasegawa-Johnson
Camille Goudeseune
Grads: Sarah Borys, Lae-Hoon Kim, Zhen Li, Kai-Hsiang Lin, Xi Zhou, Xiaodan Zhuang
Undergrads: David Cohen
Tasks
Camille: keep developing timeliner
done: pan, zoom (mouse scrollwheel; the zoom arithmetic is sketched below)
done: GUI using ruby-opengl, for all OSes: Heron, Ibex, Leopard, XP, Vista
done: inline C to generate texture maps
later: scrollwheel/GLUT workaround for Windows
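For reference, zoom-about-cursor is just a rescaling of the visible interval around the instant under the mouse. A sketch; the variable names are hypothetical, not timeliner's:

 # Scrollwheel zoom: the instant under the cursor stays put while
 # the visible interval [t_min, t_max] shrinks or grows.
 def zoom(t_min, t_max, t_cursor, wheel_clicks, factor = 1.25)
   scale = factor**-wheel_clicks             # wheel up (+1) zooms in
   [t_cursor + (t_min - t_cursor) * scale,
    t_cursor + (t_max - t_cursor) * scale]
 end
 
 p zoom(0.0, 60.0, 15.0, 1)   # => [3.0, 51.0], zoomed in around t = 15 s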
Camille: keep developing milliphone, hand off to grad students
All: run timeliner
load a recorded sound
load precomputed features to display
select and play intervals (sketched below)
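A tiny sketch of the last step, auditioning a selected interval by shelling out to sox's play command; the filename and times are hypothetical:

 # Play the selection from t0 to t1 (seconds); trim takes start and length.
 t0, t1 = 12.5, 17.0
 system("play", "recording.wav", "trim", t0.to_s, (t1 - t0).to_s)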
Grads: choose features, code feature generators
David: measure and model computation speed of feature generators
Camille: map features to HSV
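One plausible mapping, not necessarily the one timeliner will ship: a scalar feature in [0,1] sweeps hue from blue (low) to red (high) at full saturation and value.

 # Standard HSV-to-RGB conversion, then a feature-to-hue mapping.
 def hsv_to_rgb(h, s, v)                     # h in [0,360), s and v in [0,1]
   c = v * s
   x = c * (1 - ((h / 60.0) % 2 - 1).abs)
   m = v - c
   r, g, b = case h
             when 0...60    then [c, x, 0]
             when 60...120  then [x, c, 0]
             when 120...180 then [0, c, x]
             when 180...240 then [0, x, c]
             when 240...300 then [x, 0, c]
             else                [c, 0, x]
             end
   [r + m, g + m, b + m]
 end
 
 def feature_to_rgb(f)                       # f in [0,1]
   hsv_to_rgb(240.0 * (1.0 - f), 1.0, 1.0)   # 240 = blue, 0 = red
 end
 
 p feature_to_rgb(0.0)   # => [0.0, 0.0, 1.0]  blue
 p feature_to_rgb(1.0)   # => [1.0, 0.0, 0.0]  red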
Grads: design and pilot-study experiments
Zhen, Kai-Hsiang: recruit analyst-subjects, schedule experiments
Zhen, Kai-Hsiang: September, run 5-subject experiment.
Camille or Mark: 2010 Dec 9-10, present at FODAVA annual review, Georgia Tech.
Notes
How should features be combined?
Feature generators read a recorded sound and write a feature file.
Camille runs Sarah's script feat.pl to read a 16 kHz AMI-corpus .wav and write .fb and .mfcc files.
Format: HTK parameter files, i.e., a sequence of feature vectors (http://labrosa.ee.columbia.edu/doc/HTKBook21/node58.html#SECTION03271000000000000000, http://htk.eng.cam.ac.uk/).
Later: streaming, not batch.
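A sketch of reading those HTK parameter files (per the format linked above: a 12-byte big-endian header, then one 4-byte big-endian float per coefficient). Assumes an uncompressed file; the filename is hypothetical:

 # HTK header: nSamples, sampPeriod (100 ns units), sampSize (bytes
 # per frame), parmKind; all big-endian.
 n_samples, samp_period, samp_size, parm_kind =
   File.binread("meeting.mfcc", 12).unpack("NNnn")
 n_coeffs = samp_size / 4
 puts "#{n_samples} frames, #{samp_period / 10_000.0} ms apart, " \
      "#{n_coeffs} coefficients each, parmKind #{parm_kind}"
 
 File.open("meeting.mfcc", "rb") do |f|
   f.seek(12)
   n_samples.times do |i|
     vec = f.read(samp_size).unpack("g#{n_coeffs}")   # big-endian float32
     puts "frame #{i}: #{vec.first(3).inspect} ..." if i < 3
   end
 end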
Experiment tasks:
find instances of a class of sound events
find anomalous sounds (open-ended, vague)
Recorded sounds
AMI meeting room, transcribed
fieldrecorder/090216
fieldrecorder: aircraft + webcam for ground truth
play a freqsweep through the Genelec into the fieldrecorder; ignore clock drift
Keep data files small enough for our tools.
toy: a Ruby script plus short audio source files generates a long target file; tweak the script while tweaking the apps.
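A sketch of that toy generator, using the sox and soxi command-line tools to splice clips and measure their durations; the source filenames and clip count are made up:

 # Splice random short clips into one long target, writing ground
 # truth (start time and source of each clip) alongside it.
 SOURCES = ["beep.wav", "speech.wav", "hum.wav"]   # short clips, same format
 N_CLIPS = 200
 
 sequence = Array.new(N_CLIPS) { SOURCES.sample }
 system("sox", *sequence, "target.wav") or abort "sox failed"
 
 durations = SOURCES.to_h { |f| [f, `soxi -D #{f}`.to_f] }
 t = 0.0
 File.open("target_truth.txt", "w") do |out|
   sequence.each do |f|
     out.puts format("%9.2f  %s", t, f)
     t += durations[f]
   end
 end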
Realtime server (later)
record audio
circular buffer, a few months long
compute features at multiple scales
fast approximate algorithms for caching of features
stream all this to a googlemaps-ish server
when the client scrolls (pans) or zooms, it requests fresh data from the server
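A sketch of the multiscale caching idea: precompute a mipmap-style pyramid of per-interval peaks, so a client request at any zoom is an array slice rather than a pass over months of audio. The tile size and the peak statistic are assumptions:

 TILE = 1024   # raw samples summarized by one level-0 value
 
 def build_pyramid(samples)
   level = samples.each_slice(TILE).map { |s| s.map(&:abs).max }
   pyramid = [level]
   while level.size > 1
     level = level.each_slice(2).map(&:max)   # each value covers 2x the time
     pyramid << level
   end
   pyramid
 end
 
 # Serving a client request for (zoom level, first tile, tile count):
 def fetch(pyramid, zoom, first, count)
   pyramid[zoom][first, count]
 end
 
 pyramid = build_pyramid(Array.new(1 << 20) { rand(-32_768..32_767) })
 p fetch(pyramid, 3, 10, 4)   # four peaks, each covering 8 * TILE samples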
Logistics
Timeliner: in BI 2253 or the 2nd-floor printing room? A PC with Ubuntu 8.10 and 4 GB RAM.
Camille will provide headphones, a mouse with a scrollwheel, extra RAM, and a hard disk for Ubuntu.