Visualization Experiments

From SpeechWiki

Latest revision as of 16:32, 7 July 2010

Visualization experiments include at least the following components:

Visualization Interface Number 1 --- Timeline Audio ("timeliner")

Lightweight, for laptops or handhelds. Similar to, and compared against, Audacity.

Visualization Interface Number 2 --- Milliphone

A command center interface, designed for the Cube.

 Qualifying round is a tutorial.  Expect 1 or 2 out of 6 recruited subjects to fail this.
 Baseline uses Audacity-style viz, i.e. peak amplitude + spectrogram.
 Fancier uses dsp viz (no neural net).
 Fanciest uses neural net.
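
For the baseline condition, the "peak amplitude" half of an Audacity-style display can be sketched as below. This is a minimal illustration, not timeliner's code or Audacity's: the function name `peak_envelope` and the choice of one (min, max) pair per screen column are assumptions.

```python
def peak_envelope(samples, columns):
    """Reduce a waveform to one (min, max) pair per screen column.

    Hypothetical helper: drawing a vertical line from min to max in each
    column gives the familiar peak-amplitude waveform view.
    """
    n = len(samples)
    if n == 0 or columns < 1:
        raise ValueError("need at least one sample and one column")
    envelope = []
    for c in range(columns):
        lo = c * n // columns
        # Guarantee at least one sample per column, even when columns > n.
        hi = max(lo + 1, (c + 1) * n // columns)
        chunk = samples[lo:hi]
        envelope.append((min(chunk), max(chunk)))
    return envelope
```

Keeping both extremes, rather than a mean, means a short loud event still shows up when many samples share one column.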
Feature Computation --- Signal Features

These features rapidly give an analyst information about the signal, e.g., spectrograms.

Feature Computation --- Classification Features

These features measure how well a given classification label is matched by the signal at a given point in time (confidence score). Labels may be defined before or during a session.


Dramatis personae

 Mark Hasegawa-Johnson
 Camille Goudeseune
 Grads: Sarah Borys, Lae-Hoon Kim, Zhen Li, Kai-Hsiang Lin, Xi Zhou, Xiaodan Zhuang
 Undergrads: David Cohen

Tasks

Camille: keep developing timeliner

 done: pan, zoom (mouse scrollwheel)
 done: gui using ruby-opengl, for all OSes: Ubuntu (Hardy Heron, Intrepid Ibex), Mac OS X Leopard, Windows XP and Vista
 done: inline C generates texturemaps
 later: scrollwheel-glut workaround for Windows

Camille: keep developing milliphone, hand off to grad students

All: run timeliner

 load a recorded sound
 load precomputed features to display
 select and play intervals

Grads: choose features, code feature generators

David: measure and model computation speed of feature generators

Camille: map features to HSV

Grads: design and pilot-study experiments

Zhen, Kai-Hsiang: recruit analyst-subjects, schedule experiments

Zhen, Kai-Hsiang: September, run 5-subject experiment.

Camille or Mark: 2010 Dec 9-10, present at FODAVA annual review, Georgia Tech.
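
One possible reading of the "map features to HSV" task is sketched below. The mapping is an assumption made for illustration (hue runs from blue for low values to red for high values, at full saturation and brightness). The name `feature_to_rgb` and the `lo`/`hi` range parameters are hypothetical, not part of timeliner.

```python
import colorsys


def feature_to_rgb(value, lo, hi):
    """Map a scalar feature value to an (r, g, b) triple via HSV.

    Assumed mapping: values at or below `lo` are blue, values at or above
    `hi` are red, and hue varies linearly in between.
    """
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    # Normalize to [0, 1], clamping out-of-range values.
    t = min(max((value - lo) / (hi - lo), 0.0), 1.0)
    hue = (1.0 - t) * 2.0 / 3.0  # 2/3 is blue, 0 is red
    return colorsys.hsv_to_rgb(hue, 1.0, 1.0)
```

A second feature could drive saturation or value instead of leaving them at 1.0. That is one candidate answer to the question of how to combine features, but nothing in these notes settles it.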

Notes

How to combine features?

Feature generators read a recorded sound and write a feature file.

 Camille runs Sarah's script feat.pl, to read a 16 kHz amicorpus .wav and write .fb and .mfcc files.
 Format: http://labrosa.ee.columbia.edu/doc/HTKBook21/node58.html#SECTION03271000000000000000
 http://htk.eng.cam.ac.uk/
 Sequence of feature vectors.
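
A minimal sketch of that file format, as a feature generator might write it and timeliner might read it: a 12-byte big-endian header (number of frames, frame period in 100 ns units, bytes per frame, parameter kind) followed by 32-bit floats. This is not feat.pl. The function names, the 10 ms default frame period, and the default parameter kind USER (code 9) are assumptions for illustration.

```python
import struct

HTK_USER = 9  # HTK parameter-kind code for user-defined features


def write_htk(path, frames, frame_period_s=0.010, parm_kind=HTK_USER):
    """Write equal-length feature vectors as an uncompressed HTK file."""
    n = len(frames)
    dim = len(frames[0]) if n else 0
    samp_period = int(round(frame_period_s * 1e7))  # units of 100 ns
    with open(path, "wb") as f:
        # Header: nSamples, sampPeriod (int32), sampSize, parmKind (int16).
        f.write(struct.pack(">iihh", n, samp_period, 4 * dim, parm_kind))
        for frame in frames:
            if len(frame) != dim:
                raise ValueError("all frames must have the same length")
            f.write(struct.pack(">%df" % dim, *frame))


def read_htk(path):
    """Return (frames, frame_period_s, parm_kind) from an HTK file."""
    with open(path, "rb") as f:
        n, samp_period, samp_size, parm_kind = struct.unpack(
            ">iihh", f.read(12))
        dim = samp_size // 4  # four bytes per float32 component
        frames = [list(struct.unpack(">%df" % dim, f.read(samp_size)))
                  for _ in range(n)]
    return frames, samp_period * 1e-7, parm_kind
```

The sketch reads and writes whole files (batch). It does not handle HTK's compressed or CRC-checked variants.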

Later: stream not batch.

Experiment tasks:

 find instances of a class of sound events
 find anomalous sounds (open-ended, vague)

Recorded sounds

 AMI meeting room transcribed
 fieldrecorder/090216
 fieldrecorder aircraft + webcam for ground truth
   play freqsweep through genelec into fieldrecorder.
   ignore clock drift.
   Keep data files small enough for our tools.
 toy
   ruby script plus short audio source files generates a long target file.  Tweak script while tweaking apps.
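
The toy generator could look roughly like the sketch below. It is written in Python rather than Ruby and works on plain lists of samples rather than audio files. The name `build_target`, the random placement, and the returned ground-truth list are assumptions, not a description of the actual script.

```python
import random


def build_target(clips, total_len, count, seed=0):
    """Mix `count` randomly chosen clips into a silent target signal.

    clips: dict mapping a label to a list of samples (hypothetical input).
    Returns (target, truth); truth is a sorted list of (start, end, label)
    tuples that can serve as ground truth for the search tasks.
    """
    longest = max(len(c) for c in clips.values())
    if longest > total_len:
        raise ValueError("target is shorter than the longest clip")
    rng = random.Random(seed)  # fixed seed makes the target reproducible
    target = [0.0] * total_len
    truth = []
    labels = sorted(clips)
    for _ in range(count):
        label = rng.choice(labels)
        clip = clips[label]
        start = rng.randrange(0, total_len - len(clip) + 1)
        for i, sample in enumerate(clip):
            target[start + i] += sample  # overlapping clips simply add
        truth.append((start, start + len(clip), label))
    truth.sort()
    return target, truth
```

Because the seed fixes the layout, the same target can be regenerated while the apps are being tweaked.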

Realtime server (later)

 record audio
 circular buffer, a few months long
 compute features at multiple scales
   fast approximate algorithms for caching of features.
 stream all this to a googlemaps-ish server
 when client scrolls (pans) or zooms, it requests fresh data from server
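
The "features at multiple scales" and pan/zoom request steps can be sketched as a pyramid of progressively coarser copies of one scalar feature, much like map tiles. Everything here is an assumption for illustration: the names `build_pyramid` and `query`, the factor-of-two reduction, and the use of max to summarize each pair of frames.

```python
def build_pyramid(frames):
    """Return levels where levels[0] is full resolution and each later
    level halves the length by taking the max of adjacent pairs."""
    levels = [list(frames)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([max(prev[i:i + 2]) for i in range(0, len(prev), 2)])
    return levels


def query(levels, start, end, max_points):
    """Answer a client's pan/zoom request for frames [start, end).

    Returns (level, values) from the finest level that covers the
    interval with no more than max_points values.
    """
    if max_points < 1 or not 0 <= start < end <= len(levels[0]):
        raise ValueError("bad interval or max_points")
    for level, data in enumerate(levels):
        step = 1 << level                        # frames per value
        lo, hi = start // step, -(-end // step)  # floor and ceiling
        if hi - lo <= max_points:
            return level, data[lo:hi]
    return len(levels) - 1, levels[-1]
```

For example, with `levels = build_pyramid([1, 5, 2, 0, 3, 3, 9, 4])`, the call `query(levels, 0, 8, 4)` returns `(1, [5, 2, 3, 9])`: a client showing the whole recording in four columns receives the half-resolution level. A real server would also need the circular buffer and cached, approximate feature computation listed above. Those are not shown.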

Logistics

Timeliner: in BI 2253 or 2nd floor printing room? PC with Ubuntu 8.10 and 4GB RAM.

Camille will provide headphones, mouse with scrollwheel, extra RAM, hard disk for Ubuntu.
