Visualization Experiments
Visualization experiments include at least the following components:
- Visualization Interface Number 1 --- Timeline Audio ("timeliner")
Lightweight, for laptops or handhelds. Similar to, and compared against, Audacity.
- Visualization Interface Number 2 --- Milliphone
A command center interface, designed for the Cube.
The qualifying round is a tutorial; expect 1 or 2 of the 6 recruited subjects to fail it. The baseline condition uses Audacity-style visualization, i.e., peak amplitude plus spectrogram. The fancier condition uses DSP visualization (no neural net); the fanciest uses a neural net.
- Feature Computation --- Signal Features
These features rapidly give an analyst information about the signal, e.g., spectrograms.
- Feature Computation --- Classification Features
These features measure how well a given classification label is matched by the signal at a given point in time (confidence score). Labels may be defined before or during a session.
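A minimal sketch of computing such a confidence score, assuming per-label diagonal-Gaussian models; the labels, feature dimensions, and parameters below are invented for illustration, not taken from the experiment:

 # Per-frame confidence: how well each label's model matches the
 # feature vector at one instant.  Labels and Gaussian parameters
 # are made up; real models would be trained from data.
 LABELS = {
   "speech"  => { mean: [2.0, 0.5], var: [1.0, 0.2] },
   "silence" => { mean: [0.1, 0.1], var: [0.3, 0.1] },
 }
 
 def log_gauss(x, mean, var)
   x.zip(mean, var).sum do |xi, mi, vi|
     -0.5 * (Math.log(2 * Math::PI * vi) + (xi - mi)**2 / vi)
   end
 end
 
 def confidences(feature_vector)
   logl = LABELS.transform_values { |m| log_gauss(feature_vector, m[:mean], m[:var]) }
   mx = logl.values.max
   z = logl.values.sum { |l| Math.exp(l - mx) }      # stable softmax
   logl.transform_values { |l| Math.exp(l - mx) / z }
 end
 
 p confidences([1.8, 0.4])   # "speech" wins here, with confidence ~0.99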
Dramatis personae
Mark Hasegawa-Johnson
Camille Goudeseune
Grads: Sarah Borys, Lae-Hoon Kim, Zhen Li, Kai-Hsiang Lin, Xi Zhou, Xiaodan Zhuang
Undergrads: David Cohen
Tasks
Camille: keep developing timeliner
done: pan, zoom (mouse scrollwheel; the zoom arithmetic is sketched below)
done: GUI using ruby-opengl, for all OSes: Heron, Ibex, Leopard, XP, Vista
done: inline C to generate texture maps
later: scrollwheel/GLUT workaround for Windows
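For reference, zoom-about-cursor is just a rescaling of the visible interval around the instant under the mouse. A sketch; the variable names are hypothetical, not timeliner's:

 # Scrollwheel zoom: the instant under the cursor stays put while
 # the visible interval [t_min, t_max] shrinks or grows.
 def zoom(t_min, t_max, t_cursor, wheel_clicks, factor = 1.25)
   scale = factor**-wheel_clicks             # wheel up (+1) zooms in
   [t_cursor + (t_min - t_cursor) * scale,
    t_cursor + (t_max - t_cursor) * scale]
 end
 
 p zoom(0.0, 60.0, 15.0, 1)   # => [3.0, 51.0], zoomed in around t = 15 s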
Camille: keep developing milliphone, hand off to grad students
All: run timeliner
load a recorded sound
load precomputed features to display
select and play intervals (sketched below)
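A tiny sketch of the last step, auditioning a selected interval by shelling out to sox's play command; the filename and times are hypothetical:

 # Play the selection from t0 to t1 (seconds); trim takes start and length.
 t0, t1 = 12.5, 17.0
 system("play", "recording.wav", "trim", t0.to_s, (t1 - t0).to_s)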
Grads: choose features, code feature generators
David: measure and model computation speed of feature generators
Camille: map features to HSV
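One plausible mapping, not necessarily the one timeliner will ship: a scalar feature in [0,1] sweeps hue from blue (low) to red (high) at full saturation and value.

 # Standard HSV-to-RGB conversion, then a feature-to-hue mapping.
 def hsv_to_rgb(h, s, v)                     # h in [0,360), s and v in [0,1]
   c = v * s
   x = c * (1 - ((h / 60.0) % 2 - 1).abs)
   m = v - c
   r, g, b = case h
             when 0...60    then [c, x, 0]
             when 60...120  then [x, c, 0]
             when 120...180 then [0, c, x]
             when 180...240 then [0, x, c]
             when 240...300 then [x, 0, c]
             else                [c, 0, x]
             end
   [r + m, g + m, b + m]
 end
 
 def feature_to_rgb(f)                       # f in [0,1]
   hsv_to_rgb(240.0 * (1.0 - f), 1.0, 1.0)   # 240 = blue, 0 = red
 end
 
 p feature_to_rgb(0.0)   # => [0.0, 0.0, 1.0]  blue
 p feature_to_rgb(1.0)   # => [1.0, 0.0, 0.0]  red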
Grads: design and pilot-study experiments
Zhen, Kai-Hsiang: recruit analyst-subjects, schedule experiments
Zhen, Kai-Hsiang: September, run 5-subject experiment.
Camille or Mark: 2010 Dec 9-10, present at FODAVA annual review, Georgia Tech.
Notes
How should features be combined?
Feature generators read a recorded sound and write a feature file.
Camille runs Sarah's script feat.pl to read a 16 kHz AMI-corpus .wav and write .fb and .mfcc files.
Format: HTK parameter files, i.e., a sequence of feature vectors (http://labrosa.ee.columbia.edu/doc/HTKBook21/node58.html#SECTION03271000000000000000, http://htk.eng.cam.ac.uk/).
Later: streaming, not batch.
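A sketch of reading those HTK parameter files (per the format linked above: a 12-byte big-endian header, then one 4-byte big-endian float per coefficient). Assumes an uncompressed file; the filename is hypothetical:

 # HTK header: nSamples, sampPeriod (100 ns units), sampSize (bytes
 # per frame), parmKind; all big-endian.
 n_samples, samp_period, samp_size, parm_kind =
   File.binread("meeting.mfcc", 12).unpack("NNnn")
 n_coeffs = samp_size / 4
 puts "#{n_samples} frames, #{samp_period / 10_000.0} ms apart, " \
      "#{n_coeffs} coefficients each, parmKind #{parm_kind}"
 
 File.open("meeting.mfcc", "rb") do |f|
   f.seek(12)
   n_samples.times do |i|
     vec = f.read(samp_size).unpack("g#{n_coeffs}")   # big-endian float32
     puts "frame #{i}: #{vec.first(3).inspect} ..." if i < 3
   end
 end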
Experiment tasks:
find instances of a class of sound events
find anomalous sounds (open-ended, vague)
Recorded sounds
AMI meeting room, transcribed
fieldrecorder/090216
fieldrecorder: aircraft + webcam for ground truth
play a freqsweep through the Genelec into the fieldrecorder; ignore clock drift
Keep data files small enough for our tools.
toy: a Ruby script plus short audio source files generates a long target file; tweak the script while tweaking the apps.
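A sketch of that toy generator, using the sox and soxi command-line tools to splice clips and measure their durations; the source filenames and clip count are made up:

 # Splice random short clips into one long target, writing ground
 # truth (start time and source of each clip) alongside it.
 SOURCES = ["beep.wav", "speech.wav", "hum.wav"]   # short clips, same format
 N_CLIPS = 200
 
 sequence = Array.new(N_CLIPS) { SOURCES.sample }
 system("sox", *sequence, "target.wav") or abort "sox failed"
 
 durations = SOURCES.to_h { |f| [f, `soxi -D #{f}`.to_f] }
 t = 0.0
 File.open("target_truth.txt", "w") do |out|
   sequence.each do |f|
     out.puts format("%9.2f  %s", t, f)
     t += durations[f]
   end
 end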
Realtime server (later)
record audio
circular buffer, a few months long
compute features at multiple scales
fast approximate algorithms for caching of features
stream all this to a googlemaps-ish server
when the client scrolls (pans) or zooms, it requests fresh data from the server
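A sketch of the multiscale caching idea: precompute a mipmap-style pyramid of per-interval peaks, so a client request at any zoom is an array slice rather than a pass over months of audio. The tile size and the peak statistic are assumptions:

 TILE = 1024   # raw samples summarized by one level-0 value
 
 def build_pyramid(samples)
   level = samples.each_slice(TILE).map { |s| s.map(&:abs).max }
   pyramid = [level]
   while level.size > 1
     level = level.each_slice(2).map(&:max)   # each value covers 2x the time
     pyramid << level
   end
   pyramid
 end
 
 # Serving a client request for (zoom level, first tile, tile count):
 def fetch(pyramid, zoom, first, count)
   pyramid[zoom][first, count]
 end
 
 pyramid = build_pyramid(Array.new(1 << 20) { rand(-32_768..32_767) })
 p fetch(pyramid, 3, 10, 4)   # four peaks, each covering 8 * TILE samples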
Logistics
Timeliner: in BI 2253 or the 2nd-floor printing room? A PC with Ubuntu 8.10 and 4 GB RAM.
Camille will provide headphones, a mouse with a scrollwheel, extra RAM, and a hard disk for Ubuntu.