Mobile Platform Acoustic-Frequency Environmental Tomography

From SpeechWiki

Revision as of 22:15, 13 February 2009


Where we're at, 2009 Feb 5

We have Sarah's speaker-to-mic recordings, plus the dimensions and positions of the room, mics, and speakers:

  • raw .wav files and deconvolved .mat files
  • MLS and chirp deconvolutions
  • from each of 4 speakers, to each of 40 mic positions
  • from some speaker-pairs, to each of 24 mic positions

Speaker-pair recordings are incomplete (only 4 of 6 possible pairs). But we could use them as sanity checks on the single-speaker recordings, instead of as primary data.

The plywood cube (actually particleboard with 2x4 framing) has been demolished. The thin-glass parts of the speakers have been demolished.

ISL still has the amplifiers, speaker drivers, and mics. One of the two Earthworks omnidirectional mics is malfunctioning and needs replacing, if we need stereo recording.

ISL's multichannel recording PC, fruitfly.isl.uiuc.edu, has moved south with its 8-channel i/o interface.

If we reconstruct a plywoodcube, prefer flush-with-wall conventional speakers over the original glass-speakers-through-cubewall-slits design.

What we might publish (how much work still to do)

Compute room geometry and mic position from MLS

Room, loudspeaker, mic. Mic and speaker unmoving, known distance apart. Play MLS. From recorded sound, estimate room's geometry w.r.t. mic and speaker.

  • Lae-Hoon's master's thesis has an algorithm for this.
  • Verify this algorithm against plywood cube MLS recordings.
  • Generalize to non-shoebox rooms.
  • Generalize to a dynamic algorithm for a moving mic and speaker (robot).
  • Generalize to a changing room shape.
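
The first step above can be sketched in a few lines: deconvolve one MLS period out of the recording by circular cross-correlation (an MLS's circular autocorrelation is nearly an impulse), and the peak of the result gives the direct-path delay. Everything here (the order-10 LFSR, the toy 30-sample "room") is illustrative, not Lae-Hoon's actual algorithm.

```python
import numpy as np

def mls(order=10, seed=1):
    """Maximum-length sequence of length 2**order - 1, from an LFSR
    using a primitive degree-10 trinomial (hard-coded for order 10)."""
    reg = [(seed >> i) & 1 for i in range(order)]
    out = []
    for _ in range(2**order - 1):
        fb = reg[0] ^ reg[7]            # feedback taps for degree 10
        out.append(reg[0])
        reg = reg[1:] + [fb]
    return np.array(out) * 2.0 - 1.0    # map bits {0,1} -> {-1,+1}

def rir_from_mls(recording, sequence):
    """Estimate the impulse response by circular cross-correlation of
    one MLS period with the recording; MLS autocorr is ~ a delta."""
    n = len(sequence)
    R = np.fft.ifft(np.fft.fft(recording[:n]) *
                    np.conj(np.fft.fft(sequence))).real
    return R / n

# Toy check: a "room" that only delays the signal by 30 samples.
seq = mls(10)
room = np.roll(seq, 30)
h = rir_from_mls(room, seq)
print(np.argmax(h))                     # direct-path delay
```

The same correlation works unchanged on a real recording; the later, smaller peaks of `h` are the wall reflections the geometry estimate is built from.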

Robot

Put a speaker or MLS/chirp generator (espresso machine?) next to a microphone, to map a space.

  • Fast mode. Catch the first two or three echoes to find the two nearest surfaces, and match those against objects on the video camera to determine the space's geometry.
  • Slow mode. Measure the detailed room response at a few different locations (by moving the microphone), and use this together with video (a hybrid, as in AVSR, increases accuracy) to build up and test hypotheses for the room geometry.
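
A minimal sketch of the fast mode's echo-to-distance step, assuming the speaker and mic are co-located so each echo delay corresponds to a round trip; the impulse response, sampling rate, and surface distances below are made-up toy values.

```python
import numpy as np

C = 343.0  # speed of sound in m/s at ~20 C

def first_echo_distances(h, fs, n_echoes=2, threshold=0.2):
    """Convert the first few echo delays in impulse response h to
    reflecting-surface distances. Co-located speaker and mic means a
    round trip, so distance = C * delay / 2."""
    direct = np.argmax(np.abs(h))
    floor = threshold * abs(h[direct])
    peaks = []
    for i in range(direct + 1, len(h) - 1):
        # crude local-maximum peak picking above the threshold
        if abs(h[i]) > floor and abs(h[i]) >= abs(h[i-1]) and abs(h[i]) > abs(h[i+1]):
            peaks.append(i)
            if len(peaks) == n_echoes:
                break
    delays = (np.array(peaks) - direct) / fs
    return C * delays / 2.0

# Toy impulse response: direct path plus echoes from surfaces
# 1.0 m and 2.5 m away (hypothetical numbers).
fs = 48000
h = np.zeros(2000)
h[100] = 1.0
for d_m in (1.0, 2.5):
    h[100 + int(round(2 * d_m / C * fs))] = 0.5
print(first_echo_distances(h, fs))   # ~[1.0, 2.5]
```

On real recordings the peak picking would need to be more robust (matched filtering, minimum echo spacing), but the delay-to-distance arithmetic is the whole of the fast mode's geometry input.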

Application: work with IFSI (their site had an invalid security certificate on 2009 Feb 13, btw) to test this in their collapsed-building simulator: a small robot rolls its way through the collapsed building and maps it, before the firefighters go through, to reduce the risk they are exposed to.


Prototype, for carpet or outdoor pavement, but not yet for off-road:

Duplicating this would cost us about $350 and 15 hours. It's sturdy enough to survive collisions with walls, and strong enough to carry the audio gear. Per Sarah's request, this new Version 2 now supports blinking blue LEDs.

(Since this is a speech rec group, could we justify $150 for [this]??)

What are the payload's weight, size, and power requirements?

  • 2 mics
  • 1 speaker
  • power amplifier
  • computer handling mics + speakers
  • computer running Lae-Hoon's algorithm

How much computation happens on the robot, and how much on its base-station laptop?

Corpus

Like AVICAR, but to validate room response models. No room-rebuilding, no more "research." Mention image-source, as well as several other algorithms.

Refine image-source

Add frequency dependence to wall reflection and/or air transmission, and other subtle refinements as the data suggests. Have to look at CATT and other commercial packages for architectural acoustics; they include, e.g., hybrid image source/ray-tracing room responses, with frequency response of different materials implemented at each reflection.
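
Stripped to one dimension, the image-source model itself looks like the sketch below. A scalar reflection coefficient `beta` stands in for the frequency-dependent wall filter this section proposes (the refinement would replace `beta**k` with a k-fold filter per image); all numbers are illustrative.

```python
import numpy as np

def image_source_1d(src, mic, L, fs, c=343.0, beta=0.9,
                    max_order=20, dur=0.5):
    """Impulse response between parallel walls at x=0 and x=L via the
    1-D image-source method. beta**k models k frequency-independent
    reflections; the refinement would make each reflection a filter."""
    h = np.zeros(int(dur * fs))
    for k in range(-max_order, max_order + 1):
        # images of the source: 2kL + src after |2k| reflections,
        # and 2kL - src after |2k - 1| reflections
        for img, refl in ((2*k*L + src, abs(2*k)),
                          (2*k*L - src, abs(2*k - 1))):
            d = abs(img - mic)
            idx = int(round(d / c * fs))
            if idx < len(h):
                h[idx] += beta**refl / max(d, 1e-3)   # 1/r spreading
    return h

# Toy setup: source at 1 m, mic at 3 m, walls 5 m apart.
h = image_source_1d(src=1.0, mic=3.0, L=5.0, fs=48000)
print(np.argmax(h))   # direct-path delay in samples
```

The commercial hybrid packages mentioned above add ray tracing for the late field; the image part generalizes to 3-D shoeboxes by taking images in all three axes.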

When we discussed this in early 2008, Mark guessed at least 12 months until "good-sounding" room inverse (40 dB, not just Bowon's 10 dB) in simulation, warranted before sawing particleboard.

Mask the reverberant tail by adding 10 dB SNR noise, since later echoes may overlap too much to cancel rigorously.
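
Scaling masking noise to a target SNR is a one-liner; this sketch (with an arbitrary 440 Hz test tone, not any of the project's signals) just checks that the 10 dB target comes out as measured.

```python
import numpy as np

def add_noise_at_snr(signal, snr_db, rng=None):
    """Add white Gaussian noise scaled so the mixture's SNR is
    snr_db: noise power = signal power / 10**(snr_db / 10)."""
    rng = rng or np.random.default_rng(0)
    p_sig = np.mean(signal**2)
    p_noise = p_sig / 10**(snr_db / 10)
    return signal + rng.normal(0.0, np.sqrt(p_noise), len(signal))

x = np.sin(2 * np.pi * 440 * np.arange(48000) / 48000)  # 1 s test tone
y = add_noise_at_snr(x, 10.0)
measured = 10 * np.log10(np.mean(x**2) / np.mean((y - x)**2))
print(round(measured, 2))   # close to the 10 dB target
```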

Validate room response models

Play sounds convolved by the plywood cube's computed inverse-impulse-response. Compare the recorded results to the original unconvolved sounds. In simulations, or with a fresh plywoodcube.
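
A minimal simulation of this validation loop, with a made-up two-tap "room" and a regularized frequency-domain inverse (not the group's actual inverse-filter design): pre-filter a test signal with the computed inverse, "play" it through the room, and measure in dB how far the result is from the original.

```python
import numpy as np

def inverse_filter_spectrum(h, n_fft, reg=1e-3):
    """Regularized frequency-domain inverse of impulse response h.
    reg keeps near-zero spectral bins from blowing up."""
    H = np.fft.rfft(h, n_fft)
    return np.conj(H) / (np.abs(H)**2 + reg)

# Toy "room": direct path plus one strong echo (invented values).
h = np.zeros(512)
h[0], h[40] = 1.0, 0.6

x = np.random.default_rng(0).normal(size=2048)   # test signal
n = 4096                                         # common FFT grid
# pre-filter with the inverse, then convolve with the room response
Y = np.fft.rfft(x, n) * inverse_filter_spectrum(h, n) * np.fft.rfft(h, n)
y = np.fft.irfft(Y, n)[:len(x)]

err_db = 10 * np.log10(np.mean((y - x)**2) / np.mean(x**2))
print(round(err_db, 1))   # more negative = better cancellation
```

In this idealized setting the residual is set purely by the regularization; with a measured plywoodcube response, the same error number is the "how good is the inverse" figure (the 40 dB vs 10 dB distinction mentioned above).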

A wood "phonebooth" would fit almost anywhere. Camille can imagine a larger phonebooth at ISL, though we'd have to sell Hank on building such a contraption, and we'd want to operate it remotely since it's not walking distance.

Two extensions of Lae-Hoon's Jan 30 paper review

1. Remove assumption of time invariance of RIR, because listeners' heads and ears move enough to degrade performance at high frequencies.

2. Extend their simulation to experiment with real microphones.

For each mic in an array, model its:

  • nonuniform frequency response
  • nonuniform spatial ("off-axis") response
  • nonuniform accuracy of measurement of spatial position
  • nonuniform accuracy of measurement of orientation, if mic isn't "omnidirectional"
  • nonuniform SNR
  • correlated inter-mic noise (not independent Gaussians) from multichannel preamplifier
  • actual crosstalk between channels, again from preamp
  • noises in domains other than amplitude-vs-time

At some point, even if mics cost no money, these inaccuracies suggest that adding mics would degrade rather than improve performance.
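
The claim above can be made concrete with the standard delay-and-sum array-gain formula for noise with pairwise correlation rho: gain = n / (1 + (n - 1) * rho), which saturates at 10*log10(1/rho) dB no matter how many mics are added. This idealized model covers only the correlated-noise item in the list, not the other nonuniformities.

```python
import numpy as np

def array_gain_db(n_mics, rho):
    """SNR gain of delay-and-sum over a single mic when noise at
    different mics has pairwise correlation rho. The coherent signal
    sums to power n**2; noise power sums to n + n*(n-1)*rho."""
    return 10 * np.log10(n_mics / (1 + (n_mics - 1) * rho))

# Independent noise (rho=0) gives the textbook 10*log10(n) gain;
# rho=0.1 (e.g. shared-preamp noise) caps the gain near 10 dB.
for n in (1, 2, 4, 8, 16, 32):
    print(n, round(array_gain_db(n, 0.0), 1), round(array_gain_db(n, 0.1), 1))
```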

Sensitivity analysis of these things could be done entirely in simulation, as a quickly publishable result. A second paper could test that with actual experiments.
