Fisher Front End

From SpeechWiki

There are two sets of PLP feature vectors created for the entire corpus.

PLPs for MLP classifiers

These PLPs are created in exactly the same way as the training data for the MLPs described in <ref name="frankel2007articulatory">J. Frankel et al., “Articulatory feature classifiers trained on 2000 hours of telephone speech,” ICASSP, 2007</ref>. The hcopy config file to generate PLP features for MLP input is here. This way, we can use the MLPs presented in that paper to segment the speech for timeshrinking experiments.

Mean and variance normalized, ARMAed PLPs for Gaussian mixtures

The second set of features is used to build the Gaussian mixture models. The features are PLPs with deltas and accelerations, generated with this hcopy config. The following aspects are slightly non-standard:

The mel-frequency filter bank is constructed only over the 125 Hz-3800 Hz band, not over the entire telephone speech range of 0-4000 Hz. <ref name="MVA">MVA: a noise-robust feature processing scheme</ref> reports a slight benefit from this, although in <ref name="Hain1998Htk"/> band-limiting has an ambiguous effect on accuracy.
The 0th cepstral coefficient is used instead of the log-energy, again because of the experiments in <ref name="MVA"/>.
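To make the band-limiting concrete, here is a minimal sketch that compares filter centre frequencies for the 125 Hz-3800 Hz band with those for the full 0-4000 Hz band. It assumes the mel mapping 1127 ln(1 + f/700) and equally spaced triangular filters on the mel axis. The function names and the channel count of 24 are illustrative assumptions, not values taken from the hcopy config.

```python
import numpy as np

def hz_to_mel(f_hz):
    # Mel mapping assumed here: 1127 * ln(1 + f / 700).
    return 1127.0 * np.log1p(np.asarray(f_hz, dtype=float) / 700.0)

def mel_to_hz(mel):
    # Inverse of hz_to_mel.
    return 700.0 * np.expm1(np.asarray(mel, dtype=float) / 1127.0)

def filterbank_centers(lo_hz, hi_hz, num_chans):
    # num_chans triangular filters need num_chans + 2 equally spaced
    # points on the mel axis; the interior points are the centres.
    mel_points = np.linspace(hz_to_mel(lo_hz), hz_to_mel(hi_hz), num_chans + 2)
    return mel_to_hz(mel_points[1:-1])

# 24 channels is an assumed value, chosen only for illustration.
band_limited = filterbank_centers(125.0, 3800.0, 24)
full_band = filterbank_centers(0.0, 4000.0, 24)
```

With the band-limited setting, no filter is centred below 125 Hz or above 3800 Hz, so the filter bank ignores the edges of the telephone band.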

At this point, only the frames that correspond to transcribed audio are extracted, and all of the following steps are performed on those frames alone. The features are still stored in one file per conversation side.
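The frame selection can be sketched as follows. This is only an illustration: it assumes the features for one conversation side are a NumPy array with one row per frame, that the transcribed segments are given as (start, end) times in seconds, and that the frame shift is 10 ms. The function names are hypothetical, and the actual extraction tooling is not described on this page.

```python
import numpy as np

def transcribed_frame_mask(num_frames, segments, frame_shift=0.010):
    # Boolean mask that is True for frames inside any transcribed segment.
    # frame_shift (seconds) is an assumed value of 10 ms.
    mask = np.zeros(num_frames, dtype=bool)
    for start, end in segments:
        first = max(0, int(round(start / frame_shift)))
        last = min(num_frames, int(round(end / frame_shift)))
        mask[first:last] = True
    return mask

def keep_transcribed(features, segments, frame_shift=0.010):
    # Return only the rows (frames) that fall in transcribed audio.
    mask = transcribed_frame_mask(features.shape[0], segments, frame_shift)
    return features[mask]

# Example: 100 frames of 3-dimensional features, two transcribed segments.
features = np.arange(300, dtype=float).reshape(100, 3)
segments = [(0.10, 0.20), (0.50, 0.55)]
kept = keep_transcribed(features, segments)
```

Overlapping segments are handled naturally, because a frame is kept once no matter how many segments cover it.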

Normalization

The cepstral coefficients, the deltas, and the accelerations are each normalized to zero mean and unit variance, as in <ref name="MVA"/>. This differs from the HTK book, which normalizes only the coefficients and computes the deltas and accelerations afterwards (the deltas and accelerations are not re-normalized). Normalization is done per conversation side, as recommended in <ref name="Hain1998Htk">Hain 1998, The 1998 HTK System For Transcription of Conversational Telephone Speech</ref>.
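A minimal sketch of per-side normalization, assuming the features for one conversation side are a NumPy array with one row per frame and one column per feature (static coefficients, deltas, and accelerations together). The function name and the small variance floor are assumptions added for illustration.

```python
import numpy as np

def mean_variance_normalize(features, eps=1e-8):
    # Normalize every column to zero mean and unit variance, using
    # statistics computed over all frames of one conversation side.
    features = np.asarray(features, dtype=float)
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    # eps is an assumed floor that avoids division by zero for
    # constant columns.
    return (features - mean) / np.maximum(std, eps)

# Example with synthetic data standing in for one conversation side.
rng = np.random.default_rng(0)
side = rng.normal(loc=5.0, scale=3.0, size=(500, 39))
normalized = mean_variance_normalize(side)
```

Because every column is normalized, the deltas and accelerations end up with unit variance too, which is the difference from the HTK book procedure noted above.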

Finally, an order-2 ARMA filter is applied. The whole process is made easy by this MVA program written by Chia-ping Chen.
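For illustration, the ARMA smoothing step might look like the sketch below. It assumes the filter averages the previous M filtered frames with the current and next M unfiltered frames and divides by 2M + 1, with M = 2. Leaving the first and last M frames unfiltered is an assumption made here, and the MVA program itself may treat the edges differently.

```python
import numpy as np

def arma_filter(features, order=2):
    # Assumed form: y[t] = (y[t-M] + ... + y[t-1] + x[t] + ... + x[t+M]) / (2M + 1)
    # applied independently to every feature dimension (column).
    x = np.asarray(features, dtype=float)
    num_frames = x.shape[0]
    y = x.copy()
    # Edge handling is an assumption: the first and last `order`
    # frames are copied through unchanged.
    for t in range(order, num_frames - order):
        past = y[t - order:t].sum(axis=0)          # already-filtered outputs
        future = x[t:t + order + 1].sum(axis=0)    # current and upcoming inputs
        y[t] = (past + future) / (2 * order + 1)
    return y

# Example: smoothing a noisy sequence of 13-dimensional frames.
rng = np.random.default_rng(1)
noisy = rng.normal(size=(200, 13))
smoothed = arma_filter(noisy, order=2)
```

The filter acts as a low-pass smoother along the time axis, so frame-to-frame variation in the interior frames is reduced.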

References
