Software

From SpeechWiki


Learning

HTKtrain (Sarah Borys and Mark Hasegawa-Johnson, 2008): source, tgz; Scripts for training HMMs using HTK

Pronounce (Arthur Kantor, 2007): Description, Demo, source, tgz; A tool that maps orthographic strings to phonetic strings. It computes American English phonetic transcriptions from plain text. Its HMM either generates the most likely phonetic transcription or performs forced alignment when a phonetic transcription is provided, so it gives a reasonable pronunciation for both out-of-dictionary words and partially pronounced words.

HTK-based Explicit-duration HMM (Ken Chen, 2003): Description, source, tgz

Signal Processing

Singing-Voice Separation From Monaural Recordings Using Robust Principal Component Analysis (Po-Sen Huang, 2012): Project description and demo, MATLAB code

Nested STFTs (Dave Cohen, Camille Goudeseune, Mark Hasegawa-Johnson, 2009): Efficient Simultaneous Multi-Scale Computation of FFTs; Description, stft.c

Improved Mistral (Qingsong Liu, 2009)

State-of-the-art text-independent speaker verification system, especially for NIST SRE

Based on Mistral Open Source package

Improved and New Features:

Full factor analysis (eigenchannel and eigenvoice) instead of simple factor analysis (eigenchannel)
Multithreading on Windows as well as Linux
Reads HTK-format features and models
Efficient algorithm for a fast implementation of factor analysis (FA)
Code optimization (for FA)
Assorted bug fixes

Source: /ws/ifp-32-2/hasegawa/pineking/programs/Improved_Mistral

PVTK (Sarah Borys and Mark Hasegawa-Johnson, 2005-2008): source, tgz; Extract HTK features as training vectors for libSVM, and apply trained SVMs directly to feature files

VAD (Bowon Lee, 2007): Description, source, tgz; Voice activity detector with improved noise model

Computation

Matlab GMM (Arthur Kantor, 2010): source, tgz; A somewhat optimized Matlab toolbox for calculating the likelihood of many observations against many Gaussian mixtures, each with many diagonal-covariance components.
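As an illustration of the computation this toolbox targets, here is a minimal sketch in plain Python (not the toolbox's Matlab code, and not optimized). It scores every observation against every diagonal-covariance Gaussian mixture in the log domain. All function names, and the representation of a mixture as a `(weights, means, variances)` tuple, are assumptions made for this example.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at x."""
    total = 0.0
    for xi, mi, vi in zip(x, mean, var):
        total += math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi
    return -0.5 * total

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))

def gmm_log_likelihood(x, weights, means, variances):
    """Log likelihood of one observation under one mixture."""
    terms = [
        math.log(w) + log_gaussian_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    return log_sum_exp(terms)

def score_all(observations, mixtures):
    """Matrix of log likelihoods: rows are observations, columns are mixtures.

    Each mixture is a hypothetical (weights, means, variances) tuple.
    """
    return [
        [gmm_log_likelihood(x, *gmm) for gmm in mixtures]
        for x in observations
    ]
```

Working in the log domain avoids underflow when the feature dimension is large; a vectorized implementation would compute the same quantities with matrix operations instead of loops.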

GMTK Parallel (Arthur Kantor, 2008): Description; Run GMTK commands in parallel on a compute cluster. Email Arthur for code.

HTK Parallel (Bowon Lee, 2006): Description, source, tgz; These Perl scripts split an HTK command for parallel execution on an SGE cluster.

Data

Timeliner (Camille Goudeseune, 2012): Description, Linux source tgz; Browser for long audio files, with generated spectrograms and other derived features.

Matlab pfile I/O toolbox (Arthur Kantor, 2010): source, tgz; A Matlab toolbox for reading and writing the ICSI pfile data format used by GMTK and QuickNet. The toolbox is designed to work with large pfiles (hundreds of GB). It is based on the pfread.m and pfinfo.m scripts by Dan Ellis.

Python library for reading/writing GMTK parameter files (Arthur Kantor, 2010): source, tgz; The library can read and write complete TrainableParameters files, as well as decision trees and most other objects allowed in GMTK parameter files. It is based on code from the EHVS parser project, and so is available under the GPL3 license.

Improved MVA (Arthur Kantor, 2008)

Linux binary

Perform mean and variance normalization and ARMA filtering

It's essentially this version, but with:

better error reporting (e.g. failing to open file tells you so instead of core dumping)
more accurate mean and variance estimation (doubles instead of floats in strategic places)
faster computation in the case of MV (ARMA order 0)

svn location is svn://mickey.ifp.uiuc.edu/corporaNormalizationScripts/fisher/MVA.cc
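For readers unfamiliar with MVA processing, the following Python sketch shows the general idea on a single feature dimension tracked over time: normalize to zero mean and unit variance, then smooth with an ARMA filter. It is not taken from MVA.cc. The filter form (average of the previous `order` outputs and the current and next `order` inputs) is one commonly described variant, and the choice to leave the first and last `order` frames unfiltered is an assumption of this example, as are all function names.

```python
import math

def mean_variance_normalize(track):
    """Scale one feature track to zero mean and unit variance."""
    n = len(track)
    mean = sum(track) / n
    var = sum((x - mean) ** 2 for x in track) / n
    std = math.sqrt(var) if var > 0 else 1.0  # guard against constant tracks
    return [(x - mean) / std for x in track]

def arma_filter(track, order):
    """ARMA smoothing: average past outputs with current and future inputs.

    With order 0 the filter is the identity, so MVA reduces to plain
    mean and variance normalization.
    """
    if order == 0:
        return list(track)
    n = len(track)
    out = list(track)  # edge frames are left unfiltered (assumption)
    for t in range(order, n - order):
        past = sum(out[t - order:t])            # already-filtered outputs
        future = sum(track[t:t + order + 1])    # current and upcoming inputs
        out[t] = (past + future) / (2 * order + 1)
    return out

def mva(track, order=2):
    """Mean and variance normalization followed by ARMA filtering."""
    return arma_filter(mean_variance_normalize(track), order)
```

Python floats are double precision, so the mean and variance here are accumulated in doubles, in the spirit of the accuracy improvement listed above.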

DTMFseg (Bowon Lee, 2006): source, tgz; Segment audio files at DTMF tones
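To give a sense of how DTMF tones can be located in audio, here is a minimal Python sketch that identifies the key present in one short frame using the Goertzel algorithm. This is a generic technique and not necessarily the method DTMFseg uses. The function names, the 8 kHz sample rate, the frame length, and the `min_power` threshold are all assumptions chosen for the example; the eight tone frequencies are the standard DTMF ones.

```python
import math

ROW_FREQS = (697.0, 770.0, 852.0, 941.0)      # low-group frequencies (Hz)
COL_FREQS = (1209.0, 1336.0, 1477.0, 1633.0)  # high-group frequencies (Hz)
KEYS = ("123A", "456B", "789C", "*0#D")

def goertzel_power(samples, freq, sample_rate):
    """Signal power at a single frequency, computed with the Goertzel recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / sample_rate)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2

def detect_key(samples, sample_rate=8000, min_power=1.0):
    """Return the DTMF key in this frame, or None if no tone is strong enough.

    min_power is an arbitrary illustrative threshold, not a tuned value.
    """
    row_power = [goertzel_power(samples, f, sample_rate) for f in ROW_FREQS]
    col_power = [goertzel_power(samples, f, sample_rate) for f in COL_FREQS]
    row = max(range(4), key=row_power.__getitem__)
    col = max(range(4), key=col_power.__getitem__)
    if row_power[row] < min_power or col_power[col] < min_power:
        return None
    return KEYS[row][col]

def make_tone(key, n=400, sample_rate=8000):
    """Synthesize n samples of the two-tone signal for a key (test helper)."""
    row = next(i for i, keys in enumerate(KEYS) if key in keys)
    col = KEYS[row].index(key)
    return [
        0.5 * math.sin(2.0 * math.pi * ROW_FREQS[row] * t / sample_rate)
        + 0.5 * math.sin(2.0 * math.pi * COL_FREQS[col] * t / sample_rate)
        for t in range(n)
    ]
```

Running `detect_key` over successive frames of a recording and noting where the result changes from `None` to a key would give candidate segment boundaries.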

Transcription tools (Mark Hasegawa-Johnson, 2005): source, tgz; Convert transcription formats

Speechfileformats (Mark Hasegawa-Johnson, 2004): source, tgz; Read and write HTK files in Matlab
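For orientation, an HTK parameter file starts with a 12-byte big-endian header (number of samples, sample period in 100 ns units, bytes per sample, and parameter kind), followed by the sample vectors. The Python sketch below reads and writes that layout. It is not the Matlab code from this package, and it assumes uncompressed 32-bit float vectors with no CRC; the function names and the default values (10 ms period, parameter kind 9, i.e. USER) are choices made for this example.

```python
import io
import struct

# nSamples (int32), sampPeriod (int32), sampSize (int16), parmKind (int16)
HEADER = struct.Struct(">iihh")

def write_htk(stream, frames, samp_period=100000, parm_kind=9):
    """Write equal-length float frames to a binary stream in HTK layout."""
    dim = len(frames[0])
    stream.write(HEADER.pack(len(frames), samp_period, 4 * dim, parm_kind))
    for frame in frames:
        stream.write(struct.pack(">%df" % dim, *frame))

def read_htk(stream):
    """Read an uncompressed float HTK file; returns (frames, period, kind)."""
    n_samples, samp_period, samp_size, parm_kind = HEADER.unpack(
        stream.read(HEADER.size)
    )
    dim = samp_size // 4  # four bytes per float32 component
    frames = [
        list(struct.unpack(">%df" % dim, stream.read(samp_size)))
        for _ in range(n_samples)
    ]
    return frames, samp_period, parm_kind

# Round trip through an in-memory buffer.
buffer = io.BytesIO()
write_htk(buffer, [[0.5, -1.25, 2.0], [3.0, 0.0, -0.75]])
buffer.seek(0)
frames, period, kind = read_htk(buffer)
```

Files that use compression or other parameter-kind qualifiers need extra handling that this sketch omits.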

CTMRedit (Jul Cha and Mark Hasegawa-Johnson, 1999): Description, source, tgz; Manually and automatically segment CT and MR image stacks

LaTeX tools

LaTeX scripts to import figures from dia and pdf. (Arthur Kantor, 2010)

Miscellaneous

Other scripts written in Perl, Python, Bash, and Ruby can be found in the SVN archive.

There is also auto-generated documentation for them.
