Software

From SpeechWiki

Latest revision as of 17:19, 4 February 2015

Statistical Speech Technology Group Software

Our policy: everything we write is free on the web. This wiki is intended to be definitive, because anybody in the group can edit it to add their own software. A spider-indexable backup is at http://www.isle.uiuc.edu/software .

You can access each project through the links below, either by browsing an SVN snapshot online or by downloading a tgz file (a gzipped .tar file).

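You can also check a project out of our Subversion server using login name "anon" with no password (press Enter when a password is requested).

  • On Windows, use TortoiseSVN.
  • On Linux, use svn. For example, to check out the speechfileformats project, use the command

svn co svn://mickey.ifp.illinois.edu/speechfileformats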

Learning

HTKtrain (Sarah Borys and Mark Hasegawa-Johnson, 2008)
source, tgz
Scripts for training HMMs using HTK
Pronounce (Arthur Kantor, 2007)
Description, Demo, source, tgz
An orthographic string to phonetic string mapping tool.
This tool computes American English phonetic transcriptions from plaintext. Its HMM either generates a most likely phonetic transcription, or forces alignment if a phonetic transcription is provided. So, it gives a reasonable pronunciation for both out-of-dictionary words and partially pronounced words.
HTK-based Explicit-duration HMM (Ken Chen, 2003)
Description, source, tgz

Signal Processing

Singing-Voice Separation From Monaural Recordings Using Robust Principal Component Analysis (Po-Sen Huang, 2012)
Project description and demo, MATLAB code
Nested STFTs (Dave Cohen, Camille Goudeseune, Mark Hasegawa-Johnson, 2009)
Efficient Simultaneous Multi-Scale Computation of FFTs
Description, stft.c


Improved Mistral (Qingsong Liu, 2009)
State-of-the-art text-independent speaker verification system, especially for NIST SRE
Based on the Mistral open-source package
Improved and new features:
  • full factor analysis (eigenchannel and eigenvoice) instead of simple factor analysis (eigenchannel)
  • multithreading on Windows as well as Linux
  • reads HTK-format features and models
  • effective algorithm for a fast implementation of factor analysis (FA)
  • code optimization (for FA)
  • fixed some bugs
Source: /ws/ifp-32-2/hasegawa/pineking/programs/Improved_Mistral
PVTK (Sarah Borys and Mark Hasegawa-Johnson, 2005-2008)
source, tgz
Extract HTK features as training vectors for libSVM, and apply trained SVMs directly to feature files
VAD (Bowon Lee, 2007)
Description, source, tgz
Voice activity detector with improved noise model

Computation

Matlab GMM (Arthur Kantor, 2010)
source, tgz
A somewhat optimized Matlab toolbox for calculating the likelihood of many observations against many Gaussian mixtures, each with many diagonal-covariance components.
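The computation described above can be sketched as follows. This is an illustrative Python sketch, not the toolbox's Matlab code; the function names and the (weights, means, variances) layout of a mixture are assumptions made for the example. It scores every observation against every mixture in the log domain.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of vector x under a Gaussian with diagonal covariance."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(x, weights, means, variances):
    """Log-likelihood of one observation under one mixture.

    Uses the log-sum-exp trick so that very small component
    likelihoods do not underflow to zero.
    """
    terms = [
        math.log(w) + log_gaussian_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def score_all(observations, mixtures):
    """Return table[i][j] = log p(observations[i] | mixtures[j]).

    Each mixture is a hypothetical (weights, means, variances) tuple.
    """
    return [
        [gmm_log_likelihood(x, *mix) for mix in mixtures]
        for x in observations
    ]
```

A real implementation would vectorize these loops, which is presumably where most of the toolbox's optimization lies.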
GMTK Parallel (Arthur Kantor, 2008)
Description
Run GMTK commands in parallel on a compute cluster. Email Arthur for code.
HTK Parallel (Bowon Lee, 2006)
Description, source, tgz
These Perl scripts split an HTK command for parallel execution on an SGE cluster.
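The general idea of splitting one command into batch jobs can be sketched as below. This is a Python illustration, not the Perl scripts themselves, and it does not submit anything to SGE. The helper names and script-file names are hypothetical; the only HTK detail assumed is the standard -S option, which names a script file listing the input files.

```python
def split_round_robin(items, n_jobs):
    """Distribute items over at most n_jobs chunks, dropping empty chunks."""
    chunks = [[] for _ in range(n_jobs)]
    for i, item in enumerate(items):
        chunks[i % n_jobs].append(item)
    return [chunk for chunk in chunks if chunk]


def build_commands(base_command, script_files):
    """Build one command line per chunk, each reading its own script file."""
    return [f"{base_command} -S {path}" for path in script_files]


# Usage: five feature files split across two jobs.
files = ["a.mfc", "b.mfc", "c.mfc", "d.mfc", "e.mfc"]
chunks = split_round_robin(files, 2)
# Hypothetical script-file names, one per chunk.
scripts = [f"job{i}.scp" for i in range(len(chunks))]
commands = build_commands("HCopy -C config", scripts)
```

Each chunk would be written to its script file, and each command submitted to the cluster as a separate job.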

Data

Timeliner (Camille Goudeseune, 2012)
Description, Linux source tgz
Browser for long audio files, with generated spectrograms and other derived features.
Matlab pfile I/O toolbox (Arthur Kantor, 2010)
source, tgz
A Matlab toolbox for reading and writing the ICSI pfile data format used by GMTK and QuickNet.
The toolbox is designed to work with large pfiles (hundreds of GB).
It is based on the pfread.m and pfinfo.m scripts by Dan Ellis.
Python library for reading/writing GMTK parameter files (Arthur Kantor, 2010)
source, tgz
The library can read/write complete TrainableParameters files, as well as decision trees and most other objects allowed in GMTK parameter files.
This library is based on code from the EHVS parser project, and so is available under the GPL3 license.
Improved MVA (Arthur Kantor, 2008)
Linux binary
Perform mean and variance normalization and ARMA filtering.
It is essentially the version at http://ssli.ee.washington.edu/people/chiaping/mva.html, but with
  • better error reporting (e.g., failing to open a file is reported instead of causing a core dump)
  • more accurate mean and variance estimation (doubles instead of floats in strategic places)
  • faster computation in the case of MV (ARMA order 0)
The source is in SVN at svn://mickey.ifp.uiuc.edu/corporaNormalizationScripts/fisher/MVA.cc
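For readers unfamiliar with MVA, here is a minimal Python sketch of the processing on one feature dimension over time. It is not the code in MVA.cc. It assumes the usual formulation, in which each output frame is the average of the previous M outputs and the current and next M normalized inputs, with edge frames left unfiltered. The function names are hypothetical.

```python
import math


def mean_variance_normalize(track):
    """Shift and scale one feature track to zero mean and unit variance."""
    n = len(track)
    mean = math.fsum(track) / n  # fsum limits accumulated rounding error
    var = math.fsum((x - mean) ** 2 for x in track) / n
    std = math.sqrt(var) if var > 0.0 else 1.0  # avoid division by zero
    return [(x - mean) / std for x in track]


def arma_filter(track, order):
    """Smooth a track with an ARMA filter of the given order.

    Order 0 is the identity, so MVA reduces to plain MV normalization.
    """
    if order == 0:
        return list(track)
    out = list(track)
    for t in range(order, len(track) - order):
        past = math.fsum(out[t - order:t])           # already-filtered outputs
        future = math.fsum(track[t:t + order + 1])   # current and future inputs
        out[t] = (past + future) / (2 * order + 1)
    return out


def mva(track, order=2):
    """Mean and variance normalization followed by ARMA filtering."""
    return arma_filter(mean_variance_normalize(track), order)
```

The early return for order 0 mirrors the MV-only fast path mentioned in the list above, since no filtering loop is needed in that case.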
DTMFseg (Bowon Lee, 2006)
source, tgz
Segment audio files at DTMF tones
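One common way to find DTMF tones is the Goertzel algorithm, which measures signal power at each of the eight DTMF frequencies. The Python sketch below illustrates that approach. It is an assumption that this resembles what DTMFseg does, and the function names, frame length, and threshold are hypothetical choices for the example.

```python
import math

DTMF_LOW = (697.0, 770.0, 852.0, 941.0)       # row frequencies in Hz
DTMF_HIGH = (1209.0, 1336.0, 1477.0, 1633.0)  # column frequencies in Hz
DTMF_KEYS = ("123A", "456B", "789C", "*0#D")


def goertzel_power(samples, freq, sample_rate):
    """Squared magnitude of the signal's spectrum at a single frequency."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / sample_rate)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def detect_key(frame, sample_rate, threshold):
    """Return the DTMF key present in the frame, or None if there is none."""
    low = [goertzel_power(frame, f, sample_rate) for f in DTMF_LOW]
    high = [goertzel_power(frame, f, sample_rate) for f in DTMF_HIGH]
    row = max(range(4), key=low.__getitem__)
    col = max(range(4), key=high.__getitem__)
    if low[row] < threshold or high[col] < threshold:
        return None
    return DTMF_KEYS[row][col]


def find_tones(samples, sample_rate, frame_len=205, threshold=100.0):
    """Return (start_sample, key) for every frame that contains a tone."""
    hits = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        key = detect_key(frame, sample_rate, threshold)
        if key is not None:
            hits.append((start, key))
    return hits
```

The start samples returned by find_tones are candidate cut points. A practical segmenter would also merge consecutive frames carrying the same key and tune the threshold to the recording level.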
Transcription tools (Mark Hasegawa-Johnson, 2005)
source, tgz
Convert transcription formats
Speechfileformats (Mark Hasegawa-Johnson, 2004)
source, tgz
Read and write HTK files in Matlab
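As background on what such a reader and writer must handle, here is a Python sketch of the basic HTK parameter file layout (it is not the Matlab code in this package). A file starts with a 12-byte big-endian header holding the number of frames, the sample period in 100 ns units, the bytes per frame, and the parameter kind code, followed by 32-bit floats. The function names are hypothetical, and compressed or checksummed variants are not handled.

```python
import io
import struct

# nSamples (int32), sampPeriod (int32), sampSize (int16), parmKind (int16)
HTK_HEADER = struct.Struct(">iihh")


def write_htk(stream, frames, samp_period, parm_kind):
    """Write equal-length float frames to a binary stream in HTK format."""
    dim = len(frames[0]) if frames else 0
    stream.write(HTK_HEADER.pack(len(frames), samp_period, 4 * dim, parm_kind))
    for frame in frames:
        stream.write(struct.pack(f">{dim}f", *frame))


def read_htk(stream):
    """Read an uncompressed HTK file; return (frames, samp_period, parm_kind)."""
    header = stream.read(HTK_HEADER.size)
    n_samples, samp_period, samp_size, parm_kind = HTK_HEADER.unpack(header)
    dim = samp_size // 4  # four bytes per float32 value
    frames = [
        list(struct.unpack(f">{dim}f", stream.read(samp_size)))
        for _ in range(n_samples)
    ]
    return frames, samp_period, parm_kind


# Usage: round-trip two 3-dimensional frames through an in-memory buffer.
buffer = io.BytesIO()
write_htk(buffer, [[1.0, -2.5, 0.25], [0.0, 4.0, 8.0]], 100000, 9)  # 9 = USER
buffer.seek(0)
frames, period, kind = read_htk(buffer)
```

A sample period of 100000 corresponds to a 10 ms frame shift.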
CTMRedit (Jul Cha and Mark Hasegawa-Johnson, 1999)
Description, source, tgz
Manually and automatically segment CT and MR image stacks

LaTeX tools

LaTeX scripts to import figures from dia and pdf. (Arthur Kantor, 2010)

Miscellaneous

Other scripts written in Perl, Python, Bash, and Ruby can be found in the SVN archive at svn://mickey.ifp.uiuc.edu/scripts.

There is also auto-generated documentation for them.
