Software
From SpeechWiki
Revision as of 17:08, 27 March 2010
Statistical Speech Technology Group Software
Our policy: everything we write is free on the web. This wiki is intended to be definitive, because anybody in the group can edit it to add their own software. A spider-indexable backup is at http://www.isle.uiuc.edu/software .
You can access each project by browsing an SVN snapshot online or by downloading a tgz file, using the links below.
You can also check a project out of our Subversion server using the login name "anon" with no password (press Enter when a password is requested).
- On Windows, download TortoiseSVN
- On Linux, use the svn command. For example, if the project is listed below under http://mickey.ifp.illinois.edu/speech/software/speechfileformats, you can check it out with:
svn co svn://mickey.ifp.uiuc.edu/speechfileformats
Learning
- Pronounce (Arthur Kantor, 2007)
- Description, Demo, source, tgz
- An orthographic string to phonetic string mapping tool.
- This tool computes American English phonetic transcriptions from plaintext. Its HMM either generates a most likely phonetic transcription, or forces alignment if a phonetic transcription is provided. So, it gives a reasonable pronunciation for both out-of-dictionary words and partially pronounced words.
- HTK-based Explicit-duration HMM (Ken Chen, 2003)
- Description, source, tgz
- HTKtrain (Sarah Borys and Mark Hasegawa-Johnson, 2008)
- source, tgz
- Scripts for training HMMs using HTK
Signal Processing
- PVTK (Sarah Borys and Mark Hasegawa-Johnson, 2005-2008)
- source, tgz
- Extracts HTK features as training vectors for libSVM, and applies trained SVMs directly to feature files
- VAD (Bowon Lee, 2007)
- Description, source, tgz
- Voice activity detector with improved noise model
- Nested STFTs (Dave Cohen, Camille Goudeseune, Mark Hasegawa-Johnson 2009)
- Efficient Simultaneous Multi-Scale Computation of FFTs
- Description, stft.c
- Improved Mistral (Qingsong Liu 2009)
- State-of-the-art text-independent speaker verification system, especially for NIST SRE
- Based on the Mistral open-source package
- Improved and new features:
- Adds full factor analysis (eigenchannel and eigenvoice) instead of simple factor analysis (eigenchannel)
- Adds multithreading on Windows as well as Linux
- Supports reading HTK-format feature and model files
- Adds an effective algorithm for a fast implementation of FA
- Code optimization (for FA)
- Fixes some bugs
- Source: /ws/ifp-32-2/hasegawa/pineking/programs/Improved_Mistral
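For the PVTK entry above, the step of turning HTK feature files into libSVM training vectors can be sketched roughly as follows. This is not PVTK's own code: the function names `read_htk_features` and `to_libsvm_lines` are hypothetical, the sketch assumes an uncompressed, big-endian HTK parameter file, and it assumes the caller supplies one integer label per frame.

```python
import struct

HTK_COMPRESSED = 0o2000  # the _C qualifier bit in parmKind


def read_htk_features(data):
    """Parse an uncompressed HTK parameter file held in memory as bytes.

    Returns (frames, samp_period, parm_kind), where frames is a list of
    lists of floats. Hypothetical helper, not part of PVTK.
    """
    # 12-byte big-endian header: nSamples (int32), sampPeriod (int32,
    # in 100 ns units), sampSize (int16, bytes per frame), parmKind (int16).
    n_samples, samp_period, samp_size, parm_kind = struct.unpack(">iihh", data[:12])
    if parm_kind & HTK_COMPRESSED:
        raise ValueError("compressed HTK files are not handled in this sketch")
    dim = samp_size // 4  # each component is a 4-byte float
    frames = []
    offset = 12
    for _ in range(n_samples):
        frame = struct.unpack(">%df" % dim, data[offset:offset + samp_size])
        frames.append(list(frame))
        offset += samp_size
    return frames, samp_period, parm_kind


def to_libsvm_lines(frames, labels):
    """Format frames as libSVM text lines: '<label> 1:<v1> 2:<v2> ...'."""
    lines = []
    for label, frame in zip(labels, frames):
        # libSVM feature indices are 1-based and ascending.
        feats = " ".join("%d:%g" % (i + 1, v) for i, v in enumerate(frame))
        lines.append("%d %s" % (label, feats))
    return lines
```

A file read with `open(path, "rb").read()` can be passed straight to `read_htk_features`, and the resulting lines written one per row to a libSVM training file.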
Computation
- GMTK Parallel (Arthur Kantor, 2008)
- Split GMTK commands into batch jobs for a cluster
- Description, SVN repository
- HTK Parallel (Bowon Lee, 2006)
- These Perl scripts (description) split an HTK command for parallel execution on an SGE cluster.
- Description, HCopy.pl, HVite.pl, HERest.pl, HResults.pl
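The general idea behind splitting an HTK command into batch jobs can be sketched as below. This is an illustration in Python, not a port of the Perl scripts: the function names `split_script` and `build_commands` and the `chunk` file prefix are hypothetical. The sketch assumes the HTK tool reads its file list from a script file passed with `-S`, and it leaves out writing the chunk files and submitting the jobs to SGE.

```python
def split_script(entries, n_jobs):
    """Split a list of script-file entries into at most n_jobs contiguous,
    nearly equal chunks. Hypothetical helper for illustration."""
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1")
    base, extra = divmod(len(entries), n_jobs)
    chunks = []
    start = 0
    for j in range(n_jobs):
        # The first `extra` chunks get one additional entry.
        size = base + (1 if j < extra else 0)
        if size == 0:
            break  # fewer entries than jobs: stop creating empty chunks
        chunks.append(entries[start:start + size])
        start += size
    return chunks


def build_commands(tool_args, n_chunks, prefix="chunk"):
    """Return one argument list per chunk, each pointing the tool at its
    own script file (e.g. chunk1.scp) through the -S option."""
    return [list(tool_args) + ["-S", "%s%d.scp" % (prefix, j + 1)]
            for j in range(n_chunks)]
```

Each returned argument list would then be wrapped in a job script for the cluster. Tools that accumulate statistics across files, such as HERest, also need a step that merges the per-job results, which this sketch does not cover.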
Data
- dtmfseg (Bowon Lee, 2006)
- Segment audio files at DTMF tones
- SVN repository
- transcription tools (Mark Hasegawa-Johnson, 2005)
- Convert transcription formats
- TGZ archive
- SVN repository
- speechfileformats (Mark Hasegawa-Johnson, 2004)
- Read and write HTK files in MATLAB
- TGZ archive
- SVN repository
- CTMRedit (Jul Cha and Mark Hasegawa-Johnson, 1999)
- Manually and automatically segment CT and MR image stacks
- Description
- SVN repository
- improved MVA (Arthur Kantor 2008)
- Perform mean and variance normalization and ARMA filtering
- It's essentially this version but with
- better error reporting (e.g. failing to open file tells you so instead of core dumping)
- more accurate mean and variance estimation (doubles instead of floats in strategic places)
- faster computation in the case of MV (ARMA order 0)
- binary, SVN repository
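The processing that the improved MVA entry above describes (mean and variance normalization followed by ARMA filtering) can be sketched as follows. This is a minimal illustration, not the tool's code. It assumes the ARMA form usually given for MVA, in which each output frame averages the previous M outputs with the current and next M normalized inputs, and it assumes the population variance (division by T). The name `mva` and its parameters are hypothetical. Python floats are doubles, which matches the entry's point about estimating the mean and variance in double precision, and order 0 returns early with the normalized features only.

```python
import math


def mva(frames, order):
    """Mean/variance normalize each feature dimension, then ARMA-filter.

    frames: list of T feature vectors (lists of floats), all the same length.
    order:  ARMA order M; 0 means mean and variance normalization only.
    """
    T = len(frames)
    D = len(frames[0])
    # Per-dimension statistics, accumulated in double precision.
    means = [math.fsum(f[d] for f in frames) / T for d in range(D)]
    stds = [math.sqrt(math.fsum((f[d] - means[d]) ** 2 for f in frames) / T)
            for d in range(D)]
    norm = [[(f[d] - means[d]) / stds[d] if stds[d] > 0 else 0.0
             for d in range(D)] for f in frames]
    if order == 0:
        return norm  # fast path: no filtering needed
    M = order
    out = [row[:] for row in norm]
    # Frames too close to either end keep their normalized values.
    for t in range(M, T - M):
        for d in range(D):
            past = sum(out[t - i][d] for i in range(1, M + 1))   # filtered outputs
            ahead = sum(norm[t + j][d] for j in range(0, M + 1))  # normalized inputs
            out[t][d] = (past + ahead) / (2 * M + 1)
    return out
```

For example, `mva(features, 0)` gives plain mean and variance normalization, while `mva(features, 2)` additionally smooths each dimension over a five-frame span.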
Miscellaneous
Other scripts written in Perl, Python, Bash, and Ruby can be found in the SVN archive.
There is also auto-generated documentation for them.