Computer Resources

From SpeechWiki

Revision as of 21:38, 4 February 2009 by Xzhuang2

LVCSR at Illinois Computer Resources

  • Data:
    • Corpora we develop and distribute
    • We are members of the LDC (Linguistic Data Consortium). Most LDC data is organized as described in the Data Organization README. Some useful slices of LDC data that have not been moved to ifp-32-2 include:
      • /workspace/fluffy1/12hour - 12 hours extracted from Switchboard 1, with SPHERE and WAV audio, MFCCs, transcriptions.
      • /workspace/fluffy1/{train-ws96,train-ws97,misc-ws97} - The ICSI phonetically transcribed Switchboard-1 extracts
      • /workspace/fletcher1/bdc - The Boston Directions Corpus; two speakers have prosodic transcriptions, the others do not
      • /workspace/nibbler0/data/ylzheng/WS04/DATA - Tsinghua Wu-accented Mandarin (MFCC and FMT only, no waveforms)
      • /workspace/fluffy1/penn_treebank
  • Time-aligned Switchboard Disfluency corpus
    • mickey0/sw_disTime-0.9.9 - merged from the original Switchboard time-aligned transcription and the Treebank-3 disfluency transcription (Praat TextGrid files included)
    • mickey0/sw_disTime-1.0.0 (TextGrid files NOT included)

Parallel Computing

Sun Grid Engine on ifp-32

Bowon's brief introduction to SGE is here. There is also a detailed SGE document, including job dependencies[1].
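As a rough illustration of job dependencies, the Python sketch below builds (but does not run) `qsub` command lines. It assumes the standard SGE `qsub` options `-N` (job name), `-cwd` (run in the submission directory) and `-hold_jid` (wait until the listed jobs finish); the helper `qsub_command` and the scripts `train.sh` and `decode.sh` are hypothetical.

```python
import shlex

def qsub_command(script, job_name, hold_for=None):
    """Build (but do not run) a qsub command line for an SGE job.

    -N names the job, -cwd runs it in the current directory, and
    -hold_jid keeps it queued until the named job(s) have finished.
    """
    cmd = ["qsub", "-N", job_name, "-cwd"]
    if hold_for:
        # SGE accepts a comma-separated list of job names or IDs.
        cmd += ["-hold_jid", ",".join(hold_for)]
    cmd.append(script)
    return cmd

# Hypothetical two-stage pipeline: decoding waits for training to finish.
train = qsub_command("train.sh", "train")
decode = qsub_command("decode.sh", "decode", hold_for=["train"])
print(shlex.join(train))
print(shlex.join(decode))
```

To actually submit, each list could be passed to `subprocess.run(cmd, check=True)` on a machine where `qsub` is on the PATH.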

MPI

Perl MPI Simple


Applications

  • Acoustic model training:
    • HTK hidden Markov modeling toolkit: ifp-32-1/hasegawa/programs/htk-3.4
    • GMTK Dynamic Bayesian Nets/Graphical Models: nibbler0/speech_apps/GMTK
    • Sphinx speech recognizer
    • LIUM speech tools, including speaker segmentation
  • Decoding:
    • Julius LVCSR decoder
    • AT&T DCD LVCSR decoder - nibbler0/speech_apps/dcd-2.0
  • Language model training:
    • SRILM (large N-gram counts, backoff models, lattices): fluffy0/programs/srilm
    • AT&T FSM Library: fluffy0/programs/fsm-4.0
    • OpenFST: fluffy0/programs/OpenFst/
  • Spectrogram and waveform viewing:
    • XKL (MIT): nibbler0/speech_apps/xkl-2.3.1
    • ESPS (Entropic Systems, now Microsoft)
    • Praat

Installing / Arranging Software

If you download Linux software from the Internet and find it useful, please install it where others can use it too. Here's how:

  1. Type `umask 022` or `umask 000`. With 022, the files you create are writable only by you, so you are volunteering to manage the package; with 000, anyone can modify them, so you are inviting others to help manage it.
  2. Download the tar file to /workspace/ifp-32-1/hasegawa/programs, untar it to create $PACKAGE_DIR, remove the tar file (important!), then configure and run "make all".
  3. Decide where you want the binaries. Reasonable places for programs are /workspace/ifp-32-1/hasegawa/programs/...
    • scripts = runs on any machine (e.g., Perl and bash scripts)
    • bin.`uname` (i.e., bin.Linux) = runs on both ifp-32 and mickey. PLEASE CHECK: ssh to mickey, run the program, and make sure it does not fail with "cannot execute binary file".
    • bin.`arch` = runs only on machines of that architecture. Type `arch` to see which architecture you're on.
    • $PACKAGE_DIR/bin.Linux = packages with many binaries should keep them in $PACKAGE_DIR, to avoid overwriting similarly named programs in ../bin.Linux.
  4. Set the installdir variable in your Makefile according to your decision in step 3. Type "make install" to install, then "make clean" to remove object files and other build intermediates.
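Steps 1 and 3 can be made concrete with a small Python sketch. `default_mode` shows which permission bits new files get under each umask (assuming the usual creation modes of 0o666 for regular files and 0o777 for directories and executables), and `install_dir` maps the categories in step 3 to a directory, using `platform.system()` and `platform.machine()` as stand-ins for `uname` and `arch`. Both function names and the category labels are hypothetical, and the sketch assumes the step-3 directories sit directly under /workspace/ifp-32-1/hasegawa/programs.

```python
import platform

# Assumption: the step-3 directories sit directly under this path.
PROGRAMS = "/workspace/ifp-32-1/hasegawa/programs"

def default_mode(umask, executable):
    """Permission bits a newly created file gets under `umask`.

    Regular files are normally created with mode 0o666 and directories or
    executables with 0o777; the umask clears whichever bits it contains.
    """
    base = 0o777 if executable else 0o666
    return base & ~umask

def install_dir(kind, package_dir=None, system=None, machine=None):
    """Map a step-3 category to an install directory (hypothetical helper).

    kind: "script", "portable" (bin.`uname`), "arch" (bin.`arch`),
    or "package" (many binaries, kept in $PACKAGE_DIR/bin.Linux).
    """
    system = system or platform.system()     # same string as `uname`, e.g. "Linux"
    machine = machine or platform.machine()  # same string as `arch`, e.g. "x86_64"
    if kind == "script":
        return f"{PROGRAMS}/scripts"
    if kind == "portable":
        return f"{PROGRAMS}/bin.{system}"
    if kind == "arch":
        return f"{PROGRAMS}/bin.{machine}"
    if kind == "package":
        if package_dir is None:
            raise ValueError("package_dir is required for kind='package'")
        return f"{package_dir}/bin.{system}"
    raise ValueError(f"unknown kind: {kind!r}")

for mask in (0o022, 0o000):
    print(f"umask {mask:03o}: files {default_mode(mask, False):o}, "
          f"executables {default_mode(mask, True):o}")
print(install_dir("portable"))
```

With umask 022 others can read and run what you install but not change it, which is why that choice makes you the package's sole manager.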

Backups

If you have personal working directories outside your home directory that should be backed up regularly, list them here.

  • Art
    • mickey0/akantor
    • rizzo1/akantor is itself a backup of the SVN repository, because the repository cannot be backed up in the normal way.
  • Sarah
    • nibbler0/data
    • rizzo0/sborys
    • spot1/sborys
    • tico0/sborys
  • Xiaodan
    • spot1/xzhuang2/newbaseline
    • spot1/xzhuang2/workshop
    • c1-15/hasegawa/xzhuang2*
    • /workspace/tico0/AED/

SVN

Our server is svn://mickey.ifp.uiuc.edu

On Windows, download TortoiseSVN.

On Linux, the client is svn, which should be installed on every machine.

For Linux command help, see the simple tutorial (ignore the svnadmin commands, and replace file:///home/user/svn with svn://mickey.ifp.uiuc.edu).

Compiling

gcc is used by default, but I (Arthur) am getting good results with Intel's compiler, which is free for non-commercial use and is installed in /workspace/ifp-32-1/hasegawa/programs/intel (we have the Fortran and C/C++ compilers and the Intel math library).

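Benchmarking QuickNet in 4-thread mode with every combination of compiler (gcc or Intel) and BLAS implementation (ATLAS or the Intel math library) gives the following (MCPS = million connections per second, MCUPS = million connection updates per second):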

logs/smallGccCompilerIntelMath.log:     CV speed: 4351.14 MCPS, 3107.8 presentations/sec.
logs/smallGccCompilerIntelMath.log:     Train speed: 2056.95 MCUPS, 1469.2 presentations/sec.
logs/smallGccCompilerIntelMath.log:     CV speed: 4691.55 MCPS, 3351.0 presentations/sec.

logs/smallIntelCompilerIntelMathLib.log:CV speed: 3984.39 MCPS, 2845.9 presentations/sec.
logs/smallIntelCompilerIntelMathLib.log:Train speed: 2140.31 MCUPS, 1528.7 presentations/sec.
logs/smallIntelCompilerIntelMathLib.log:CV speed: 4034.74 MCPS, 2881.9 presentations/sec.

logs/smallIntelCompilerATLASMathLib.log:CV speed: 3508.69 MCPS, 2506.1 presentations/sec.
logs/smallIntelCompilerATLASMathLib.log:Train speed: 1961.05 MCUPS, 1400.7 presentations/sec.
logs/smallIntelCompilerATLASMathLib.log:CV speed: 3553.22 MCPS, 2537.9 presentations/sec.

logs/smallGccCompilerATLASMathLib.log:  CV speed: 4219.30 MCPS, 3013.7 presentations/sec.
logs/smallGccCompilerATLASMathLib.log:  Train speed: 1954.73 MCUPS, 1396.2 presentations/sec.
logs/smallGccCompilerATLASMathLib.log:  CV speed: 4133.10 MCPS, 2952.1 presentations/sec.

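Training speed is the figure that matters most, because training takes the longest; here the Intel compiler with the Intel math library is almost 10% faster than gcc with ATLAS (2140.31 vs. 1954.73 MCUPS). Strangely, CV (testing) speed is best with the gcc compiler and the Intel math library.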
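The comparison above can be checked with a few lines of Python. The speeds are copied from the logs; treating gcc + ATLAS as the baseline is an assumption (the logs do not say which setup the "almost 10%" is measured against), and averaging the two CV passes is just one reasonable way to summarize them.

```python
# Speeds copied from the QuickNet logs above (4 threads).
# Each entry: (train MCUPS, [CV MCPS for the two CV passes]).
results = {
    ("gcc", "Intel math"): (2056.95, [4351.14, 4691.55]),
    ("Intel", "Intel math"): (2140.31, [3984.39, 4034.74]),
    ("Intel", "ATLAS"): (1961.05, [3508.69, 3553.22]),
    ("gcc", "ATLAS"): (1954.73, [4219.30, 4133.10]),
}

def speedup_percent(new, old):
    """Relative speed-up of `new` over `old`, in percent."""
    return 100.0 * (new / old - 1.0)

# Assumed baseline: gcc with ATLAS.
baseline_train = results[("gcc", "ATLAS")][0]

for (compiler, blas), (train, cv) in results.items():
    cv_mean = sum(cv) / len(cv)
    print(f"{compiler:>5} + {blas:<10} train {train:8.2f} MCUPS "
          f"({speedup_percent(train, baseline_train):+5.1f}%), mean CV {cv_mean:8.2f} MCPS")

best_train = max(results, key=lambda k: results[k][0])
best_cv = max(results, key=lambda k: sum(results[k][1]) / len(results[k][1]))
print("fastest training:", best_train)
print("fastest CV:", best_cv)
```

Against gcc + ATLAS the Intel/Intel setup comes out about 9.5% faster in training; against gcc + Intel math it is only about 4% faster, so most of the gain appears to come from the math library.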

Using the Intel compiler and math library as above, but running on the shiny new PCs that Mark got for us:

CV speed:    3828.51 MCPS, 2734.6 presentations/sec.
Train speed: 2932.56 MCUPS, 2094.6 presentations/sec.
CV speed:    4093.49 MCPS, 2923.8 presentations/sec.
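A quick way to see what the new PCs buy us is to compare these numbers with the Intel compiler + Intel math library run on the old machines. The sketch below does that arithmetic; the pairing of the two CV passes by order is an assumption.

```python
def speedup_percent(new, old):
    """Relative speed-up of `new` over `old`, in percent."""
    return 100.0 * (new / old - 1.0)

# Intel compiler + Intel math library: old machines vs. the new PCs (from the logs).
old = {"train": 2140.31, "cv1": 3984.39, "cv2": 4034.74}
new = {"train": 2932.56, "cv1": 3828.51, "cv2": 4093.49}

gains = {k: speedup_percent(new[k], old[k]) for k in old}
for k, g in gains.items():
    print(f"{k:>5}: {g:+.1f}%")
```

By this measure training runs about 37% faster on the new PCs, while CV speed is within a few percent of the old machines.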

To switch from gcc to the Intel compilers, replace the tools as follows:

gcc     Intel
---     -----
gcc     icc
g++     icpc
ar      xiar
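One way to apply this table without editing each Makefile is to override the tool variables on the make command line, since GNU make lets command-line assignments take precedence over assignments inside the Makefile. The Python sketch below only builds such a command; it assumes the package's Makefile uses the conventional CC, CXX and AR variables, and `make_command` is a hypothetical helper.

```python
import shlex

# gcc toolchain -> Intel toolchain, following the table above.
INTEL_TOOLS = {"CC": "icc", "CXX": "icpc", "AR": "xiar"}

def make_command(target="all", tools=INTEL_TOOLS):
    """Build a make command line that overrides the compiler variables.

    Assumes the Makefile uses the conventional CC, CXX and AR variables;
    VAR=value arguments on the command line override Makefile settings
    (unless the Makefile uses the `override` directive).
    """
    return ["make"] + [f"{var}={tool}" for var, tool in tools.items()] + [target]

print(shlex.join(make_command()))
print(shlex.join(make_command("install")))
```

Packages whose Makefiles hard-code `gcc` instead of using `$(CC)` will still need the tool names changed by hand.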