Computer Resources

From SpeechWiki


Revision as of 22:29, 20 June 2008

LVCSR at Illinois Computer Resources

  • Data:
    • Corpora we develop and distribute
    • We are members of LDC. Most LDC data is organized as described in the Data Organization README. Some useful slices of LDC data that have not been moved to ifp-32-2 include:
      • /workspace/fluffy1/12hour - 12 hours extracted from Switchboard 1, with SPHERE and WAV audio, MFCCs, transcriptions.
      • /workspace/fluffy1/{train-ws96,train-ws97,misc-ws97} - The ICSI phonetically transcribed Switchboard-1 extracts
      • /workspace/fletcher1/bdc - The Boston Directions Corpus; two speakers have prosodic transcriptions, the others do not
      • /workspace/nibbler0/data/ylzheng/WS04/DATA - Tsinghua Wu-accented Mandarin (MFCC and FMT only, no waveforms)
  • Time-aligned Switchboard Disfluency corpus
    • mickey0/sw_disTime-0.9.9 - merged from the original Switchboard time transcription and the Treebank-3 disfluency transcription (TextGrid included)
    • mickey0/sw_disTime-1.0.0 (TextGrid NOT included)


Parallel Computing

Sun Grid Engine on ifp-32

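Bowon's brief introduction to SGE: http://www.ifp.uiuc.edu/~bowonlee/research/cluster/linux_cluster.htm

Detailed SGE documentation, including job dependencies: http://wikis.sun.com/display/GridEngine/Submitting+Extended+Jobs+and+Advanced+Jobs#SubmittingExtendedJobsandAdvancedJobs-ExampleExtendedJobExample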
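
The job dependency feature mentioned above can be sketched as follows. This is only an illustration, not a record of how jobs are actually run on ifp-32: `train.sh` and `decode.sh` are hypothetical job scripts, and the `submit` wrapper and `DRY_RUN` variable are assumptions added so the commands can be previewed without a cluster. The `qsub` options used (`-N`, `-cwd`, `-hold_jid`) are standard SGE options.

```shell
#!/bin/sh
# Sketch: chain two SGE jobs so that the second waits for the first.
# train.sh and decode.sh are hypothetical job scripts.
# With DRY_RUN=1 (the default here) the qsub commands are only printed.

DRY_RUN="${DRY_RUN:-1}"

# submit is a hypothetical wrapper around qsub.
submit() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "qsub $*"
    else
        qsub "$@"
    fi
}

# -N names the job; -cwd runs it in the current working directory.
submit -N train -cwd train.sh

# -hold_jid keeps this job on hold until the job named "train" has finished.
submit -N decode -cwd -hold_jid train decode.sh
```

Setting `DRY_RUN=0` on a machine where `qsub` is available would submit the jobs for real.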

MPI

Perl MPI Simple (Parallel::MPI::Simple): http://search.cpan.org/~ajgough/Parallel-MPI-Simple-0.03/


Applications

  • Acoustic model training:
    • HTK hidden Markov modeling toolkit: ifp-32-1/hasegawa/programs/htk-3.4
    • GMTK Dynamic Bayesian Nets/Graphical Models: nibbler0/speech_apps/GMTK
    • Sphinx speech recognizer
    • LIUM speech tools, including speaker segmentation
  • Decoding:
    • Julius LVCSR decoder
    • AT&T DCD LVCSR decoder - nibbler0/speech_apps/dcd-2.0
  • Language model training:
    • SRILM Big N-gram counts and backoff, lattices: fluffy0/programs/srilm
    • AT&T FSM Library: fluffy0/programs/fsm-4.0
    • OpenFST: fluffy0/programs/OpenFst/
  • Spectrograms and Waveform Viewing
    • XKL (MIT): nibbler0/speech_apps/xkl-2.3.1
    • ESPS (Entropic Systems, now Microsoft)
    • Praat

Installing / Arranging Software

If you download Linux software from the Internet and find it useful, please put it where others can use it too. Here's how.

  1. Type `umask 022` or `umask 000`. If you use 022, you are volunteering to manage the package; if you use 000, you are inviting others to help manage it.
  2. Download the tarfile to /workspace/ifp-32-1/hasegawa/programs; untar it to create $PACKAGE_DIR; remove the tar file (important!); configure; make all.
  3. Decide where you want the binaries. Reasonable places for programs are /workspace/ifp-32-1/hasegawa/programs/...
    • scripts = executes on any machine (e.g., perl, bash scripts)
    • bin.`uname` (i.e., bin.Linux) = executes on both ifp-32 and mickey. PLEASE CHECK: ssh to mickey, run the program, and make sure it does not report "cannot execute binary file".
    • bin.`arch` = executes only on machines of type `arch`. Type `arch` to see what machine you're on.
    • $PACKAGE_DIR/bin.Linux = packages with many binaries should remain in $PACKAGE_DIR, to avoid over-writing similarly-named programs in ../bin.Linux.
  4. Change the installdir variable in your Makefile according to your decision in step 3. Type "make install" to install, then "make clean" to remove object files and other build leftovers.
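
Steps 1 and 3 above can be sketched in shell. This is a minimal illustration under stated assumptions: `choose_bindir` and its `script` / `portable` / `arch` arguments are hypothetical names introduced here, and `PROGRAMS` simply holds the shared directory from step 2. On Linux, `uname -m` prints the same value as `arch`.

```shell
#!/bin/sh
# Sketch of steps 1 and 3. choose_bindir is a hypothetical helper;
# PROGRAMS defaults to the shared programs directory from step 2.

PROGRAMS="${PROGRAMS:-/workspace/ifp-32-1/hasegawa/programs}"

# Step 1: with umask 022, new files are readable by everyone but
# writable only by you (you manage the package).
# Use umask 000 instead to let others modify the files as well.
umask 022

# Step 3: map the kind of program to one of the directory names above.
choose_bindir() {
    case "$1" in
        script)   echo "$PROGRAMS/scripts" ;;             # runs on any machine
        portable) echo "$PROGRAMS/bin.$(uname)" ;;        # e.g. bin.Linux
        arch)     echo "$PROGRAMS/bin.$(uname -m)" ;;     # one machine type only
        *)        echo "unknown kind: $1" >&2; return 1 ;;
    esac
}

# Print the directory for binaries that run on both ifp-32 and mickey.
choose_bindir portable
```

The printed path is the value you would give to the installdir variable in step 4.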

Backups

If you have personal working directories outside your home directory that should be backed up regularly, list them here.

  • Art
    • mickey0/akantor
    • rizzo1/akantor is itself a backup of svn because it cannot be backed up in the normal way.
  • Sarah
    • nibbler0/data
    • rizzo0/sborys
    • spot1/sborys
    • tico0/sborys
  • Xiaodan
    • spot1/xzhuang2/newbaseline
    • spot1/xzhuang2/workshop
    • c1-15/hasegawa/xzhuang2*
    • /workspace/tico0/AED/

SVN

On Linux, the client is svn, and it should be installed everywhere. On Windows, download TortoiseSVN (http://tortoisesvn.tigris.org/). See also: http://artis.imag.fr/~Xavier.Decoret/resources/svn/index.html
