GMTK parallel tools

From SpeechWiki

Scripts and Modules

The *Parallel.pl scripts parse the command-line arguments and call the routines in the corresponding *Parallel.pm modules, which do the actual work.

  • viterbiParallel.pl and viterbiParallel.pm: Run Viterbi decoding in parallel, then optionally run sclite to report recognition accuracy.
  • emtrainParallel.pl and emtrainParallel.pm: Run a single iteration of EM training in parallel.
  • emConvergeParallel.pl and emConvergeParallel.pm: Run a sequence of EM training iterations to convergence, then split/vanish Gaussians, then run more iterations to convergence, following a convergence and split/vanish schedule.
  • distribute.pl and distribute.pm: Run a list of commands in parallel on an SGE cluster.
  • gmtkUtilParallel.pm
  • grid.pm

Features

  • All of the code is packaged in modules: The tools can be used by calling a perl function instead of starting another script in a new process.
  • All tools are restartable. If they are interrupted for any reason (e.g. a cluster glitch, or the user pressing Ctrl-C), rerunning the command does only the minimum work required to complete the job. Successfully completed sub-tasks are not rerun.
  • A fast sanity check is performed before any parallel jobs are fired off. This way, the user gets fast feedback on simple mistakes.
  • Killing the main script (with Ctrl-C, for example) stops all execution on all the compute nodes.
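
The restart behaviour described above can be illustrated with a small shell sketch that records each finished sub-task with a marker file and skips it on the next run. This is only an illustration of the idea, not the mechanism the modules actually use; the function name run_restartable, the state directory and the ".done" marker convention are all hypothetical.

```shell
#!/bin/sh
# Sketch of restartable sub-tasks using "done" marker files.
# Assumption: illustration only, not how distribute.pm is implemented.

# run_restartable STATE_DIR TASK_NAME COMMAND...
# Runs COMMAND unless a marker for TASK_NAME already exists in STATE_DIR.
run_restartable() {
    state_dir=$1
    task=$2
    shift 2
    marker="$state_dir/$task.done"

    if [ -e "$marker" ]; then
        echo "skip $task"      # finished in an earlier run
        return 0
    fi

    # Create the marker only after the command succeeds, so an
    # interrupted or failed task is rerun next time.
    if "$@"; then
        : > "$marker"
        echo "ran $task"
    else
        echo "failed $task" >&2
        return 1
    fi
}
```

Calling run_restartable twice with the same state directory and task name runs the command the first time and skips it the second, while a task whose command fails leaves no marker and is retried on the next invocation.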

Installation and Environment

The easiest way is to have the following in your path:

You must also make the *.pm modules visible to Perl, for example by adding their directory to the PERL5LIB environment variable.
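
A minimal sketch of this setup for a Bourne-style shell, assuming the scripts and modules sit together in one directory. The location $HOME/gmtk-parallel and the variable name GMTK_PARALLEL_HOME are hypothetical; substitute wherever the files actually live.

```shell
# Hypothetical install location; adjust to where the *.pl and *.pm files live.
GMTK_PARALLEL_HOME="$HOME/gmtk-parallel"

# Make the *Parallel.pl scripts findable by the shell.
export PATH="$GMTK_PARALLEL_HOME:$PATH"

# Make the *.pm modules findable by perl, keeping any existing PERL5LIB entries.
export PERL5LIB="$GMTK_PARALLEL_HOME${PERL5LIB:+:$PERL5LIB}"
```

To check the result, perl -e 'print join("\n", @INC), "\n"' should list the chosen directory among the module search paths.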

Documentation

Not much yet. Some very rough overview slides are here. However, the *.pm modules are reasonably well documented, hopefully enough to be useful.

Additional Resources

Bowon's parallel HTK tools

Bowon's SGE basics

--Arthur 17:33, 19 September 2006 (CDT)
