GMTK parallel tools
Scripts and Modules
The *Parallel.pl scripts parse the command-line arguments and call the routines in the corresponding *Parallel.pm modules, which do the actual work.
- viterbiParallel.pl and viterbiParallel.pm: Run Viterbi decoding in parallel, then optionally run sclite to report recognition accuracy.
- emtrainParallel.pl and emtrainParallel.pm: Run a single iteration of EM training in parallel.
- emConvergeParallel.pl and emConvergeParallel.pm: Run EM training iterations to convergence, then split or vanish Gaussians, then run more iterations to convergence, following a convergence and split/vanish schedule.
- distribute.pl and distribute.pm: Run a list of commands in parallel on an SGE cluster.
- gmtkUtilParallel.pm
- grid.pm
Features
- All of the code is packaged in modules: the tools can be used by calling a Perl function instead of starting another script in a new process.
- All tools are restartable. If they are interrupted for any reason (e.g. a cluster glitch, or the user pressing Ctrl-C), rerunning the command does only the minimum work required to complete the job. Successfully completed sub-tasks are not rerun.
- A fast sanity check is performed before any parallel jobs are fired off, so the user gets fast feedback on simple mistakes.
- Killing the main script (with Ctrl-C, for example) stops all execution on all the compute nodes.
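The restart behaviour described above can be illustrated with a minimal shell sketch. This is not how the *Parallel.pm modules do their bookkeeping (that is not described on this page); the function name run_once and the ".done" marker files are assumptions made purely for illustration.

```shell
# Hypothetical helper: run one sub-task unless a marker file shows that
# it already finished successfully on an earlier run.
# Usage: run_once DONE_DIR TASK_NAME COMMAND [ARGS...]
run_once() {
    done_dir=$1
    task=$2
    shift 2
    if [ -e "$done_dir/$task.done" ]; then
        # Completed earlier: do not rerun.
        echo "skip $task"
        return 0
    fi
    # Create the marker only if the command succeeds, so a failed or
    # interrupted sub-task is retried the next time the job is rerun.
    "$@" && : > "$done_dir/$task.done"
}
```

For example, after creating a directory for the markers, calling `run_once markers chunk1 echo "decoding chunk 1"` twice runs the command the first time and prints "skip chunk1" the second time.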
Installation and Environment
The easiest setup is to have the following on your PATH:
- gmtk binaries
- sclite
- SGE commands (such as qsub)
- These scripts themselves
Perl must also be able to find the *.pm modules. You can arrange that by adding their directory to the PERL5LIB environment variable.
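A setup along these lines could go in a shell startup file. The variable GMTK_PARALLEL_HOME, its default directory, and the helper missing_tools are hypothetical names used only for this sketch; substitute the directory where the scripts and modules are actually installed.

```shell
# Hypothetical install location of the *Parallel.pl scripts and *.pm modules.
GMTK_PARALLEL_HOME="${GMTK_PARALLEL_HOME:-$HOME/gmtk-parallel}"

# Put the scripts on the command search path.
PATH="$GMTK_PARALLEL_HOME:$PATH"
export PATH

# Let perl find the modules, keeping any existing PERL5LIB entries.
PERL5LIB="$GMTK_PARALLEL_HOME${PERL5LIB:+:$PERL5LIB}"
export PERL5LIB

# Print the name of every listed command that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}
```

Running `missing_tools perl sclite qsub` (plus the names of the gmtk binaries you use) prints nothing when everything is found. To confirm that the module directory was picked up, `perl -e 'print "$_\n" for @INC'` lists the directories perl searches.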
Documentation
Not much yet. Some very rough overview slides are here. However, the *.pm modules are reasonably well documented, hopefully enough to be useful.
Additional Resources
--Arthur 17:33, 19 September 2006 (CDT)