GMTK parallel tools
From SpeechWiki
Latest revision as of 14:31, 30 October 2008
Scripts and Modules
The *Parallel.pl scripts parse the command-line arguments and call routines in the corresponding *Parallel.pm modules, which do the actual work.
- viterbiParallel.pl and viterbiParallel.pm: run a Viterbi decoding in parallel, then optionally run sclite to report recognition accuracy.
- emtrainParallel.pl and emtrainParallel.pm: run a single iteration of EM training in parallel.
- emConvergeParallel.pl and emConvergeParallel.pm: run EM training iterations to convergence, then split/vanish Gaussians, then run more iterations to convergence, following a convergence and split/vanish schedule.
- distribute.pl and distribute.pm: run a list of commands in parallel on an SGE cluster (http://www.ifp.uiuc.edu/~bowonlee/research/cluster/linux_cluster.htm).
- gmtkUtilParallel.pm
- grid.pm
Features
- All of the code is packaged in modules: the tools can be used by calling a Perl function instead of starting another script in a new process.
- All tools are restartable. If they are interrupted for any reason (e.g. a cluster glitch, or the user ctrl-c's the job), rerunning the command does only the minimum work required to complete the job. Successfully completed sub-tasks are not rerun.
- A fast sanity check is performed before any parallel jobs are fired off, so the user gets quick feedback on simple mistakes.
- Killing the main script (with ctrl-c, for example) stops all execution on all the compute nodes.
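The restart behaviour described above can be illustrated with a small shell sketch. This is not how the modules implement it (the source does not say); it only shows one common way to skip sub-tasks that already finished. The names run_once and DONE_DIR, and the marker-file convention, are assumptions made for the example.

```shell
# Illustrative sketch only; not taken from the GMTK parallel tools.
# DONE_DIR and run_once are hypothetical names.
DONE_DIR="${DONE_DIR:-./done_markers}"

# run_once NAME CMD [ARGS...]
# Runs CMD unless a marker from an earlier successful run of NAME exists.
run_once() {
    name="$1"
    shift
    mkdir -p "$DONE_DIR" || return 1
    if [ -e "$DONE_DIR/$name.done" ]; then
        echo "skip $name"
        return 0
    fi
    # Record success only when the command exits with status 0.
    "$@" && touch "$DONE_DIR/$name.done"
}
```

Calling run_once with the same name a second time prints "skip" and does nothing else, so rerunning an interrupted job list repeats only the sub-tasks that never completed. A failed command leaves no marker and is retried on the next run.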
Installation and Environment
The easiest setup is to have the following in your PATH:
- gmtk binaries (http://ssli.ee.washington.edu/~bilmes/gmtk/linux/)
- sclite (http://www.nist.gov/speech/tools/)
- SGE commands (such as qsub)
- These scripts themselves
Perl must also be able to find the *.pm modules. You can arrange that by adding their directory to the PERL5LIB environment variable.
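A minimal sketch of such an environment setup for a Bourne-style shell follows. The three directory names are hypothetical placeholders, not locations given by this page; substitute wherever the binaries and scripts live on your system. It also assumes the *.pm modules sit in the same directory as the scripts and that the SGE commands are already on your PATH.

```shell
# Hypothetical install locations; adjust to your site.
GMTK_BIN="$HOME/gmtk/bin"
SCLITE_BIN="$HOME/sctk/bin"
PARALLEL_TOOLS="$HOME/gmtk_parallel"   # assumed to hold both *.pl and *.pm files

# Make the gmtk binaries, sclite, and the *Parallel.pl scripts runnable by name.
export PATH="$GMTK_BIN:$SCLITE_BIN:$PARALLEL_TOOLS:$PATH"

# Let Perl find the *.pm modules; keep any existing PERL5LIB entries.
export PERL5LIB="$PARALLEL_TOOLS${PERL5LIB:+:$PERL5LIB}"
```

Putting these lines in a shell startup file makes the settings apply to every new session.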
Documentation
Not much yet. Some very rough overview slides are available at http://mickey.ifp.uiuc.edu/speech/akantor/GMTK%20parallel%20tools.ppt. However, the *.pm modules are reasonably well documented, hopefully enough to be useful.
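Additional Resources
- Bowon's parallel HTK tools (http://www.ifp.uiuc.edu/~bowonlee/research/cluster/HTK_parallel.htm)
- Bowon's SGE basics (http://www.ifp.uiuc.edu/~bowonlee/research/cluster/linux_cluster.htm)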
--Arthur 17:33, 19 September 2006 (CDT)