High Level Scripts Documentation

From SpeechWiki

The following modules can be used independently of each other (the GMTK scripts use all of them and can serve as examples):

Python modules

gmtkParam
A complete library for reading/writing/manipulating GMTK parameter files. It can, for instance, read/write my master file and trainableParams files.

Perl modules

AI::GMTK
For parallel, fault-tolerant single-iteration training, training to convergence, and Viterbi decoding.
Config::OptionsSet::OptionsSet.pm
For reading/writing/displaying sets of options (Perl hashes at their simplest).
Config::OptionsSet::Grid.pm
For compactly representing sets of options that differ by a few parameters, e.g. when tuning over a particular parameter.
Getopt::Lazier.pm
A command-line options parser based on Getopt::Long that also generates pretty documentation for the options and does simple validation on them (e.g. is the option required, or must the option specify an existing file or directory).
OS::Util.pm
Simple utility functions; nothing really interesting here.
Parallel::Distribute.pm
Used to submit a set of tasks to the SGE cluster and wait for them all to finish, returning an error if any one of them returns an error.
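
To make two of the ideas above concrete (a grid of option sets that differ by a few parameters, and running a set of tasks with failure if any one fails), here is a minimal Python sketch. It is not the interface of Config::OptionsSet::Grid.pm or Parallel::Distribute.pm: the function names expand_grid and run_all and the option names are hypothetical, and local processes stand in for SGE jobs.

```python
import itertools
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def expand_grid(base, grid):
    """Return one options dict per combination of the values in `grid`.

    `base` holds the options shared by every set; `grid` maps an option
    name to the list of values to try for it (hypothetical layout).
    """
    names = sorted(grid)
    option_sets = []
    for values in itertools.product(*(grid[name] for name in names)):
        options = dict(base)          # copy the shared options
        options.update(zip(names, values))
        option_sets.append(options)
    return option_sets


def run_all(commands, max_workers=4):
    """Run every command, wait for all of them to finish, and return
    True only if every command exited with status 0.

    Local processes are used here in place of cluster jobs.
    """
    def run(command):
        return subprocess.run(command).returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        exit_codes = list(pool.map(run, commands))
    return all(code == 0 for code in exit_codes)


if __name__ == "__main__":
    # Six option sets: every combination of "mixtures" and "beam".
    option_sets = expand_grid({"iterations": 5},
                              {"mixtures": [1, 2, 4], "beam": [50, 100]})
    # Each task is a trivial child process that receives its option set.
    commands = [[sys.executable, "-c", "pass", repr(options)]
                for options in option_sets]
    print("all tasks succeeded:", run_all(commands))
```

In this sketch every task runs to completion before the overall result is reported, so one failure does not cancel the others; whether the real module behaves the same way is not stated here.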

The module names are chosen so that we can submit them to CPAN without major changes.

Python/Perl scripts

AI::GMTK::* and Parallel::Distribute.pm have drivers in scripts/gmtk/ (emConvergeParallel.pl, emTrainParallel.pl, viterbiParallel.pl) and scripts/parallel/distribute.pl, respectively.

There is also a script to quickly mirror the data to the scratch space of all compute nodes with BitTorrent, scripts/parallel/mirrorScratch.py, and there are scripts for generating additional files needed for GMTK in scripts/gmtk/.

Most scripts will give decent usage help if called with --help, but the actual source code documentation is a bit sparse.
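
For readers unfamiliar with this style of option handling (generated --help text plus simple checks such as "required" and "must be an existing file or directory"), the following is a rough Python analogue built on the standard argparse module. It does not reproduce Getopt::Lazier.pm; the option names --params and --iterations and the helper existing_path are made up for illustration.

```python
import argparse
import os


def existing_path(value):
    """argparse type: accept the value only if it names an existing
    file or directory."""
    if not os.path.exists(value):
        raise argparse.ArgumentTypeError(
            f"no such file or directory: {value}")
    return value


def build_parser():
    """Build a parser for a hypothetical training driver."""
    parser = argparse.ArgumentParser(
        description="Hypothetical training driver (illustration only).")
    # Required option whose value must point at something that exists.
    parser.add_argument("--params", required=True, type=existing_path,
                        help="parameter file to read (must exist)")
    # Optional option with a default that is shown in the help text.
    parser.add_argument("--iterations", type=int, default=1,
                        help="number of iterations (default: %(default)s)")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

Calling such a script with --help prints the usage text assembled from the help strings, and omitting --params or passing a non-existent path makes argparse exit with an error message.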
