Scripts Documentation

From SpeechWiki

Revision as of 21:44, 12 January 2009 by Mark Hasegawa-Johnson (Talk | contribs)

SVN Repository for Scripts

All the Python and Perl scripts are in svn://mickey/scripts, and a version is temporarily checked out into /cworkspace/ifp-32-1/hasegawa/programs/scriptsTemp/scripts. The readme.html file there explains the structure. (The old scripts are still in /cworkspace/ifp-32-1/hasegawa/programs/scripts because I have some jobs still using them, but they will be replaced with /cworkspace/ifp-32-1/hasegawa/programs/scriptsTemp/scripts shortly.) A web version is temporarily at http://mickey.ifp.uiuc.edu/speech/akantor/fisher/scripts/

Documentation

The autogenerated documentation for both Perl and Python is not in SVN, but can be recreated with /cworkspace/ifp-32-1/hasegawa/programs/scripts/doc/makeDoc.sh. The output is placed in /cworkspace/ifp-32-1/hasegawa/programs/scripts/doc/perl and /cworkspace/ifp-32-1/hasegawa/programs/scripts/doc/python, respectively.

Python is pretty well taken care of with epydoc: the command-line --help documentation and the web documentation are generated from the same place, with some minimal requirements on the way people write their scripts. The existing scripts can serve as examples.
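As a rough illustration of keeping --help text and web documentation in one place, here is a minimal sketch in which a docstring doubles as the command-line help. It is not taken from the repository: the function count_lines, the script name countLines.py, and the use of argparse are all assumptions made for the example, and the actual scripts may follow a different convention.

```python
import argparse
import inspect


def count_lines(path):
    """Count the lines in a text file.

    A docstring-based generator can render this same text for the web
    pages, so the description only has to be written once.
    """
    with open(path) as handle:
        return sum(1 for _ in handle)


def build_parser(func):
    """Build a command-line parser whose --help text is func's docstring."""
    parser = argparse.ArgumentParser(
        prog="countLines.py",  # hypothetical script name
        description=inspect.getdoc(func),
        # Keep the docstring's own line breaks in the help output.
        formatter_class=argparse.RawDescriptionHelpFormatter,
    )
    parser.add_argument("path", help="text file to read")
    return parser


# Show the help text that --help would print.
print(build_parser(count_lines).format_help())
```

The point of the design is that there is one description to maintain: editing the docstring changes both the help output and whatever the documentation generator produces.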

Perl documentation is generated with doxygen from the awkward POD documentation. None of my scripts are documented with it yet; I still have to convert the comments into something compatible. Also, the command-line --help documentation (where most of my comments are right now) is not yet imported into the web pages. I will try to get it sorted out.

I believe doc-generators for both perl and python can understand the javadoc/doxygen comment syntax that people are familiar with.

Software

The following modules can be used independently of each other (my scripts use all of them and can be used as examples):

Python modules:

gmtkParam: A complete library for reading/writing/manipulating GMTK parameter files. It can, for instance, read/write my master file and trainableParams files.

Perl modules:

AI::GMTK: parallel, fault-tolerant single-iteration training, training to convergence, and Viterbi.

Config::OptionsSet::OptionsSet.pm: reading, writing, and displaying sets of options (Perl hashes, i.e. dictionaries, at their simplest).

Config::OptionsSet::Grid.pm: compactly represents sets of options that differ by a few parameters, e.g. when tuning over a particular parameter.

Getopt::Lazier.pm: a command-line options parser based on Getopt::Long that also generates pretty documentation for the options and does simple validation on them (e.g. whether an option is required, or whether it must specify an existing file or directory).

OS::Util.pm: simple utility functions; nothing really interesting here.

Parallel::Distribute.pm: submits a set of tasks to the SGE cluster and waits for them all to finish, returning an error if any one of them returns an error.

The module names are chosen so that we can submit them to CPAN without major changes.
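To make the idea behind Config::OptionsSet::Grid.pm concrete, here is a minimal Python sketch of expanding a compact grid description into full option sets. It illustrates the concept only and is not a port of the Perl module; the function expand_grid and the option names (numIterations, numMixtures, beam) are hypothetical.

```python
import itertools


def expand_grid(base, grid):
    """Yield one options dict per combination of the values in grid.

    base: options shared by every run.
    grid: maps an option name to the list of values to try for it.
    """
    names = sorted(grid)
    for values in itertools.product(*(grid[name] for name in names)):
        options = dict(base)  # copy, so the shared options are not modified
        options.update(zip(names, values))
        yield options


# Hypothetical option names, for illustration only.
base = {"numIterations": 10, "verbose": False}
grid = {"numMixtures": [1, 2, 4], "beam": [100, 200]}

option_sets = list(expand_grid(base, grid))
print(len(option_sets))
```

Only the parameters being tuned are listed with multiple values; everything else is stated once in the base set, which is what keeps the representation compact.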

Python/Perl scripts:

AI::GMTK::* and Parallel::Distribute.pm have drivers in scripts/gmtk/ (emConvergeParallel.pl, emTrainParallel.pl, viterbiParallel.pl) and scripts/parallel/distribute.pl, respectively.
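The behaviour described for Parallel::Distribute.pm (start a set of tasks, wait for all of them, report an error if any one fails) can be sketched in Python as follows. This is an assumption-laden stand-in: it runs the tasks as local subprocesses rather than submitting them to SGE, it does no retrying, and the function names run_task and distribute are hypothetical, not the interface of distribute.pl.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_task(command):
    """Run one command (a list of arguments) and return its exit status."""
    return subprocess.run(command).returncode


def distribute(commands, max_workers=4):
    """Run all commands, wait for every one, and return 0 only if all succeeded."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map blocks until every task has finished.
        statuses = list(pool.map(run_task, commands))
    failed = [cmd for cmd, status in zip(commands, statuses) if status != 0]
    for cmd in failed:
        print("task failed:", " ".join(cmd), file=sys.stderr)
    return 1 if failed else 0


# Two stand-in tasks: the first succeeds, the second exits with status 3.
tasks = [
    [sys.executable, "-c", "pass"],
    [sys.executable, "-c", "import sys; sys.exit(3)"],
]
print(distribute(tasks))
```

Note that every task is allowed to finish before the overall status is reported, so one early failure does not cancel the remaining work.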

There is also a script to quickly mirror the data to the scratch space of all compute nodes with BitTorrent, scripts/parallel/mirrorScratch.py, and scripts for generating additional files needed for GMTK in scripts/gmtk/.

Most scripts will give decent usage help if called with --help, but the actual source code documentation is a bit sparse.
