Scripts Documentation

From SpeechWiki


Revision as of 07:55, 14 January 2009

SVN And `Official' public-use location for Scripts

All the python and perl scripts are in svn://mickey/scripts.

A working copy of the repository is checked out into /cworkspace/ifp-32-1/hasegawa/programs/scripts on the ifp-32 cluster. It is intended to always be in a usable state, and our SST group should use the scripts from there. The readme.html file there explains the structure.

A web view onto the official version is temporarily at http://mickey.ifp.uiuc.edu/speech/akantor/fisher/scripts/

Documentation

Locations

Some documentation is automatically generated from the source code comments. The autogenerated documentation is not in SVN, but can (and should) be regularly recreated with /cworkspace/ifp-32-1/hasegawa/programs/scripts/doc/makeDoc.sh. This creates the docs for both perl and python. The autogenerated docs are placed in /cworkspace/ifp-32-1/hasegawa/programs/scripts/doc/perl and /cworkspace/ifp-32-1/hasegawa/programs/scripts/doc/python.

How to write comments meaningful for the auto doc generation

In both perl and python you can write comments using the javadoc conventions (more precisely, the epydoc conventions for python and the doxygen conventions for perl, although both should be supersets of javadoc).

Both perl and python documentation additionally include the usage text that should display when one runs the script with the --help option.

python

Python documentation is handled by epydoc. The command-line --help text and the web documentation are generated from the same source, which places a few minimal requirements on how scripts are written; the existing scripts can serve as examples. The file docstring must contain the string %InsertOptionParserUsage% somewhere, which will be replaced by the usage documentation:

"""
%InsertOptionParserUsage%

The rest of the file documentation...
@author ...
@see ...
"""

You also need to augment the __doc__ string with the usage info whenever the file is interpreted by python.

Putting the following at the end of the file works:

# The parser is also used for generating documentation, so always create it
# and augment __doc__ with the usage info.
# This confuses epydoc a little, but lets us keep a single version of the
# documentation for all purposes.
parser = makeParser()
__doc__ = __doc__.replace("%InsertOptionParserUsage%\n", parser.format_help())


if __name__ == "__main__":
    main(sys.argv)

If you don't do the above, the documentation will be generated without the --help usage.
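
To make the two requirements concrete, here is a minimal sketch of a complete script that follows the pattern. It is an illustration, not one of the scripts in the repository: the script name echoArgs.py, its --shout option, and the insertUsage helper are all hypothetical, and optparse is assumed only because the placeholder name suggests it. The helper also leaves __doc__ alone when it is None, which is the case when python runs with -OO and strips docstrings.

```python
"""
%InsertOptionParserUsage%

Echo the given words back to stdout (hypothetical example script).
"""
import sys
from optparse import OptionParser


def makeParser():
    """Build the single parser used for both --help and the generated docs."""
    # prog is fixed so the usage text does not depend on how the file is run.
    parser = OptionParser(usage="%prog [options] WORD...", prog="echoArgs.py")
    parser.add_option("-s", "--shout", action="store_true", default=False,
                      help="print the words in upper case")
    return parser


def insertUsage(doc, parser):
    """Return doc with the placeholder line replaced by the parser's help."""
    if doc is None:  # docstrings are stripped under `python -OO`
        return None
    return doc.replace("%InsertOptionParserUsage%\n", parser.format_help())


def main(argv):
    options, words = makeParser().parse_args(argv[1:])
    text = " ".join(words)
    print(text.upper() if options.shout else text)


# Runs on import as well, so doc tools that import the module see the usage.
parser = makeParser()
__doc__ = insertUsage(__doc__, parser)

if __name__ == "__main__":
    main(sys.argv)
```

Because the replacement happens at module level, both `echoArgs.py --help` and anything that imports the module and reads its __doc__ see the same option descriptions.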

perl

Perl documentation is generated with doxygen and a doxygenfilter script modified to generate --help usage.

Comments used for doc generation should have the first line start with a ##, like this:

## @file 
# Based on genPhonePhonePos2WholePhoneStateDTs.pl from the gmtk Aurora tutorial

If a comment with the @file directive is present (as above), the documentation is associated with the file. In this case, doxygenfilter simply runs the file with the --help option (file.pl --help) and includes the output in the documentation.

Admittedly this is a little dangerous, but I (Arthur) tried to do this quickly, so anyone is welcome to improve this.
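
The "run the script with --help and capture the output" step can be sketched as below. This is a Python illustration of the idea, not the code in the modified doxygenfilter; the helper name captureHelp and the timeout are assumptions. A timeout is one cheap way to limit the damage if a script hangs or does real work when it is run unattended.

```python
import subprocess
import sys


def captureHelp(command, timeoutSeconds=10):
    """Run `command --help` and return whatever it prints to stdout."""
    result = subprocess.run(list(command) + ["--help"],
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                            universal_newlines=True, timeout=timeoutSeconds)
    return result.stdout


# Stand-in for a real script: a one-liner that prints the arguments it received.
print(captureHelp([sys.executable, "-c", "import sys; print(sys.argv[1:])"]))
```

subprocess.run raises subprocess.TimeoutExpired if the script does not finish in time, so a doc build using this approach would fail loudly instead of hanging.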

Also note that perl does not really have named arguments, so doxygenfilter tries to parse the code for the common assignment of the argument list to variables (e.g. my ($arg1, $arg2) = @_;) and generates the argument names from there. You can of course specify the arguments in the documentation too:

## @fn private void debug(@args)
# A simple function for debugging. Prints the arguments to STDERR.
# @param args The stuff to be printed.
sub debug {
    my(@args) = @_;
}
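
The argument-name guessing described above can be illustrated with a small regular expression. This is a Python sketch of the idea only, not the code doxygenfilter actually uses; the function name perlArgNames and the pattern are assumptions, and real Perl allows many other ways of unpacking @_ that this does not cover.

```python
import re

# Matches the common idiom, e.g. "my ($arg1, $arg2) = @_;" or "my(@args) = @_;"
ARG_LIST = re.compile(r"my\s*\(([^)]*)\)\s*=\s*@_\s*;")


def perlArgNames(subBody):
    """Return the variable names a Perl sub assigns from @_, or [] if none found."""
    match = ARG_LIST.search(subBody)
    if match is None:
        return []
    return [name.strip() for name in match.group(1).split(",") if name.strip()]


print(perlArgNames("sub debug {\n    my(@args) = @_;\n}"))  # ['@args']
```

When the idiom is absent the function returns an empty list, which is the situation where spelling out the arguments with @fn and @param is the only way to get them into the documentation.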

Script and Module descriptions

The following modules can be used independently of each other (The GMTK scripts use all of them and can be used as examples):

Python modules

gmtkParam
A complete library for reading/writing/manipulating GMTK parameter files. It can, for instance, read/write my master file and trainableParams files.

Perl modules

AI::GMTK
for parallel, fault-tolerant single-iteration training, training to convergence, and Viterbi decoding.
Config::OptionsSet::OptionsSet.pm
for reading/writing/displaying sets of options (Perl hashes, at their simplest)
Config::OptionsSet::Grid.pm
for compactly representing sets of options that differ by a few parameters, e.g. when tuning over a particular parameter
Getopt::Lazier.pm
A command-line options parser based on Getopt::Long, which also generates pretty documentation for the options and does simple validation on them (e.g. whether an option is required, or whether it must name an existing file or directory)
OS::Util.pm
simple utility functions; nothing really interesting here
Parallel::Distribute.pm
Used to submit a set of tasks to the SGE cluster and wait for them all to finish, returning an error if any one of them returns an error

The module names are chosen so that we can submit them to CPAN without major changes.
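
As an illustration of what "sets of options that differ by a few parameters" means for Config::OptionsSet::Grid.pm, the sketch below expands a base option set and a few swept parameters into one complete option set per combination. It is written in Python and is not the module's interface; the function name expandGrid and the option names are hypothetical.

```python
import itertools


def expandGrid(base, sweeps):
    """Yield one complete option dict per combination of swept values."""
    names = sorted(sweeps)
    for values in itertools.product(*(sweeps[name] for name in names)):
        options = dict(base)          # copy, so the base set is never modified
        options.update(zip(names, values))
        yield options


base = {"numComponents": 4, "iterations": 10}   # hypothetical option names
for options in expandGrid(base, {"numComponents": [4, 8], "beam": [100, 200]}):
    print(options)
```

Two swept parameters with two values each give four option sets, while only the base set and the swept values have to be written down.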

python/perl Scripts

AI::GMTK::* and Parallel::Distribute.pm have drivers in scripts/gmtk/ (emConvergeParallel.pl, emTrainParallel.pl, viterbiParallel.pl) and scripts/parallel/distribute.pl, respectively.
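
The contract described for Parallel::Distribute.pm (submit every task, wait for all of them, report an error if any one fails) can be illustrated locally. The sketch below is not the module's API and does not talk to SGE; it runs the commands as local subprocesses, and the name runAll and its maxWorkers parameter are hypothetical.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def runAll(commands, maxWorkers=4):
    """Run every command, wait for all of them, return 0 only if all succeeded."""
    with ThreadPoolExecutor(max_workers=maxWorkers) as pool:
        exitCodes = list(pool.map(subprocess.call, commands))
    failed = [cmd for cmd, code in zip(commands, exitCodes) if code != 0]
    for cmd in failed:
        sys.stderr.write("task failed: %r\n" % (cmd,))
    return 1 if failed else 0
```

Note that every task is allowed to finish before the error is reported, so one failure does not hide the results of the tasks that succeeded.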

There is also a script to quickly mirror data to the scratch space of all compute nodes with BitTorrent, scripts/parallel/mirrorScratch.py, and there are scripts for generating additional files needed for GMTK in scripts/gmtk/.

Most scripts will give decent usage help if called with --help, but the actual source code documentation is a bit sparse.
