Scripts Documentation

=SVN repository and official public-use location for scripts=

All the python and perl scripts are in svn://mickey/scripts.

A working copy is checked out from SVN into /cworkspace/ifp-32-1/hasegawa/programs/scripts on the ifp-32 cluster.
It is intended to always be in a usable state, and our SST group should use it from there.
The readme.html file there explains the structure.

A web view onto the official version is temporarily at
http://mickey.ifp.uiuc.edu/speech/akantor/fisher/scripts/

=Documentation=

==Locations==

Some documentation is automatically generated from the source code comments.
The autogenerated doc is not in SVN, but can (and should) be regularly recreated with
/cworkspace/ifp-32-1/hasegawa/programs/scripts/doc/makeDoc.sh, which creates the docs for both
perl and python. The autogenerated docs are placed in
/cworkspace/ifp-32-1/hasegawa/programs/scripts/doc/perl and
/cworkspace/ifp-32-1/hasegawa/programs/scripts/doc/python.

==How to write comments meaningful for the auto doc generation==

In both perl and python you can write comments using the [http://java.sun.com/j2se/javadoc/writingdoccomments/index.html javadoc] conventions (more precisely, [http://epydoc.sourceforge.net/ epydoc] for python and [http://www.stack.nl/~dimitri/doxygen/docblocks.html doxygen] for perl, although both should be supersets of javadoc).

Both the perl and python documentation additionally include the usage text that the script displays when run with the --help option.
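
For example, a function docstring written with javadoc-style fields that epydoc will pick up might look like the following (a hypothetical function, not one from the repository; <code>@param</code>, <code>@type</code>, <code>@return</code> and <code>@rtype</code> are standard epydoc fields):

<pre>
def count_tokens(path):
    """
    Count the whitespace-separated tokens in a text file.

    @param path: name of the file to read
    @type  path: str
    @return: the number of tokens found in the file
    @rtype: int
    """
    with open(path) as f:
        return sum(len(line.split()) for line in f)
</pre>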

===python===

Python documentation is pretty well taken care of with epydoc: the command-line
--help documentation and the web documentation are generated from the
same place, with some minimal requirements for the way people write
their scripts. They can follow the existing scripts as examples.
Basically, the file docstring needs to contain a <code>%InsertOptionParserUsage%</code>
string somewhere, which will be replaced by the usage documentation:
<pre>
"""
%InsertOptionParserUsage%

The rest of the file documentation...
@author ...
@see ...
"""
</pre>

You also need to augment the <code>__doc__</code> string with the usage info whenever the file is interpreted by python.
Putting the following at the end of the file works:
<pre>
# The parser is used for generating documentation, so create it always,
# and augment __doc__ with the usage info.  This messes up epydoc a little,
# but allows us to keep a single version of the documentation for all purposes.
parser = makeParser()
__doc__ = __doc__.replace("%InsertOptionParserUsage%\n", parser.format_help())

if __name__ == "__main__":
    main(sys.argv)
</pre>

If you don't do the above, the documentation will be generated without the --help usage.
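
The snippets above assume a <code>makeParser()</code> function defined elsewhere in the script; the existing scripts define their own. Purely as an illustration (not the repository's actual code), a minimal version built on the standard-library <code>optparse</code> module could look like this:

<pre>
import sys
from optparse import OptionParser

def makeParser():
    # Build the parser in a function so that both main() and the
    # documentation code at the end of the file can create it.
    parser = OptionParser(usage="usage: %prog [options] input")
    # The --output option is only an example; real scripts define their own options.
    parser.add_option("-o", "--output", dest="output", default="out.txt",
                      help="where to write the results (default: %default)")
    return parser

def main(argv):
    parser = makeParser()
    options, args = parser.parse_args(argv[1:])
    sys.stdout.write("would write results to %s\n" % options.output)
    return 0
</pre>

With a parser like this, <code>parser.format_help()</code> returns the same text that --help prints, which is what the snippet above splices into <code>__doc__</code>.
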
===perl===

Perl documentation is generated with doxygen and a [http://www.bigsister.ch/doxygenfilter doxygenfilter] script modified to generate --help usage.

Comments used for doc generation should have the first line start with <code>##</code>, like this:
<pre>
## @file
# Based on genPhonePhonePos2WholePhoneStateDTs.pl from the gmtk Aurora tutorial
</pre>

If a comment with the <code>@file</code> directive is present (as above), the documentation is associated with the file. In this case, doxygenfilter simply runs the file with the --help option (<code>file.pl --help</code>) and includes the output in the documentation.

Admittedly this is a little dangerous, but I (Arthur) tried to do this quickly, so anyone is welcome to improve it.

Also note that perl does not really have named arguments, so doxygenfilter actually tries to parse the code for the common assignment of the argument list to variables (e.g. <code>my ($arg1, $arg2) = @_;</code>) and generates the argument names from there. You can of course specify the arguments in the documentation too:
<pre>
## @fn private void debug(@args)
# A simple function for debugging. Prints the arguments to STDERR.
# @param args The stuff to be printed.
sub debug {
    my(@args) = @_;
    print STDERR "@args\n";
}
</pre>

=Software=

The following modules can be used independently of each other (the GMTK scripts use all of them and can be used as examples):

==Python modules==

;<code>gmtkParam</code>
:A complete library for reading/writing/manipulating GMTK parameter files. It can, for instance, read/write master files and trainableParams files.

==Perl modules==

;<code>AI::GMTK</code>
:For parallel fault-tolerant single-iteration training, training to convergence, and viterbi decoding.
;<code>Config::OptionsSet::OptionsSet.pm</code>
:For reading/writing/displaying sets of options (perl dictionaries at their simplest).
;<code>Config::OptionsSet::Grid.pm</code>
:For compactly representing sets of options that differ by a few parameters, e.g. when tuning over a particular parameter.
;<code>Getopt::Lazier.pm</code>
:A command-line options parser based on <code>Getopt::Long</code>, which also generates pretty documentation for the options and does simple validation on them (e.g. is the option required, or must it specify an existing file or dir).
;<code>OS::Util.pm</code>
:Simple utility functions; nothing really interesting here.
;<code>Parallel::Distribute.pm</code>
:Used to submit a set of tasks to the SGE cluster and wait for them all to finish, returning an error if any one of them returns an error.

The module names are chosen so that we can submit them to CPAN without major changes.

==python/perl Scripts==

<code>AI::GMTK::*</code> and <code>Parallel::Distribute.pm</code> have drivers in scripts/gmtk/ (emConvergeParallel.pl, emTrainParallel.pl, viterbiParallel.pl) and in scripts/parallel/distribute.pl.

There is also a script to quickly mirror data to the scratch space of all compute nodes with bittorrent (scripts/parallel/mirrorScratch.py), as well as scripts for generating additional files needed for GMTK in scripts/gmtk/.

Most scripts will give decent usage help if called with --help, but the actual source code documentation is a bit sparse.
