HDK4.0 (the Hidden Markov Model Toolkit with explicit state duration modeling)
HTK 3.1 Extension --- Explicit Duration HMM
written by Ken Chen, 03/30/03

Compiling and Installing
========================

This release creates HDInit, HDRest, HDERest, HDHEd and HDVite, which can be
used to train and evaluate explicit duration HMMs.

If you are on a Win32 system, you can copy the executables directly from the
bin.win32 directory into your HTK/bin directory. On other operating systems,
follow the instructions below to recompile the executables. The Makefiles have
been changed, so compilation generates only these 5 executables.

Instructions for Installation
=============================

1. If you don't have HTK3.1, download it from http://htk.eng.cam.ac.uk/ and
   expand it into a directory (for example, HTK3.1).

2. Download HDK from http://www.ifp.uiuc.edu/speech/software/ and expand it
   into a directory (for example, HDK4.0).

3. Set the path of HTK3.1 in the variable $HTKDIR with a command like:

     setenv HTKDIR /homes/user/HTK3.1

4. Change the current directory to the HDK directory:

     cd HDK4.0

5. Run mkHDK.sh:

     bash mkHDK.sh

   This bash script creates the HDK files from the HTK3.1 files and the HDK
   patch files in the "pat" directory, and stores the new files in HDKLib and
   HDKTools.

6. Compile HDK using the Makefiles in HDKLib and HDKTools. Before compiling,
   set the required environment variables; they can be found in HTK3.1/env.
   You also need to specify the location of the HDK directory in $HBIN:

     setenv HBIN /homes/user/HDK4.0

   and create a subdirectory under $HBIN to hold the executables:

     mkdir $HBIN/bin.${CPU}

7. After compilation, you will find the 5 HDK executables in bin.${CPU}:
   HDInit, HDRest, HDERest, HDHEd and HDVite.

Instructions on Using the Tools
===============================

- In this version, HDInit, HDRest, HDERest and HDVite can directly read any
  HMM model file trained by HTK.
  When there are no duration parameters in the input HMM model files, a
  length-150 vector with every element equal to 1/150 is generated
  automatically, which imposes a uniform duration density on the models.
  During training, HDInit, HDRest and HDERest automatically adjust the maximum
  allowed duration to fit the data.

Limitations
===========

- When using HDERest, the trimming option -t should be used for long
  utterances (such as those in RNC); otherwise the computation may be
  extremely slow. For TIMIT and TIDIGITS, the -t option is optional. You may
  sometimes need to increase the initial trimming threshold manually (for
  example, to 500 instead of 250) if a warning message appears.

- Extremely long durations (for example, the duration of silence) are not
  handled well.

Change Log of Previous Versions
===============================

---------------------------------
Version 1.01, released on 8/27/02
---------------------------------

Trimming in HDERest has been made more robust and error-tolerant. In
particular, the Alpha pass can be overpruned when the Beta pass is overpruned
because the initial pruning threshold (the first number after the -t option)
is too small. Instead of stopping the program as earlier versions did, HDERest
now prints a warning suggesting that the initial pruning threshold be
increased. Some preliminary experiments show that when this happens, the
models still converge towards a local maximum, but more slowly. The loss of
training effectiveness caused by this overpruning may be negligible if
HDERest is run for many rounds (e.g. 10+ rounds).

---------------------------------
Version 2.0, released on 9/10/02
---------------------------------

1. Gaussian mixture models can be used for the observation PDFs. Multiple
   streams are implemented in the code but have not been tested yet.

2. When initial duration parameters are not available, default values are
   used.
   If the duration parameters cannot be found in the HMMDEF files, a length-30
   all-one vector is generated automatically as the initial duration PMF. In
   addition, the diagonal elements of the transition matrix are automatically
   set to zero, because self-transitions are not allowed in a CVDHMM. With
   these features, a model generated by HInit can be read directly by HDRest
   and HDERest without any modification, unless you want the maximum allowed
   duration to be something other than 30, or you have a meaningful initial
   duration PMF to supply.

---------------------------------
Version 3.0, released on 9/26/02
---------------------------------

1. HDVite now performs genuine Viterbi decoding that makes use of the
   duration information.

2. The -t option for HDVite is no longer supported.

3. HDRest and HDERest can choose a suitable maximum allowed duration if it is
   smaller than 100 (the initial default maximum allowed duration). When the
   maximum allowed duration may be greater than 100, the initial HMM models
   must be edited manually, as in older versions.

4. The maximum allowed duration is chosen based on the following empirical
   rules:
   Experiments show that these rules work reasonably well.

5. Zero duration probabilities undesirably restrict the flexibility of the
   models during training. A small positive value (1e-30) is added to the
   number of occurrences of each duration (refer to the exact equation) to
   keep the duration probabilities from becoming zero.

6. The duration PMF is smoothed by a 3-tap FIR filter before output, which
   should help avoid overtraining (making the models generalize better to
   unseen examples).

---------------------------------
Version 3.1, released on 9/26/02
---------------------------------

1. A bug in disposing of the prestate token in DetachIns() in HRec.c has been
   fixed.

2. An attempt was made to restore the -t option for HDVite, but it is not
   finished yet.
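The duration handling described above (a uniform default PMF, zeroed self-transitions, a 1e-30 floor on duration counts, and 3-tap FIR smoothing) can be sketched in Python as follows. This is not the HDK source: the function names are hypothetical, and the FIR taps (0.25, 0.5, 0.25), the treatment of entries beyond either end of the PMF as zero, and the row renormalization after zeroing the diagonal are all assumptions, since this README does not specify them.

```python
def default_duration_pmf(max_dur=150):
    """Uniform duration PMF for a state with no duration parameters:
    max_dur entries of 1/max_dur (150 in this version, as described above)."""
    return [1.0 / max_dur] * max_dur


def zero_self_loops(transp):
    """Set the diagonal of a transition matrix to zero (a CVDHMM has no
    self-transitions). Renormalizing each nonzero row afterwards is an
    assumption of this sketch, not documented HDK behavior."""
    n = len(transp)
    out = []
    for i in range(n):
        row = [0.0 if i == j else transp[i][j] for j in range(n)]
        total = sum(row)
        out.append([p / total for p in row] if total > 0 else row)
    return out


def reestimate_duration_pmf(counts, floor=1e-30, taps=(0.25, 0.5, 0.25)):
    """Turn expected duration counts (counts[d-1] for duration d) into a PMF.

    1. add a small floor so that no duration ends up with probability zero;
    2. smooth with a 3-tap FIR filter (tap values are an assumption), treating
       entries beyond either end of the vector as zero;
    3. renormalize so the result sums to one.
    """
    floored = [c + floor for c in counts]
    n = len(floored)
    smoothed = []
    for d in range(n):
        acc = 0.0
        for k, w in enumerate(taps):
            j = d + k - 1  # neighbours d-1, d, d+1
            if 0 <= j < n:
                acc += w * floored[j]
        smoothed.append(acc)
    total = sum(smoothed)
    return [p / total for p in smoothed]
```

For example, expected counts of [0, 10, 0] for durations 1 to 3 come out as roughly [0.25, 0.5, 0.25]: the floor alone would leave durations 1 and 3 with negligible probability, while the smoothing spreads some mass onto neighboring durations.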

---------------------------------
Version 4.0, released on 9/26/02
---------------------------------

1. A memory leak in HRec has been fixed.

An Example HMM Definition File Created/Used by HDK
==================================================

~h "aa"
5
2
32
  8.258567e+000 -7.952893e+000 -9.369713e+000 -6.236930e+000 4.085467e+000 3.271954e+000 -2.782298e+000 -1.784868e+000
  1.937994e+000 -6.288005e-001 -5.137473e+000 -6.121535e+000 -3.566568e+000 1.885456e+000 -5.053589e-001 7.792020e-001
  1.830798e+000 -1.524833e+000 -2.454788e+000 -1.851530e+000 -4.972494e-002 8.901749e-001 1.057453e-001 -3.928754e-001
  -4.753329e-002 1.010218e-002 -5.191800e-001 -4.904054e-001 -1.220759e+000 4.880897e-001 7.437041e-001 4.769040e-002
32
  1.305321e+001 2.324185e+001 3.259472e+001 3.943022e+001 6.651920e+001 9.711606e+001 4.807242e+001 6.756403e+001
  6.539613e+001 5.589016e+001 6.108249e+001 3.966216e+001 3.717031e+001 3.336739e+001 2.805880e+001 1.195359e-002
  2.502465e+000 2.683443e+000 2.997185e+000 3.848098e+000 5.873883e+000 7.919758e+000 5.456886e+000 6.172982e+000
  5.187209e+000 4.094931e+000 4.277874e+000 3.773547e+000 2.940504e+000 3.840070e+000 2.335645e+000 1.539369e-003
1.249120e+002
21
  2.388034e-001 1.217701e-001 1.376958e-001 1.736308e-001 1.392978e-001 6.106059e-002 3.009788e-002 3.041845e-002
  3.252161e-002 1.310709e-002 4.795367e-003 6.592146e-003 3.473328e-003 9.348100e-004 1.526406e-003 2.277653e-003
  2.211356e-004 2.380510e-005 1.221422e-003 5.303671e-004 8.121934e-011
3
32
  8.144916e+000 -1.099800e+001 -1.181883e+001 -8.403924e+000 5.839967e+000 6.074781e+000 -3.236461e+000 -4.144921e+000
  1.769320e+000 -2.791981e+000 -7.098788e+000 -6.679684e+000 -2.316005e+000 3.839842e+000 9.923016e-002 8.437856e-001
  9.145033e-002 7.946250e-001 6.542143e-001 6.698902e-001 -9.407746e-001 -1.691351e+000 -8.553612e-002 1.111058e+000
  1.592208e-001 -3.320189e-001 -2.703247e-001 7.188587e-001 5.722113e-001 -7.360712e-001 -1.953720e-001 -1.111898e-002
32
  1.135115e+001 2.484098e+001 2.533596e+001 3.570905e+001 5.017525e+001 6.808315e+001 6.487836e+001 5.125293e+001
  5.673242e+001 6.761422e+001 4.551351e+001 3.526076e+001 4.174304e+001 4.484420e+001 3.248788e+001 6.774498e-003
  6.021999e-001 2.066942e+000 1.684058e+000 2.131902e+000 3.029894e+000 3.633055e+000 3.959359e+000 3.509692e+000
  3.442718e+000 3.177190e+000 3.246521e+000 3.023046e+000 2.354336e+000 2.437469e+000 1.575999e+000 2.210136e-004
1.141493e+002
19
  3.116189e-001 1.022522e-001 9.994859e-002 1.215467e-001 7.112376e-002 9.981646e-002 8.192489e-002 3.985169e-002
  2.421077e-002 6.961197e-003 1.259287e-002 9.462628e-003 4.795712e-003 4.088662e-003 4.064946e-003 2.933395e-003
  2.675344e-003 1.313342e-004 2.967786e-011
4
32
  4.899230e+000 6.766077e-001 -4.760230e+000 -8.692033e-001 4.164542e+000 2.713707e+000 -3.377059e-001 -1.308773e+000
  6.514856e-001 -1.171742e+000 -5.258502e+000 -5.289211e+000 3.398129e-001 2.193654e-001 -1.437730e-001 5.955293e-001
  -3.027383e+000 3.277976e+000 2.182962e+000 3.101508e+000 9.276850e-001 7.261626e-001 9.589029e-001 1.610976e+000
  -1.912705e-001 -5.329051e-002 1.394743e+000 7.309471e-001 7.418432e-001 1.748159e-001 7.468964e-001 -1.085030e-001
32
  2.149284e+001 4.465082e+001 3.850081e+001 4.770080e+001 5.151118e+001 7.371589e+001 5.169162e+001 6.789821e+001
  5.769816e+001 5.950100e+001 5.316364e+001 4.175600e+001 3.438448e+001 3.239322e+001 2.650323e+001 2.708219e-002
  3.498327e+000 4.253266e+000 3.308844e+000 5.710029e+000 4.811119e+000 7.356844e+000 5.505331e+000 6.183210e+000
  5.342669e+000 6.164818e+000 6.085370e+000 3.750928e+000 3.065076e+000 3.088236e+000 2.772284e+000 2.133643e-003
1.286028e+002
11
  2.800771e-001 2.312171e-001 2.719460e-001 1.379217e-001 5.208718e-002 1.956706e-002 3.826426e-003 1.727127e-003
  1.058611e-003 5.711695e-004 5.183401e-007
5
  0.000000e+000 1.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000
  0.000000e+000 0.000000e+000 1.000000e+000 0.000000e+000 0.000000e+000
  0.000000e+000 0.000000e+000 0.000000e+000 1.000000e+000 0.000000e+000
  0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 1.000000e+000
  0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000
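In the example above, the transition matrix is strictly left to right with a zero diagonal: the entry state moves to state 2, each emitting state moves to the next, and state 5 is the exit. Each emitting state's occupancy is therefore governed entirely by what appears to be its duration PMF (the vectors of length 21, 19 and 11, each summing to about 1). The Python sketch below shows duration-aware Viterbi alignment for such a chain, in the spirit of the duration-based decoding HDVite performs since version 3.0. It is a generic explicit-duration (segmental) Viterbi, not HDVite's implementation; the function name, the assumption that PMF entry d-1 holds P(duration = d), and the precomputed per-frame log output likelihoods are all hypothetical.

```python
import math

NEG_INF = float("-inf")


def duration_viterbi(log_b, dur_pmfs):
    """Explicit-duration Viterbi for a strict left-to-right chain.

    log_b[s][t] : log output likelihood of frame t in emitting state s
    dur_pmfs[s] : duration PMF of state s; entry d-1 is assumed to hold
                  P(duration = d), so its length is the maximum allowed duration
    Every emitting state is visited exactly once, in order, with no
    self-transitions. Returns (best log score, per-state durations), or
    (-inf, None) when the frames cannot be covered within the duration limits.
    """
    S, T = len(log_b), len(log_b[0])

    # prefix[s][t] = log_b[s][0] + ... + log_b[s][t-1], for O(1) segment sums
    prefix = []
    for s in range(S):
        acc = [0.0]
        for t in range(T):
            acc.append(acc[-1] + log_b[s][t])
        prefix.append(acc)

    # delta[s][t]: best score when states 0..s have consumed exactly t frames
    delta = [[NEG_INF] * (T + 1) for _ in range(S)]
    back = [[0] * (T + 1) for _ in range(S)]
    for s in range(S):
        for t in range(1, T + 1):
            for d in range(1, min(len(dur_pmfs[s]), t) + 1):
                if s == 0:
                    prev = 0.0 if t == d else NEG_INF  # first state starts at frame 0
                else:
                    prev = delta[s - 1][t - d]
                p = dur_pmfs[s][d - 1]
                if prev == NEG_INF or p <= 0.0:
                    continue
                # earlier states, then d frames in state s with duration prob p
                score = prev + math.log(p) + prefix[s][t] - prefix[s][t - d]
                if score > delta[s][t]:
                    delta[s][t], back[s][t] = score, d

    if delta[S - 1][T] == NEG_INF:
        return NEG_INF, None

    # trace the chosen durations back from the last state
    durs, t = [], T
    for s in range(S - 1, -1, -1):
        d = back[s][t]
        durs.append(d)
        t -= d
    return delta[S - 1][T], durs[::-1]
```

For instance, with two states, three frames, flat output likelihoods and PMFs [0.1, 0.9] and [0.5, 0.5], `duration_viterbi([[0.0] * 3, [0.0] * 3], [[0.1, 0.9], [0.5, 0.5]])` picks durations [2, 1]. The cost of this sketch grows as S*T*D for S states, T frames and maximum duration D, so very long maximum durations make it slow.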