lader is a program that trains and uses discriminative parsers to improve word reordering for machine translation. It is unlike other parsers in that it can be trained directly from aligned parallel text, with no annotated syntax trees. For translation between language pairs with very different word order, using it can greatly improve translation accuracy.
lader was developed mainly by Graham Neubig during his internship at NICT. Hwidong Na has also done a large amount of work on improving training speed on a single machine, and Jeremy Gwinnup has contributed code for parallel training.
If you would like more details about the method or want to cite lader in your research, please reference:
Inducing a Discriminative Parser to Optimize Machine Translation Reordering
Graham Neubig, Taro Watanabe, and Shinsuke Mori
Conference on Empirical Methods in Natural Language Processing and Natural Language Learning (EMNLP-CoNLL 2012)
You can also read about improvements in training time in:
A Discriminative Reordering Parser for IWSLT 2013
Hwidong Na and Jong-Hyeok Lee
International Workshop on Spoken Language Translation (IWSLT 2013)
Latest Version: @github
Past Versions: lader 0.1.6 lader 0.1.5 lader 0.1.4 lader 0.1.3 lader 0.1.2 lader 0.1.1 lader 0.1.0
Models: (These may work with 0.1.5, but not the latest version)
The code of lader is distributed under the Eclipse Public License, v1.0, and may be used and redistributed freely under the terms of this license.
On Linux, Mac OS X, or Cygwin, download the source code, and install using the following commands.
tar -xzf lader-X.X.X.tar.gz
cd lader-X.X.X
./configure
make
src/bin/lader --help
If this prints a help message, lader is working properly.
An example of how to run the program is included in the "example/" directory of the download. There are two well-documented scripts (train-model.sh and test-model.sh) showing how to train and use the reorderer. If you want more information about how to define the feature set for lader, you can visit the features page for more details. You can also find a full training script for a machine translation system using lader at the Kyoto Free Translation Task web site.
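For instance, once lader has been built as described above, the two example scripts can be run in sequence from the download directory:

```shell
# Run the documented example scripts (train-model.sh, then test-model.sh).
# This assumes the build in the install section succeeded.
cd example
./train-model.sh
./test-model.sh
```

See the comments inside the scripts themselves for the details of each step.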
lader can be trained more quickly, at the expense of (greatly) increased memory use, by enabling the -save_features option. This calculates the features for each parse tree in the training data only once, and re-uses them from the second iteration onward. The total amount of memory necessary will depend on the length of your sentences and the number of features used, but with the default feature set you should expect to use about 5 megabytes per sentence.
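As a rough sketch, the option is simply added to the training command; everything other than -save_features below is omitted and should be taken from example/train-model.sh:

```shell
# Sketch only: -save_features is the option described above; the input,
# alignment, and model-output arguments are omitted here and should
# follow example/train-model.sh from the download.
src/bin/train-lader -save_features true ...
```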
There is also a script, script/trainLaderParallel.pl, by Jeremy Gwinnup that allows you to train lader in parallel on a Sun Grid Engine. When you run this script, be sure to set the -learner perceptron option of train-lader, as the default learner (Pegasos) does not play well with parallelization and parameter mixing. Better documentation is forthcoming.
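In other words, whatever other options you use for parallel training, the learner must be switched from the default; a hedged sketch (all other arguments omitted):

```shell
# -learner perceptron is the documented requirement for parallel training;
# inputs, model output, and grid options are omitted here and should
# follow the script's own usage message and example/train-model.sh.
src/bin/train-lader -learner perceptron ...
```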
The speed of lader training could also be greatly improved through a better feature representation. If you are a developer and interested in helping with this, contact me and I will help you start out.
lader can be easily parallelized by using the -threads option. If this is still too slow, you can split the data and run lader on multiple machines if you have them at your disposal. Also, very long sentences (80 words or more?) can take much more time, so filtering these sentences out of the data will also increase speed.
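Length filtering of this kind can be done with standard tools before training; a minimal, self-contained sketch, where the file names train.src, train.trg, and train.align are hypothetical placeholders and the threshold is lowered from 80 to 5 words so the toy data demonstrates the effect:

```shell
# Create tiny sample files for illustration (hypothetical names; with real
# data these would be your source, target, and alignment files).
printf 'this is short\none two three four five six seven\n' > train.src
printf 'kore wa mijikai\nichi ni san shi go roku nana\n' > train.trg
printf '0-0 1-1 2-2\n0-0 1-1 2-2 3-3 4-4 5-5 6-6\n' > train.align

# Keep only sentence pairs whose source side is at most max words long
# (set max=80 or so for real data, per the note above).
paste train.src train.trg train.align | awk -F'\t' -v max=5 '
  split($1, w, " ") <= max {
    print $1 > "filtered.src"
    print $2 > "filtered.trg"
    print $3 > "filtered.align"
  }'
```

Filtering on the source side alone is a simplification; you may also want to check the target-side length.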
Like training, parsing speed will also be improved with a new feature representation, so if you are a developer that wants to help please contact me.
You can see the output of the parser in English or Japanese. This is on the held-out test set, so the trees may include some errors. (You can visualize trees with my simple script and NLTK or a TreeBank viewer).
If you are interested in participating in the lader project, please send an email to neubig at gmail dot com.