Training Options

The following options are accepted by Travatar's train-travatar.pl script. This documentation may be out of date, so check the option definitions inside train-travatar.pl to confirm the current set.

-work_dir  The working directory to use
-travatar_dir  The directory of Travatar
-bin_dir  A directory for external bin files (mainly GIZA)
-threads  The number of threads to use
-src_file  The source file you want to train on
-src_words  A file of plain-text sentences from the source
-src_format  The source file format (penn/egret)
-trg_file  The target file you want to train on
-trg_words  A file of plain-text sentences from the target
-trg_format  The target file format (word/penn/egret)
-lex_srctrg  A file of lexical translation probabilities of the source word given the target, P(f|e)
-lex_trgsrc  A file of lexical translation probabilities of the target word given the source, P(e|f)
-align_file  A file containing alignments
-align  The type of alignment to use (giza)
-symmetrize  The type of symmetrization to use (grow)
-normalize  Normalize rule counts to probabilities
-binarize  Binarize trees in a certain direction
-compose  The number of rules to compose
-attach  Where to attach rules
-nonterm_len  The maximum number of non-terminals in a rule
-term_len  The maximum number of terminals in a rule
-nbest_rules  The maximum number of rules for each source
-tm_file  An already created TM file
-lm_file  An already created LM file
-config_file  Where to output the configuration file
-no_lm  Indicates that no LM will be used
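
As a rough illustration of how the options above combine, the following Python sketch assembles a command line for train-travatar.pl. It only builds and prints the command; it does not run it. The helper build_train_command, all paths, and the thread count are hypothetical placeholders, and passing -no_lm as a bare flag with no value is an assumption that should be checked against the script.

```python
import shlex


def build_train_command(script, options, flags=()):
    """Return an argument list: the script, then '-name value' pairs, then bare flags."""
    cmd = [script]
    for name, value in options.items():
        cmd += [f"-{name}", str(value)]
    # Flags that take no value (assumed to be the case for -no_lm).
    cmd += [f"-{name}" for name in flags]
    return cmd


# All values below are placeholder paths and settings, not defaults of the script.
options = {
    "work_dir": "train",
    "travatar_dir": "/path/to/travatar",
    "bin_dir": "/path/to/giza-bin",
    "threads": 2,
    "src_file": "data/train.en.penn",
    "src_format": "penn",
    "trg_file": "data/train.ja",
    "trg_format": "word",
}

command = build_train_command("train-travatar.pl", options, flags=["no_lm"])

# Print a shell-quoted version that could be pasted into a terminal.
print(shlex.join(command))
```

Keeping the options in a dictionary makes it easy to swap in -lm_file or -tm_file for runs that reuse an existing model.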