Graham Neubig's Software

This is a page for the software that I have made, most of which has been used in my research.

You can often see what I am currently working on at my GitHub page. I like coding, so if there are lots of green boxes on my activity timeline, that usually means that I'm having fun! You can also check out things on the NeuLab GitHub org.

Recent Projects

Some recent projects that I (and my collaborators) have been working on as of January 2024 are:

Zeno: A visualization tool to make it easier to rigorously evaluate machine learning models.
prompt2model: A tool that allows you to provide a task description in natural language, and automatically trains a model for you.

Older Projects

Older projects that I spent a reasonable amount of time on are (in reverse chronological order):

ExplainaBoard: a toolkit for explainable leaderboards for machine learning models.
compare-mt: a toolkit for comparing the outputs of text generation systems.
DyNet: a toolkit for dynamic neural networks, which are useful in natural language processing and many other fields.
lamtram: a toolkit for language modeling or translation modeling using neural networks.
Travatar: a toolkit for tree-to-string translation, which is able to achieve high accuracy between languages with large amounts of reordering.
Lader: a toolkit for long-distance reordering in machine translation, which also functions as an unsupervised discriminative parser.
pialign: a phrase alignment tool for phrase-based machine translation that can be used with the Moses decoder.
latticelm: a tool for unsupervised word segmentation using the Bayesian Pitman-Yor language model.
KyTea: a toolkit for text analysis of languages that require word segmentation such as Japanese and Chinese.
Kyfd: a toolkit for decoding weighted finite state transducer models for text processing.
Kylm: a simple language modeling toolkit written entirely in Java, implementing n-gram language models with a number of smoothing methods.

Other Programs and Scripts

These weren't serious projects, but I'll keep them around for posterity.

tmert.py: A program for thresholded minimum error rate training for question answering systems.
prontron: A program that does pronunciation estimation (mainly in Japanese) using the structured perceptron.
dirichlet-topic.pl: A simple script that allows you to find representative words for a specific topic (using a model based on Dirichlet processes).