Natural Language Processing Tools

This is a list of NLP tools for various purposes.

Language Modeling

CMU-Cambridge Statistical Language Modeling Toolkit:

Kylm: A language modeling toolkit (written by me) that supports weighted finite-state transducer (WFST) output and the modeling of unknown words. Implemented in Java.

SRILM: An efficient n-gram language modeling toolkit with a wide range of features, including a variety of smoothing techniques (including Kneser-Ney), class-based models, and model merging.
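
SRILM's own implementation and options are not reproduced here. As a rough illustration of the Kneser-Ney smoothing mentioned above, here is a minimal sketch of an interpolated Kneser-Ney bigram model. The function name `train_kn_bigram`, the fixed discount of 0.75, and the `<s>`/`</s>` sentence markers are assumptions made for this example.

```python
from collections import Counter, defaultdict

def train_kn_bigram(sentences, discount=0.75):
    """Return prob(prev, word) for an interpolated Kneser-Ney bigram model (sketch)."""
    bigrams = Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        bigrams.update(zip(tokens, tokens[1:]))

    context_count = Counter()     # c(v): total count of bigrams starting with v
    followers = defaultdict(set)  # distinct words seen after v, giving N1+(v .)
    histories = defaultdict(set)  # distinct words seen before w, giving N1+(. w)
    for (v, w), c in bigrams.items():
        context_count[v] += c
        followers[v].add(w)
        histories[w].add(v)
    total_types = len(bigrams)    # N1+(. .): number of distinct bigram types

    def prob(prev, word):
        # Continuation probability: how many different contexts precede `word`
        p_cont = len(histories[word]) / total_types
        c_v = context_count[prev]
        if c_v == 0:
            return p_cont  # unseen history: fall back to the continuation distribution
        discounted = max(bigrams[(prev, word)] - discount, 0) / c_v
        # Mass removed by discounting is redistributed via the continuation distribution
        backoff_weight = discount * len(followers[prev]) / c_v
        return discounted + backoff_weight * p_cont

    return prob
```

For example, after training on the two sentences "a b" and "a c", the probabilities of every word that can follow "a" (a, b, c, </s>) sum to 1, and P(b | a) comes out to 0.275.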

Machine Learning

Mallet: A machine learning package for use in natural language processing. It implements hidden Markov models, maximum entropy Markov models, and conditional random fields. Written in Java.
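
Mallet's own classes are not shown here. As a hedged sketch of how a trained HMM is typically used for tagging, below is plain Viterbi decoding over dictionary-based probability tables; the same dynamic program, with scores in place of probabilities, is also used to decode linear-chain CRFs. The table layout and the toy tag set are assumptions for the example.

```python
import math

def viterbi(words, states, start_p, trans_p, emit_p):
    """Return the most likely HMM state sequence for `words`.

    Probabilities are nested dicts; missing entries count as zero.
    Scores are kept in log space to avoid underflow on long inputs.
    """
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # best[t][s]: best log-probability of any path that ends in state s at position t
    best = [{s: log(start_p.get(s, 0)) + log(emit_p[s].get(words[0], 0)) for s in states}]
    back = [{}]
    for t in range(1, len(words)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((r, best[t - 1][r] + log(trans_p[r].get(s, 0))) for r in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + log(emit_p[s].get(words[t], 0))
            back[t][s] = prev
    # Trace back from the best final state
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(words) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]
```

With a toy model in which "the" is always a determiner and "barks" is more likely a verb, `viterbi(["the", "dog", "barks"], ...)` returns `["DET", "NOUN", "VERB"]`.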

SVM-Light: An efficient implementation of support vector machines (SVMs).
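
SVM-Light reads training data from a sparse text format in which each line is `<target> <feature>:<value> ...`, with positive integer feature ids in increasing order. The sketch below, a small helper of my own rather than part of SVM-Light, converts bag-of-words examples into that format for binary classification. The function names and the choice of raw word counts as feature values are assumptions for the example.

```python
def to_svmlight_line(label, features):
    """Format one example as a line of SVM-Light's sparse input format.

    label: +1 or -1 (binary classification).
    features: dict mapping positive integer feature ids to numeric values.
    Zero-valued features are omitted; ids are written in increasing order.
    """
    if label not in (1, -1):
        raise ValueError("binary classification labels must be +1 or -1")
    parts = ["+1" if label == 1 else "-1"]
    for fid in sorted(features):
        if fid < 1:
            raise ValueError("feature ids must be positive integers")
        value = features[fid]
        if value != 0:
            parts.append(f"{fid}:{value:g}")
    return " ".join(parts)


def build_examples(docs, vocab=None):
    """Turn (label, tokens) pairs into SVM-Light lines using word counts.

    vocab maps each token to a feature id; unseen tokens get the next free id.
    """
    vocab = {} if vocab is None else vocab
    lines = []
    for label, tokens in docs:
        counts = {}
        for tok in tokens:
            fid = vocab.setdefault(tok, len(vocab) + 1)
            counts[fid] = counts.get(fid, 0) + 1
        lines.append(to_svmlight_line(label, counts))
    return lines, vocab
```

For example, the documents (+1, "good movie good") and (-1, "bad movie") become the lines `+1 1:2 2:1` and `-1 2:1 3:1`. Written to a file, these can then be passed to `svm_learn` along with a model file name; the vocabulary should be reused (via the `vocab` argument) when formatting test data so that feature ids stay consistent.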

Machine Translation

Moses: A popular statistical machine translation toolkit with programs for alignment, MT decoding, and minimum error rate training.
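
This is not Moses code. To give a feel for what the word alignment step involves, here is a minimal sketch of IBM Model 1, the simplest of the IBM alignment models, trained with expectation-maximization (EM). The `NULL` token, the uniform initialization, and the fixed iteration count are assumptions of this sketch.

```python
from collections import defaultdict

def ibm_model1(pairs, iterations=10):
    """Estimate word translation probabilities t(f | e) with IBM Model 1 EM.

    pairs: list of (foreign_tokens, english_tokens) sentence pairs.
    A NULL token is prepended to each English sentence so that foreign
    words can align to nothing. Returns a dict keyed by (f, e).
    """
    f_vocab = {f for fs, _ in pairs for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))  # uniform initialization of t(f | e)
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f, e)
        total = defaultdict(float)  # expected counts c(e)
        for fs, es in pairs:
            es = ["NULL"] + es
            for f in fs:
                # E-step: share f's alignment probability among all English words
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    delta = t[(f, e)] / norm
                    count[(f, e)] += delta
                    total[e] += delta
        # M-step: renormalize expected counts into conditional probabilities
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t
```

On the classic three-sentence German-English toy corpus ("das haus" / "the house", "das buch" / "the book", "ein buch" / "a book"), EM quickly learns that "haus" translates "house" and "das" translates "the", even though each pair co-occurs only once or twice.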

Morphological Analysis

ChaSen: A morphological analysis tool based on hidden Markov models (HMMs). Site is in Japanese.

MeCab: A tool for morphological analysis using conditional random fields (CRFs). Site is in Japanese.
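
Neither ChaSen's nor MeCab's dictionaries, cost models, or interfaces are reproduced here. As a rough sketch of the general idea behind lattice-based analyzers such as MeCab, the code below builds a lattice of dictionary words over the input and finds the minimum-cost path, where the cost is the sum of per-word costs and part-of-speech connection costs. The dictionary entries, cost values, POS labels, and the single-character `UNK` fallback are all made up for illustration.

```python
def segment(text, dictionary, connection_cost, unk_cost=10000):
    """Find the minimum-cost segmentation of `text` over a word lattice.

    dictionary: surface string -> list of (pos, word_cost) entries.
    connection_cost: (prev_pos, pos) -> cost; missing pairs cost 0.
    Characters not covered by a one-character entry may also become "UNK" words.
    """
    n = len(text)
    # best[i] maps the POS of a word ending at position i to (total cost, backpointer)
    best = [dict() for _ in range(n + 1)]
    best[0]["BOS"] = (0, None)
    for start in range(n):
        if not best[start]:
            continue
        candidates = []
        for end in range(start + 1, n + 1):
            for pos, cost in dictionary.get(text[start:end], []):
                candidates.append((end, pos, cost))
        # Guarantee the lattice stays connected with an expensive unknown word
        if not any(end == start + 1 for end, _, _ in candidates):
            candidates.append((start + 1, "UNK", unk_cost))
        for end, pos, cost in candidates:
            for prev_pos, (prev_total, _) in best[start].items():
                total = prev_total + cost + connection_cost.get((prev_pos, pos), 0)
                if pos not in best[end] or total < best[end][pos][0]:
                    best[end][pos] = (total, (start, prev_pos, text[start:end]))
    # Trace back from the cheapest analysis that covers the whole input
    pos = min(best[n], key=lambda p: best[n][p][0])
    words, i = [], n
    while i > 0:
        _, (start, prev_pos, surface) = best[i][pos]
        words.append((surface, pos))
        i, pos = start, prev_pos
    return words[::-1]
```

For instance, with entries for 東京 (noun), 京都 (noun), 東 (noun), and 都 (suffix), plus a penalty on noun-noun connections, 東京都 is analyzed as 東京 + 都 rather than 東 + 京都 because that path has the lower total cost.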

Pronunciation Estimation

KyTea: A toolkit for word segmentation and pronunciation estimation.
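
KyTea's actual feature set, models, and options are not shown here. Assuming a pointwise approach, where each gap between two characters is classified independently as a word boundary or not, the sketch below extracts character n-gram and character-type n-gram features around one gap and then segments text given any boundary classifier. The character classes, window size, feature-string format, and the `is_boundary` callable are all assumptions for this example.

```python
def char_type(ch):
    """Coarse character class, a common feature for Japanese segmentation."""
    code = ord(ch)
    if 0x3040 <= code <= 0x309F:
        return "H"  # hiragana
    if 0x30A0 <= code <= 0x30FF:
        return "K"  # katakana
    if 0x4E00 <= code <= 0x9FFF:
        return "C"  # CJK ideograph (kanji)
    if ch.isdigit():
        return "D"  # digit
    if ch.isascii() and ch.isalpha():
        return "R"  # Latin letter
    return "O"      # anything else


def boundary_features(text, i, window=2):
    """Features for the potential boundary between text[i-1] and text[i].

    Character unigrams/bigrams and their character-type strings are taken
    from up to `window` characters on each side, tagged with their offset.
    """
    left = max(0, i - window)
    right = min(len(text), i + window)
    feats = []
    for n in (1, 2):
        for start in range(left, right - n + 1):
            gram = text[start:start + n]
            types = "".join(char_type(c) for c in gram)
            offset = start - i
            feats.append(f"c{n}:{offset}:{gram}")
            feats.append(f"t{n}:{offset}:{types}")
    return feats


def segment_with(text, is_boundary):
    """Split non-empty text at every gap the (hypothetical) classifier accepts."""
    words, start = [], 0
    for i in range(1, len(text)):
        if is_boundary(boundary_features(text, i)):
            words.append(text[start:i])
            start = i
    words.append(text[start:])
    return words
```

As a toy example, a rule that splits wherever the character type changes turns 東京に住む into 東京 / に / 住 / む; in practice a classifier trained on segmented text (for example a linear SVM or logistic regression over these feature strings) would take the place of that rule.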

Speech Recognition

Julius: An open-source decoder for large vocabulary automatic speech recognition.