I am currently a doctoral candidate at the Kyoto University Graduate School of Informatics, affiliated with the Media Archiving Research Laboratory. My research interests include:
- Unsupervised Language Learning and Modeling
- Statistical Machine Translation
- Error Correction for Noisy Input (Speech, OCR, etc.)
Academic/Career History
- 8/2001-5/2005 University of Illinois, Urbana-Champaign: B.S. Computer Science
- 9/2005-7/2006 Tajima Agricultural High School: Assistant Language Teacher
- 8/2006-3/2008 Hyogo Prefectural Government: Coordinator for International Relations
- 4/2008-3/2010 Kyoto University: Master's course in Intelligent Information Systems
- 4/2010-? Kyoto University: Doctoral course in Intelligent Information Systems
Awards
Software I've Developed
Kylm ("KYoto Language Modeling toolkit"): A language modeling toolkit written in Java. It can currently train n-gram models with a variety of smoothing techniques. Eventually it will also support detailed comparisons between different types of language models, as well as simple modeling of unknown words using sub-word structure (characters).
Kyfd ("KYoto Fst Decoder"): A beam-search decoder for FST models, written in C++. It can keep track of separate component weights for log-linear tuning, use hierarchical failure transitions, and handle lattice input.
KyTea ("KYoto Text Analysis toolkit"): A toolkit for text analysis, including word (morpheme) segmentation and pronunciation estimation. Its models can be trained from partially annotated corpora, allowing for rapid domain adaptation.
dirichlet-topic.pl: A simple script that allows you to find representative words for a specific topic (using a model based on Dirichlet processes).
Research Papers
Conference Papers
- Graham Neubig, Masato Mimura, Shinsuke Mori, Tatsuya Kawahara.
Learning a Language Model from Continuous Speech (BibTex)
11th Annual Conference of the International Speech Communication Association (InterSpeech 2010). Makuhari, Japan.
September 2010.
- Yuya Akita, Masato Mimura, Graham Neubig, Tatsuya Kawahara.
Semi-automated Update of Automatic Transcription System for the Japanese National Congress (BibTex)
11th Annual Conference of the International Speech Communication Association (InterSpeech 2010). Makuhari, Japan.
September 2010.
- Mijit Ablimit, Graham Neubig, Masato Mimura, Shinsuke Mori, Tatsuya Kawahara, Askar Hamdulla.
Uyghur Morpheme-based Language Models and ASR (BibTex)
IPSJ Technical Report of the 82nd SIG on Spoken Language Processing (SLP-82). Sendai.
July 2010.
- Graham Neubig, Shinsuke Mori.
Word-based Partial Annotation for Efficient Corpus Construction (BibTex)
The Seventh International Conference on Language Resources and Evaluation (LREC 2010). Malta.
May 2010.
- Graham Neubig, Yuya Akita, Shinsuke Mori, Tatsuya Kawahara.
Improved Statistical Models for SMT-Based Speaking Style Transformation (BibTex)
2010 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2010). Dallas, Texas, USA.
March 2010.
- Graham Neubig, Shinsuke Mori, Tatsuya Kawahara.
A WFST-based Log-linear Framework for Speaking-style Transformation (BibTex)
10th Annual Conference of the International Speech Communication Association (InterSpeech 2009). Brighton, UK.
September 2009.
I have also written (or co-authored) 8 papers in Japanese. See the Japanese page for details.
Other Links
- Tools for Natural Language Processing
- Vocabtron: a game to test your vocabulary in English, French, or Japanese