I am currently a master's student at the Kyoto University Graduate School of Informatics, affiliated with the Media Archiving Research Laboratory run by Professor Tatsuya Kawahara. Some of my research interests include:
- Statistical Language Modeling
- Weighted Finite State Transducers (WFSTs)
- Error Correction for Noisy Input (Speech, OCR, etc.)
- Statistical Machine Translation
- Unsupervised Language Acquisition
Academic/Career History
- 8/2001-5/2005 University of Illinois, Urbana-Champaign: B.S. Computer Science
- 9/2005-7/2006 Tajima Agricultural High School: Assistant Language Teacher
- 8/2006-3/2008 Hyogo Prefectural Government: Coordinator for International Relations
- 4/2008-3/2010 Kyoto University: Master's course in Intelligent Information Systems
- 4/2010-? Kyoto University: Doctoral course in Intelligent Information Systems
Software I've Developed
Kylm ("KYoto Language Modeling toolkit"): A language modeling toolkit written in Java. It can currently train n-gram models with a variety of smoothing techniques. Eventually it will be able to perform detailed comparisons of a number of different types of language models, and to model unknown words simply using sub-word structure (characters).
Kyfd ("KYoto Fst Decoder"): A beam-search decoder for FST models written in C++. It features the ability to keep track of separate component weights for log-linear tuning, use hierarchical failure transitions, and handle lattice input.
KyTea ("KYoto Text Analysis toolkit"): A toolkit for text analysis including word (morpheme) segmentation and pronunciation estimation.
Links
- Tools for Natural Language Processing
- Vocabtron: a game to test your vocabulary in English, French, or Japanese
Research Papers
Conference Papers
- Graham Neubig, Yuya Akita, Shinsuke Mori, Tatsuya Kawahara.
Improved Statistical Models for SMT-Based Speaking Style Transformation (BibTex)
2010 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2010). Dallas, Texas, USA.
March 2010.
- Graham Neubig, Shinsuke Mori, Tatsuya Kawahara.
A WFST-based Log-linear Framework for Speaking-style Transformation (BibTex)
10th Annual Conference of the International Speech Communication Association (InterSpeech 2009). Brighton, UK.
September 2009.
For a list of papers in Japanese see the Japanese page.