Graham Neubig
neubig at ar.media.kyoto-u.ac.jp
I am currently a doctoral candidate at the Kyoto University Graduate School of Informatics, affiliated with the Media Archiving Research Laboratory. Some of my research interests include:
- Unsupervised Language Learning and Modeling
- Statistical Machine Translation
- Active Learning and Domain Adaptation
- Processing for Noisy Input (Speech, OCR, etc.)
Academic/Career History
- 8/2001-5/2005 University of Illinois, Urbana-Champaign: B.S. Computer Science
- 9/2005-7/2006 Tajima Agricultural High School: Assistant Language Teacher
- 8/2006-3/2008 Hyogo Prefectural Government: Coordinator for International Relations
- 4/2008-3/2010 Kyoto University: Master's course in Intelligent Information Systems
- 4/2010-? Kyoto University: Doctoral course in Intelligent Information Systems
Awards
- 2011: 2011 Yamashita SIG Research Award
- 2011: 2011 Meeting of the Association for Natural Language Processing: Outstanding Presentation Award
- 2010: JSPS Research Fellowship for Young Scientists (DC1)
Internships
- 2010-Present: NICT, Japan. Focus on unsupervised learning for machine translation. Host: Taro Watanabe
- 2011 (Jul.-Sep.): Google, Mountain View, USA. Research on machine translation. Host: David Talbot
- 2008 (Aug./Sep.): ATR, Japan. Researched language models for machine translation. Host: Andrew Finch
Activities
- 2012: NLP2012, Organizer for the Theme Session on Language Information Processing in Times of Crisis
- 2011: NLP2011, Program Committee for the Workshop on the Relationship between Business, Universities, and Students in Natural Language Processing
- Reviewer: ACL, EACL, IEEE ASLP.
Teaching
- 2010: TA for "Media Information Processing." I prepared material for the class on language models and speech recognition.
- Slides for tutorials and classes can be found on my teaching page
Software/Resources I've Developed
pialign: A phrasal aligner for statistical machine translation that is able to produce compact, competitive phrase tables in a single step with no heuristics.
The Kyoto Free Translation Task: An evaluation task for machine translation with publicly available data. The target is Wikipedia articles about Kyoto.
latticelm: A tool for non-parametric Bayesian unsupervised word segmentation and language model learning using lattices. Lattices allow for learning over noisy input such as phoneme recognition results from continuous speech.
KyTea ("KYoto Text Analysis toolkit"): A toolkit for text analysis including word (morpheme) segmentation and pronunciation estimation. Its models can be trained on partially annotated corpora, allowing for rapid domain adaptation.
Kylm ("KYoto Language Modeling toolkit"): A language modeling toolkit written in Java. It can currently train n-gram models with a variety of smoothing techniques.
Kyfd ("KYoto Fst Decoder"): A beam-search decoder for FST models written in C++. It features the ability to keep track of separate component weights for log-linear tuning, use hierarchical failure transitions, and handle lattice input.
More can be found on my software page.
Research Papers
Journal Papers
- Graham Neubig, Masato Mimura, Shinsuke Mori, Tatsuya Kawahara.
Bayesian Learning of a Language Model from Continuous Speech (BibTex)
IEICE Transactions on Information and Systems.
February 2012.
- Graham Neubig, Taro Watanabe, Eiichiro Sumita, Shinsuke Mori, Tatsuya Kawahara.
Joint Phrase Alignment and Extraction for Statistical Machine Translation (BibTex)
Journal of Information Processing.
February 2012.
Conference Papers
- Andrew Finch, Chooi-ling Goh, Graham Neubig, Eiichiro Sumita.
The NICT Translation System for IWSLT 2011 (BibTex)
Proceedings of the International Workshop on Spoken Language Translation 2011. San Francisco.
December 2011.
- Graham Neubig, Yuichiroh Matsubayashi, Masato Hagiwara, Koji Murakami.
Safety Information Mining — What can NLP do in a disaster — (BibTex)
Proceedings of the 5th International Joint Conference on Natural Language Processing (IJCNLP). Chiang Mai, Thailand.
November 2011.
- Daniel Flannery, Yusuke Miyao, Graham Neubig, Shinsuke Mori.
Training Dependency Parsers from Partially Annotated Corpora (BibTex)
Proceedings of the 5th International Joint Conference on Natural Language Processing (IJCNLP). Chiang Mai, Thailand.
November 2011.
- Masao Utiyama, Graham Neubig, Takeshi Onishi, Eiichiro Sumita.
Searching Translation Memories for Paraphrases (BibTex)
Proceedings of the Machine Translation Summit XIII. Xiamen, China.
September 2011.
- Shinsuke Mori, Graham Neubig.
A Pointwise Approach to Pronunciation Estimation for a TTS Front-end (BibTex, Code)
12th Annual Conference of the International Speech Communication Association (InterSpeech 2011). Florence, Italy.
August 2011.
- Graham Neubig, Taro Watanabe, Eiichiro Sumita, Shinsuke Mori, Tatsuya Kawahara.
An Unsupervised Model for Joint Phrase Alignment and Extraction (BibTex, Code, Slides)
The 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL-HLT). Portland, Oregon, USA.
June 2011.
- Graham Neubig, Yosuke Nakata, Shinsuke Mori.
Pointwise Prediction for Robust, Adaptable Japanese Morphological Analysis (BibTex, Errata, Code, Slides)
The 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL-HLT) Short Paper Track. Portland, Oregon, USA.
June 2011.
- Mijit Ablimit, Graham Neubig, Masato Mimura, Shinsuke Mori, Tatsuya Kawahara, Askar Hamdulla.
Uyghur Morpheme-based Language Models and ASR (BibTex)
IEEE 10th International Conference on Signal Processing (ICSP). Beijing.
October 2010.
- Graham Neubig, Masato Mimura, Shinsuke Mori, Tatsuya Kawahara.
Learning a Language Model from Continuous Speech (BibTex, Code, Slides)
11th Annual Conference of the International Speech Communication Association (InterSpeech 2010). Makuhari, Japan.
September 2010.
- Yuya Akita, Masato Mimura, Graham Neubig, Tatsuya Kawahara.
Semi-automated Update of Automatic Transcription System for the Japanese National Congress (BibTex)
11th Annual Conference of the International Speech Communication Association (InterSpeech 2010). Makuhari, Japan.
September 2010.
- Graham Neubig, Shinsuke Mori.
Word-based Partial Annotation for Efficient Corpus Construction (BibTex, Code, Slides)
The Seventh International Conference on Language Resources and Evaluation (LREC 2010). Malta.
May 2010.
- Graham Neubig, Yuya Akita, Shinsuke Mori, Tatsuya Kawahara.
Improved Statistical Models for SMT-Based Speaking Style Transformation (BibTex, Slides)
2010 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2010). Dallas, Texas, USA.
March 2010.
- Graham Neubig, Shinsuke Mori, Tatsuya Kawahara.
A WFST-based Log-linear Framework for Speaking-style Transformation (BibTex, Code, Slides)
10th Annual Conference of the International Speech Communication Association (InterSpeech 2009). Brighton, UK.
September 2009.
I have also written (or co-authored) 19 papers in Japanese. See the Japanese page for details.
Other Links
- Tools for Natural Language Processing
- Vocabtron: a game to test your vocabulary in English, French, or Japanese