Parallel Corpus of Bilingual Emphasized Utterances

This is a parallel corpus of digit strings spoken in English and Japanese by a bilingual speaker. In each string, one digit is emphasized, which makes the corpus useful for research on translating paralinguistic information, particularly emphasis or speaker traits. The data is split into standard training and test sets of 445 and 55 utterances, respectively. The corpus is free to use for research purposes, but it should not be used in commercial systems or redistributed.

Parallel Corpus of Bilingual Emphasized Utterances (1.0)

The corpus was developed mainly by Takatomo Kano, Sakriani Sakti, and Graham Neubig at the Nara Institute of Science and Technology. If you would like more details about how the corpus was collected, or would like to cite it in your research, please refer to:

A Method for Translation of Paralinguistic Information
Takatomo Kano, Sakriani Sakti, Shinnosuke Takamichi, Graham Neubig, Tomoki Toda, Satoshi Nakamura
International Workshop on Spoken Language Translation (IWSLT 2012)

If you have any questions about the corpus, please feel free to contact Graham at neubig at gmail dot com.