Translation and Translation Data (2/1/2022)
Lecture (by Graham Neubig):
- The Practice of Translation
- Machine Translation
- Translation Evaluation Metrics
- Translation Data Sources
- Bi-text Extraction/Filtering
Language in 10: Cantonese
Slides: Multilingual Slides
Discussion:
Use Google Translate to back-translate the text via one or more pivot languages, e.g., "English → Spanish → English" or "English → L1 → L2 → English", where L1 and L2 are typologically different from English and from each other.
Compare the original text with its English back-translation and share your observations. For example: (1) What information was lost in the process of translation? (2) Are there translation errors associated with linguistic properties of the pivot languages, or with linguistic divergences across languages?
Try different pivot languages: can you provide insights about the quality of MT for those language pairs?
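One way to make the comparison a little more quantitative is to score each back-translation against the original with an automatic metric. The sketch below is a simplified character n-gram F-score in the spirit of chrF. It is not the sacrebleu implementation, and its scores will not match official chrF numbers exactly. The function names (`char_ngrams`, `chrf`) and the example sentences are hypothetical, chosen only for illustration.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    # Simplifying assumption: whitespace is removed before extracting n-grams.
    s = "".join(text.split())
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def chrf(hypothesis: str, reference: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified chrF-style score: average char n-gram precision/recall, then F-beta."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # skip orders longer than one of the strings
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p == 0 and r == 0:
        return 0.0
    b2 = beta ** 2  # beta = 2 weights recall more heavily than precision
    return (1 + b2) * p * r / (b2 * p + r)


if __name__ == "__main__":
    original = "The cat sat on the mat."
    back_translation = "The cat was sitting on the carpet."
    print(round(chrf(back_translation, original), 3))
```

A score near 1 means the round trip preserved the surface form. Lower scores flag drift, but surface overlap also penalizes meaning-preserving paraphrases, which is one motivation for learned metrics such as BLEURT and COMET.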
References:
- Recommended Reading: 2021 Translation Industry Trends and Stats (Lim 2021)
- Site: OPUS Corpus
- Reference: BLEURT (Sellam et al. 2020)
- Reference: COMET (Rei et al. 2020)
- Reference: PRISM (Thompson and Post 2020)
- Reference: BARTScore (Yuan et al. 2021)
- Reference: WMT Metrics Shared Task (Mathur et al. 2020)