Course Description

CS 11-737
Language Technologies Institute, School of Computer Science
Carnegie Mellon University
Tuesday/Thursday 1:25-2:45PM, Doherty Hall 1212 or Zoom (see Piazza for the Zoom link)

Instructors/TAs:

(See Piazza for Office Hours)

Instructors:
Graham Neubig (gneubig@cs.cmu.edu)
Alan Black (awb@cs.cmu.edu)
Shinji Watanabe (shinjiw@cmu.edu)

TAs: (cs11-737-sp2022-tas@cs.cmu.edu)
  Xuankai Chang
  Ting-Rui Chiang
  Athiya Deviyani
  Patrick Fernandes
  Vijay Viswanathan
Questions and Discussion: Ideally in class or through Piazza, so that we can share information with the whole class, but emailing the TA mailing list and coming to office hours are also encouraged.

Course Description

Students who take this course should be able to develop linguistically motivated solutions to core and applied NLP tasks for any language. This includes understanding and mitigating the difficulties posed by lack of data in low-resourced languages or language varieties, and the necessity to model particular properties of the language of interest such as complex morphology or syntax. The course will introduce modeling solutions to these issues such as multilingual or cross-lingual methods, linguistically informed NLP models, and methods for effectively bootstrapping systems with limited data or human intervention. The project work will involve building an end-to-end NLP pipeline in a language you don’t know.

Pre-requisites: You must have taken an NLP class previously. Some examples include:

  • Undergrad Natural Language Processing
  • Algorithms for NLP
  • Neural Networks for NLP

The assignments for the class will be done by creating neural network models, and examples will be provided using PyTorch. If you are not familiar with PyTorch, we suggest you attempt to familiarize yourself using online tutorials (for example, Deep Learning for NLP with PyTorch) before starting the class.

Class format: For each class there will be:

  • Reading: Most classes will have associated reading material that we recommend you read before the class to familiarize yourself with the topic.
  • Lecture: The first part of the class will feature a lecture giving an overview of the topic of the day, during which you can ask questions to clarify any of the material.
  • Language in 10: Groups in the class will make a 10-minute presentation of one of the languages of the world.
  • Discussion: There will be an open-ended discussion in which we will split into small groups and discuss a question regarding the class.

Grading: The assignments will be given a grade of A+ (100), A (96), A- (92), B+ (88), B (85), B- (82), or below. The final grades will be determined based on the weighted average of discussion participation, assignments, and project. Cutoffs for final grades will be approximately 97+ A+, 93+ A, 90+ A-, 87+ B+, 83+ B, 80+ B-, etc., although we reserve some flexibility to change these thresholds slightly.

  • Participation: Worth 15% of the grade. Your lowest 3 participation grades will be dropped.
  • Assignments: There will be 4 assignments (the final one being the project), worth 15%, 20%, 20%, and 30% of the grade, respectively.
The details of the assignments are elaborated on the assignments page.
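
As a rough illustration of how the weights above combine, here is a minimal Python sketch. It is not the official grade calculation: the function names are hypothetical, it assumes participation is scored per class on the same 0-100 scale and averaged after the lowest 3 scores are dropped, and it treats the approximate cutoffs as exact even though the instructors reserve the right to adjust them.

```python
# Illustrative sketch only; names and the participation scale are assumptions.
ASSIGNMENT_WEIGHTS = (0.15, 0.20, 0.20, 0.30)  # assignments 1-3, then the project
PARTICIPATION_WEIGHT = 0.15

# Approximate cutoffs from the grading policy; the real thresholds may shift slightly.
CUTOFFS = [(97, "A+"), (93, "A"), (90, "A-"), (87, "B+"), (83, "B"), (80, "B-")]


def final_score(participation, assignments, drop_lowest=3):
    """Return the weighted average on a 0-100 scale."""
    if len(assignments) != len(ASSIGNMENT_WEIGHTS):
        raise ValueError("expected one score per assignment (4 in total)")
    # Drop the lowest participation scores, then average what remains.
    kept = sorted(participation)[drop_lowest:]
    if not kept:
        raise ValueError("not enough participation scores after dropping")
    participation_avg = sum(kept) / len(kept)
    weighted_assignments = sum(
        w * s for w, s in zip(ASSIGNMENT_WEIGHTS, assignments)
    )
    return PARTICIPATION_WEIGHT * participation_avg + weighted_assignments


def letter_grade(score):
    """Map a 0-100 score to a letter using the approximate cutoffs."""
    for cutoff, letter in CUTOFFS:
        if score >= cutoff:
            return letter
    return "below B-"


if __name__ == "__main__":
    # Ten full-credit classes and three missed ones (the three zeros are dropped).
    participation = [100] * 10 + [0, 0, 0]
    # Assignment grades of A, A-, B+ and a project grade of A.
    assignments = [96, 92, 88, 96]
    score = final_score(participation, assignments)
    print(round(score, 1), letter_grade(score))
```

With these example inputs the weighted average comes to about 94.2, which falls in the A range under the approximate cutoffs.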

Class Format:

Per university guidance, class will be held remotely from January 15 to 30, and is currently planned to return to in-person afterward. However, we are closely monitoring the COVID situation and will make adjustments accordingly, and we will try to accommodate those who cannot participate in person.