The aim of the assignments and project is to build the basic understanding and advanced implementation skills needed to build cutting-edge systems or do cutting-edge research using neural networks for NLP, culminating in a project that demonstrates these abilities.

Read all the instructions on this page carefully
You are responsible for reading these instructions and following them carefully. If you do not, you may be marked down as a result.

Assignment Policies

Working in Teams: There are 4 assignments in the class. Assignment 1 must be done individually, while Assignments 2, 3, and 4 must be done in teams of 2-3 (individual submissions will not be accepted). If you are having trouble finding a group, the instructor and TAs will help you find one after the first assignment.

Submission Information: To submit your assignment, you must upload to Canvas a zip file containing:

  • your code: This should be in a directory "code" in the top directory unless specified otherwise.
  • system outputs (assignments 1-3): The format will be specified separately for each assignment.
  • a report: This should be named "report.pdf" in the top directory. It can be up to 7 pages for assignments 1-3, and 9 pages for assignment 4. References are not included in the page count, and it is OK to submit appendices that include supplementary information such as hyperparameter settings or additional output examples, although there is no guarantee that the TAs will read them. Submissions that exceed the page count will be penalized one third of a grade for each page over (e.g. A to A- or A- to B+).
  • a link to a github repository containing your code: This should be a single-line file "github.txt" in the top directory. Your github repository must be viewable by the TAs and instructor by the submission deadline. If your repository is private, make it accessible to the github IDs (ctinray, CoderPat, simpleoier) by the submission deadline. If your repository is not visible to the TAs, your assignment will not be considered complete, so if you are worried, please submit well in advance of the deadline so we can confirm the submission is visible. We use this repository to check the contributions of all team members.
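
As a rough self-check before uploading, the layout described above can be verified with a short script. This is only a sketch under assumptions, not an official grading tool: the function name `check_submission` is hypothetical, "top directory" is taken to mean the root of the zip archive, and system outputs are not checked because their format is specified separately for each assignment.

```python
import zipfile


def check_submission(zip_path):
    """Return a list of problems found in the zip (empty list if none).

    Assumes the required items sit at the root of the archive.
    """
    problems = []
    with zipfile.ZipFile(zip_path) as zf:
        names = zf.namelist()

        # The code must live in a top-level directory called "code".
        if not any(name.startswith("code/") for name in names):
            problems.append('missing "code" directory at the top level')

        # The report must be named exactly "report.pdf".
        if "report.pdf" not in names:
            problems.append('missing "report.pdf" at the top level')

        # github.txt must exist and contain a single line (the repository link).
        if "github.txt" not in names:
            problems.append('missing "github.txt" at the top level')
        else:
            text = zf.read("github.txt").decode("utf-8", errors="replace")
            lines = [line for line in text.splitlines() if line.strip()]
            if len(lines) != 1:
                problems.append('"github.txt" should contain exactly one line')
    return problems
```

Calling `check_submission("submission.zip")` and printing the returned list shows what still needs fixing. The script cannot tell whether the repository is actually visible to the TAs, so that still has to be confirmed separately.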

Late Day Policy: In case unforeseen circumstances prevent you from turning in your assignment on time, 3 late days total for assignment 1, and 5 late days total for assignments 2, 3, and 4 will be allowed. Note that other than these late days, we will not be making exceptions and extending deadlines, so please try to be frugal with your late days and use them only if necessary. Assignments that are late beyond the allowed late days will be graded down one third of a grade per day late (e.g. A to A- for one day, and A to B+ for two days).
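
The late penalty arithmetic can be illustrated with a small sketch. The grade ladder below and the function names `lower_grade` and `grade_after_lateness` are assumptions made for illustration only. The policy above gives just the A to A- and A to B+ examples, and actual grades are determined by the instructor and TAs.

```python
# Assumed letter-grade ladder, best to worst (hypothetical, for illustration).
GRADE_LADDER = ["A+", "A", "A-", "B+", "B", "B-", "C+", "C", "C-", "D+", "D"]


def lower_grade(grade, steps):
    """Move a grade down the ladder by `steps` thirds, clamping at the bottom."""
    index = GRADE_LADDER.index(grade) + max(steps, 0)
    return GRADE_LADDER[min(index, len(GRADE_LADDER) - 1)]


def grade_after_lateness(grade, days_late, late_days_remaining):
    """Apply one third of a grade per day late beyond the remaining late days."""
    days_over = max(days_late - late_days_remaining, 0)
    return lower_grade(grade, days_over)


# With no late days left, one day late lowers A to A-, two days to B+.
print(grade_after_lateness("A", 1, 0))
print(grade_after_lateness("A", 2, 0))
# Three days late with five late days remaining incurs no penalty.
print(grade_after_lateness("A", 3, 5))
```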

Plagiarism/Code Reuse Policy: All assignments are expected to be conducted under the CMU policy for academic integrity. All rules here apply and violations will be subject to penalty including zero credit on the assignment, failing the course, or other disciplinary measures. In particular, in your implementation:

  • Code or pseudo-code provided by the TAs or instructor may be used freely without restriction.
  • You may not just re-use an existing implementation written by someone else. The implementation should basically be your own.
  • Code written by other students in the class cannot be used (except, obviously, you can share code within your group for assignments 2, 3, and 4).
  • You can use fragments of code that you found online as long as they are limited to a few lines, and you note where you got the code both as a comment in your code and in your report. If you are unsure whether it is allowed, consult with the TAs before turning in the assignment.

If you are doing a similar project for a graded class at CMU (including independent studies or directed research), you must declare so in your report, and note which parts of the project are for 11-737 and which parts are for the other class. Consult with the TA mailing list if you are unsure.

Consulting w/ Instructors/TAs: For assignments and projects, you are free to consult with the instructors and TAs as much as you want, any time you want. That is what we're here for, and in no way is this considered cheating. In fact, if you don't have much previous experience with NLP, it will be helpful to consult liberally with the instructors and TAs to learn how to do the implementation and finish the assignments. So please do so.

Details of Each Assignment

Assignment 1: Multilingual Sequence Labeling (Assigned 1/20, Due 2/7)

See the assignment 1 page.

Assignment 2: Multilingual Translation (Assigned 2/3, Due 2/25)

See the assignment 2 page.

Assignment 3: Multilingual Speech Recognition (Assigned 3/1, Due 3/21)

See the assignment 3 page.

Assignment 4: Final Project (Due 5/2)

The final project is expected to be a novel contribution to knowledge on multilingual language processing. In general, we will accept contributions in several categories, including the following:

  • Proposal of a novel method for multilingual or low-resource language processing that is better (more accurate, computationally efficient, or data efficient) than other methods in the literature. It may be easier to do this as a follow-up on one of the assignments, but you are free to tackle other tasks as well.
  • An extensive comparison of existing methods in the literature for tackling a particular multilingual or low-resource language processing task, analyzing their strengths and weaknesses and when you may expect them to succeed or fail (on multiple datasets).
  • Building a state-of-the-art language tool for a language for which such technology does not currently exist. You should apply methods that you learned in class in a way that is specifically tailored to the language based on its unique linguistic properties, related languages, or data availability.

All submissions should include a report of up to 8 pages describing:

  • a. Background of the task and problems involved therein.
  • b. The choice of methodology, demonstrating internalization of the knowledge obtained in class.
  • c. Experimental setting and results, including strong baselines and proposed improvements for the first and third project categories above.
  • d. Analysis of the results demonstrating the characteristics of the implemented methods (especially important for the second project category above).
  • e. Which members of the group contributed in which ways to the implementation.

Grading: Projects will be graded according to the following rubric:

  • A+: A respectable research contribution that is novel and effective, and could be submitted largely as-is as a paper to an academic conference. All elements above are high quality.
  • A: A respectable contribution that is largely complete and promising, but the description, sophistication of methodology, experiments, or analysis are not as fleshed out and complete as A+ assignments.
  • A-: All required elements are present, but one or more are lacking to some extent.
  • B+ or B: The project is complete, but one or more of the required elements is seriously lacking.
  • B- or below: The project or description thereof is seriously incomplete.

Negative Results: Sometimes experiments don’t work as planned. If you try hard to get positive results but are not successful, you may still get a good grade by clearly describing why you thought your methods would work, and then performing an analysis of why your initial assumptions were incorrect, leading to results that did not match your initial expectations. The bar for paper writing, experimentation, and analysis will be a bit higher in these cases, as we want to make sure that you really made a serious effort.