CMPS143, Spring 2014, Section 01: Syllabus

Course Information

Introduction to Natural Language Processing
CMPS143 - Spring 2014
Tues-Thurs 2:00 to 3:45
Physical Sciences 136

 

Instructor Information

Prof. Marilyn Walker
Jack Baskin School of Engineering, Room E2-267
email: maw @ soe [dot] ucsc [dot] edu
Office Hours: Wed 2 to 3:30. E2 267, or by appointment

Dr. Reid Swanson
Jack Baskin School of Engineering, Room E2-261
email: reid @ soe [dot] ucsc [dot] edu
Office Hours: Thursday 4:00 - 5:30. E2 261, or by appointment

Teaching Assistants: TWO people sharing one TAship
TA Office Hours: Monday 11:00 to 12:30 E2 387

Zhichao Hu
email: zhu @ soe [dot] ucsc [dot] edu
Stephanie Lukin 
email: slukin @ soe [dot] ucsc [dot] edu
 

Course Description

Spring 2014. This class introduces advanced undergraduates to the theory and practice of Natural Language Processing. This offering focuses on NLP programming for processing and generating narratively structured text, including classic stories such as Aesop's Fables and personal narratives that can be mined from the web. CMPS 143 provides a combination of homeworks and exams targeted at learning the basics of NLP using the NLTK toolkit and other publicly available software.

Text book:

  • Natural Language Processing with Python. Available electronically and from the bookstore. Henceforth referred to as NLPP.

Auxiliary texts:

  • https://github.com/ptwobrussell/Mining-the-Social-Web-2nd-Edition
  • Speech and Language Processing. Jurafsky and Martin. Coursera online lectures and parts of the book are available online.

Grading

  • Attendance (5%)
  • Homeworks and discussion of what we learned from the homeworks in class: 45%
  • Midterm 25%
  • Final 25%
  • Homework Delivery: Turn homework in through Ecommons assignments. Please include any code, files, and written documents in a zip file. Written documents should be plain text or PDF only. Multiple uploads (to overwrite) are enabled. Late homework is accepted until noon the next day with a 10% penalty. Homework is not accepted any later than that because the solutions will be posted at noon and discussed in class.

Schedule: Reading and homework assignments

Week 1. NLP and Basic Text Processing.

April 1st: The NLP Pipeline

  • General applications of NLP
  • The Holy Grail: Story Intention Graph
  • Getting IPython Virtual Machine.
  • Installing NLTK, examples of how to use it
  • Homework 0: Take home quiz on logic, probability, regular expressions
  • Reading: Chapter 2 (reading in natural language data from files) and Chapter 3 (reading in natural language data from the web and cleaning it) of NLPP

April 2nd. Homework 1 uploaded. Due April 7th. 11:55 PM. 10 points

April 3rd: Basic Text Processing. Unigrams and Bigrams

  • Homework 0 due
  • Tokenization (READ ch. 3.1.1)
  • Counting words and modeling their frequency (ch. 1)
  • Word categorization & generalization
  • POS tagging & applications (READ ch. 5.1 & 5.2)
  • Regular Expressions (Read ch 3.4, 3.5, 3.6, 3.7)
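
As a rough illustration of the tokenization, word counting, and bigram topics in this lecture, here is a minimal sketch that uses only the Python standard library rather than NLTK. The regular expression tokenizer and the sample text are assumptions made for the example, not course materials.

```python
import re
from collections import Counter

def tokenize(text):
    # Hypothetical tokenizer: lowercase the text and keep runs of letters or apostrophes.
    return re.findall(r"[a-z']+", text.lower())

def bigrams(tokens):
    # Pair each token with the token that follows it.
    return list(zip(tokens, tokens[1:]))

text = "The fox saw the crow. The crow saw the cheese."
tokens = tokenize(text)

# Unigram counts: how often each word occurs.
unigram_counts = Counter(tokens)
# Bigram counts: how often each adjacent word pair occurs.
bigram_counts = Counter(bigrams(tokens))

print(unigram_counts.most_common(3))
print(bigram_counts[("the", "crow")])
```

NLTK offers its own tokenizers and frequency distributions for the same job; the standard-library version is shown only so the counting logic is visible.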

Week 2: Classification of text using words and POS. 

April 7th. 11:55 PM. Homework 1 DUE

April 8th: Moving beyond Words and POS

  • Review HW1.
  • Review of Probability. Conditional Probability.
  • What can we do with POS categories?
  • NGram Language Models. (READ Chapter 5.4 and 5.5)

April 10th: Intro to Text Classification

  • N-Gram language models, continued (READ Chapter 5.4 and 5.5)
  • Collocations
  • Reading: NLPP Ch 6.1, the subsections called "Classification", "Gender identification", "Choosing the right features", and "Document Classification"
  • Classifying Texts or Utterances into Categories.
  • Defining an Experiment.
  • Restaurant Reviews, Thumbs up or Thumbs Down?
  • Constructing Feature Representations of Texts. 
  • Features for POS, features for words (unigrams), Bigram Features.
  • Homework 2 assigned. Due Friday 5:55 PM April 18th. 20 pts.
  • We strongly recommend that you start doing the feature extraction right away before next Monday!
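
To make the idea of a feature representation concrete, the sketch below turns a tokenized text into a dictionary of unigram and bigram features. The feature names and the sample review are assumptions for illustration; the homework may ask for a different feature set.

```python
def extract_features(tokens):
    # Unigram features: one boolean feature per distinct word.
    features = {"contains({})".format(w): True for w in tokens}
    # Bigram features: one boolean feature per adjacent word pair.
    for first, second in zip(tokens, tokens[1:]):
        features["bigram({} {})".format(first, second)] = True
    return features

# Hypothetical restaurant review, already lowercased and split on whitespace.
review = "the food was not good".split()
features = extract_features(review)

for name in sorted(features):
    print(name)
```

Note how the bigram feature for "not good" captures a negation that the unigram features for "not" and "good" miss on their own. Dictionaries of this shape, paired with a label, are the kind of input NLTK classifiers are trained on.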

 

Week 3: Natural Language Understanding I 

April 15th: Text Classification II. Sentiment Lexicons, Lexical Resources 

  • READING: Chapter 6, sections 6.1 to 6.4 
  • Defining an Experiment.
  • Sentiment and Subjectivity, as a Classification Problem.
  • Sentiment Lexicons: LIWC Linguistic Inquiry and Word Count. LIWC Features
  • Examining the most important features. Other methods for classifiers in NLTK.

April 17th:  

  • Where can we use text classification? How many NLU problems can be cast as classification problems?
  • How to figure out what features are useful.
  • How to do error analysis on your classifier's predicted output.
  • NLTK Corpora. What is available in the NLTK data set.
  • Mini Homework 3 Assigned. Sentiment Classification++  Due Monday April 21st. 7 pts.
  • Homework 2 due April 18th at 5:55 PM

Week 4: Lexical Meaning and Sentence Representations

Mini Homework 3  Sentiment Classification++  Due Monday April 21st. 7 pts.

April 22nd: Lexical Meaning & Verb Dependency Structures 

  • More on Lexical resources (Read Chapter 2)
  • Lexical Meaning: Wordnet and Verbnet
  • Wordnet (READ ch. 2.5)
  • Synonyms and Synsets.
  • Verbs and their dependents
  • Semantic Relatedness
  • Homework 4 assigned. This provides you with sample problems that will allow you to review for the midterm. Includes a sample annotation with Scheherezade. Due Monday 11:59 PM April 28th. 20 pts.

April 24th: Intro to Discourse & Narrative Meaning.

  • Story Intention Graph I: Layers of Representation
  • Scheherezade Annotation Tool. Developing the SIG  story representation through annotation.
  • Scheherezade Tutorial, Available Online to do on your own if desired. Aesop's The Fox and Crow.
  • How Scheherezade uses VerbNet and Wordnet.
  • Sample annotation of story blogs.
  • Reading: Chap 3, Elson 2012. Section 3.3 Especially.
  • Reading: Chapter 4, Elson 2012. Scheherezade annotation interface. Also short version available as a conference paper.

 

Week 5:

  • Homework 4 due Monday April 28th at 11:55 PM.

April 29th

  • Review for Midterm

May 1st

  • Midterm (Probability, Conditional Probability, NGram Language models, POS tagging, Stemming, Collocations, Text Classification, Naive Bayes, Wordnet, Verbnet)
  • Multiple Choice. Bring a PINK SCANTRON.

 

Week 6: Natural Language Understanding II

May 6th:  Discourse & Narrative Meaning II. 

  • Where can we use text classification? How many NLU problems can be cast as classification problems?
  • HW5: Classifying Clauses into L&W clause types.
  • Discourse Relations in Narrative
  • Story Intention Graph II: Relations between Layers of Representation
  • Labov and Waletzky's Theory of Oral Narrative
  • Definitions of L&W clause types
  • Examples with fables
  • Examples with personal narratives
  • Labeling Clauses and Texts
  • Affect Types
  • Feature representations for L&W oral clause classification.
  • How Naive Bayes works.
  • Homework 5 assigned. Due Friday 11:59 PM May 15th. 10 pts.
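
For the "How Naive Bayes works" topic, the following is a minimal sketch of a multinomial Naive Bayes classifier with add-one smoothing, written with the standard library only. It is not NLTK's implementation, and the tiny training set and the "up"/"down" labels are made up for the example.

```python
import math
from collections import Counter, defaultdict

def train(examples):
    # examples: list of (tokens, label) pairs.
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in examples:
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab

def classify(tokens, model):
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        # Start from the log prior P(label).
        score = math.log(label_counts[label] / total)
        # Add-one smoothing: every vocabulary word gets one extra count.
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in tokens:
            if w in vocab:  # words never seen in training are skipped
                score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training data.
examples = [
    ("great food great service".split(), "up"),
    ("tasty and friendly".split(), "up"),
    ("terrible food rude service".split(), "down"),
    ("bland and slow".split(), "down"),
]
model = train(examples)
print(classify("great friendly service".split(), model))
print(classify("rude and slow".split(), model))
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, and the smoothing keeps a single unseen word-label pair from driving a class probability to zero.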

May 8th: Chunking, Sentence Structure and Parsing I 

  • Review modules for HW 5.
  • How Naive Bayes works
  • What is Syntax
  • Chunking (Shallow Parsing vs. Parsing). Read Ch. 7.  
  • Information Extraction and Question Answering
  • Named Entity Recognition
  • Relation Extraction

 

Week 7: Building up representations for NLU

Homework 5 DUE. Friday 11:59 PM May 15th. 10 pts.

May 13th: Parsing II and NLU

  • Sentence Structure. READ ch. 8.1-8.3, 8.5
  • Grammars and Parsing
  • Shift Reduce Parsing
  • Dependency Structures and Relations
  • Pattern Matching on Dependency Relations
  • Dependency Trees to Relation Extraction

May 15th: Introduction to Question Answering: Question Answering I

  • Factoid QA
  • Narrative QA vs. News or Common Knowledge
  • Evaluation metrics: Precision, Recall, F-measure
  • Setup of QA task and demo of scoring
  • QA pipeline
  • Constituency and Dependency Tree Readers
  • Baseline QA system using string operations and sentence ranking
  • Homework 6 Assigned.  
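
The evaluation metrics listed above can be sketched as follows. This example assumes answers are scored by overlap between sets of tokens; the scoring script used in class may differ, and the function name and sample answers are hypothetical.

```python
def precision_recall_f1(predicted, gold):
    # Compare the sets of tokens in the predicted and gold answers.
    predicted, gold = set(predicted), set(gold)
    overlap = len(predicted & gold)
    # Precision: fraction of predicted tokens that are correct.
    precision = overlap / len(predicted) if predicted else 0.0
    # Recall: fraction of gold tokens that were found.
    recall = overlap / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F-measure (F1): harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

p, r, f = precision_recall_f1("a fox stole the cheese".split(), "the fox".split())
print(p, r, f)
```

A long answer that contains the gold answer gets perfect recall but low precision, which is why a trade-off between the two matters in the QA homeworks.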

Week 8:  Question Answering I & II

Homework 6 DUE. Wednesday 11:59 PM May 21st.

May 20th: Question Answering I, cont

Working with NLU representations for Question Answering II

  • Baseline QA system. String operations, sentence ranking
  • Types of Questions in baseline QA: Who, What, When, Where
  • Identifying likely phrases and sentences
  • Ranking possible responses
  • Chunking and Parsing: How to search trees
  • Sample Code Stubs for baseline system
  • Maximizing Recall at the expense of Precision
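
One way to picture a baseline built from string operations and sentence ranking is the sketch below, which ranks story sentences by how many content words they share with the question. This is an illustrative assumption, not the code stubs handed out in class; the stopword list, function names, and story text are all hypothetical.

```python
import re

# Hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "did", "who", "what", "when", "where", "to", "in"}

def content_words(text):
    # Lowercase, keep alphabetic tokens, and drop stopwords.
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

def rank_sentences(question, sentences):
    # Score each sentence by its word overlap with the question, best first.
    q = content_words(question)
    return sorted(sentences, key=lambda s: len(q & content_words(s)), reverse=True)

story = [
    "A crow sat in a tree with a piece of cheese.",
    "A fox flattered the crow.",
    "The crow dropped the cheese and the fox took it.",
]
best = rank_sentences("Who took the cheese?", story)[0]
print(best)
```

Returning the whole top-ranked sentence favors recall over precision, since the answer is likely somewhere in it but surrounded by extra words.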

May 22nd: 

Question Answering II: Using Syntax

Homework 7 ASSIGNED. DUE Wednesday 11:59 PM May 28th.

Natural Language Generation: 

  • Introduction to Natural Language Generation

    Guest Lecture Dr. Irene Langkilde-Geary

  • Working with NLU representations.
  • Syntactic Structure and Coordination
  • Prepositional Phrase Attachments
  • Dependency vs. Constituent Structures I
  • Answering Questions from NLU/parsing representations

 

Week 9: Question Answering III: Lexicons & Lexical Semantics 

Homework 7 DUE. Wednesday 11:59 PM May 28th. Okay to start working with a partner if you choose.

May 27th:  Using VerbNet and WordNet

  • Word Sense Disambiguation
  • Words go in Herds
  • Verbnet and Wordnet API, how it works
  • Review NLTK Lexical resources (Read Chapter 2)
  • Lexical Meaning: Wordnet and Verbnet
  • Wordnet (READ ch. 2.5)
  • Synonyms and Synsets.
  • Verbs and their dependents, VerbNet semantic role types
  • HW8: final HW assigned. The final QA competition. Can work with a partner.

 

May 29th: Review of all techniques for QA, types of questions, methods

  • Verbnet and Wordnet APIs, how they work
  • Word Sense Disambiguation. .DICT files are provided with HW8.
  • Constituent and Dependency Trees, finding subjects, etc.
  • Increasing Precision of Answers

 

Week 10: Question Answering Competition

Homework 8 DUE. Tuesday 11:59 PM June 3rd (accepted late only until Wednesday 10:00 AM June 4th)

June 3rd: No Lecture. Special Section. Lyn and Reid will be in class to answer questions.

  • Work on your QA system, due this DAY at 11:59 PM.

June 5th: Question Answering Competition Results

  • STUDENT PRESENTATIONS. TEN MINUTES. SHOW YOUR SYSTEM.
  • OUR ANALYSIS. What's hot and what's not. 

FINAL: Thursday, June 12. Will be set up on Ecommons so it can be taken from anywhere. Will include some material from the midterm, with more emphasis on techniques for NLU covered since the midterm, e.g. syntactic and semantic processing.
