CMPS245, Spring 2013, Section 01: Syllabus



Course Information

Computational Models of Discourse and Dialogue
CMPS/LING/PSYCH 245 - Spring 2013
Tues-Thurs 12:00 to 1:45
Natural Sciences Annex 103

Final Project Report due at 11 PM on Tuesday June 11th. Submit via Ecommons.

Instructor Information

Prof. Marilyn Walker
Jack Baskin School of Engineering, Room 267
email: maw @ soe [dot] ucsc [dot] edu
Office Hours: Tuesday 2:00 to 3:30. E2 267

Dr. Reid Swanson
Jack Baskin School of Engineering, Room 265
email: reid @ soe [dot] ucsc [dot] edu


Course Description

Spring 2013. The focus of this class is dialogue interaction and dialogue models for narrative and interactive stories. We will examine theoretical and computational models of dialogue and read papers describing recent work on conversational agents in the context of representations used for task-oriented dialogue applications and interactive stories and games. We will also examine theories of narrative structure and computational models of narrative with a particular focus on the role of dialogue in narrative and storytelling. We will draw the readings and methods discussed in the class from computational, linguistic and psychological sources.

Projects will consist of analysis and modeling of a particular dialogue phenomenon chosen by the project team. For dialogue data sources, we have available: (1) IMSDb film screen plays corpus, (2) Web Blogs of Personal Narratives, (3) Jane Austen Novels, or (4) students can propose to use their own dataset. In the past, student projects have ranged from collecting dialogue data and annotating it to taking data already available and conducting machine learning experiments on it using the Weka toolkit or other existing tools.

The reading list and discussions and possible projects are constructed to move research forward in this novel area of research on dialogue in interactive stories and games, and research on the role of dialogue in narrative structure more generally. Here are some sample research questions projects could address:

  • How does the writer of an interactive story, or any kind of story for that matter, decide what aspects of the plot structure should be revealed in dialogue vs. in third person narrative or other means?
  • What kinds of representations of narrative structure are needed to support automatic generation of character dialogue?
  • Is it possible to develop an automatic computational analysis of the interaction between dialogue and scene description in film screen plays to determine how they work together to move the story along, to convey character emotion, or other key aspects of the story?
  • What are the computational representations of dialogue currently used in interactive stories and what are their weaknesses? How can we make them better?
  • Can we easily distinguish between "background information" or what Labov calls "Orientation" in different types of stories, e.g. in the IMSDb films corpus, in Aesop's Fables, in Gordon's weblog corpus?
  • In weblog stories, how is reported dialogue used and when is it used? 
  • Can we use weblog stories to construct models of narrative structure for different types of events?
  • How does the language of Elizabeth, the Pride and Prejudice character, differ in dialogue when she is talking to her sisters vs. talking to Darcy? Can we use NLP tools such as LIWC lexical tagging or other ways of measuring language to quantify whether there is a difference and what it is?
  • Is it possible to use tools like Perceptual Markup Language with an interactive agent to program appropriate dialogue behaviors?
  • Can we use the NLDS Personage expressive natural language generator to generate good dialogue for interactive stories that could increase author creativity? What extensions to the Personage engine would be useful or needed?
  • How do people learn from interactive story systems? How can we make it easier to construct such systems? What kinds of models from natural language processing are useful?
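As a rough illustration of the lexical-tagging question above, the sketch below compares per-category word rates between two sets of utterances. It is only a sketch under stated assumptions: the tiny `LEXICON` is a hypothetical stand-in and is not the LIWC dictionary, the function names are made up for this example, and the sample utterances are invented placeholders rather than quotations from the novel.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon for illustration only; NOT the LIWC dictionary.
LEXICON = {
    "negation": {"no", "not", "never", "nothing"},
    "first_person": {"i", "me", "my", "mine", "myself"},
    "second_person": {"you", "your", "yours", "yourself"},
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def category_rates(utterances):
    """Return, per category, the fraction of all tokens that fall in it."""
    tokens = [tok for utt in utterances for tok in tokenize(utt)]
    total = len(tokens)
    counts = Counter()
    for tok in tokens:
        for category, words in LEXICON.items():
            if tok in words:
                counts[category] += 1
    return {c: (counts[c] / total if total else 0.0) for c in LEXICON}


def rate_differences(group_a, group_b):
    """Per-category rate in group_a minus the rate in group_b."""
    rates_a = category_rates(group_a)
    rates_b = category_rates(group_b)
    return {c: rates_a[c] - rates_b[c] for c in LEXICON}


# Invented placeholder utterances (not quotations from the novel).
to_sisters = ["I do not know what you mean", "My dear, you are too good"]
to_darcy = ["You never said so", "I have nothing to say to you"]

if __name__ == "__main__":
    for category, diff in rate_differences(to_sisters, to_darcy).items():
        print(f"{category}: {diff:+.3f}")
```

A real project would replace the placeholder lexicon with an actual resource and would also need a significance test, since raw rate differences on small samples can easily arise by chance.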

The class is a combination of readings, homeworks, presentations and projects. The final project report is due on the scheduled day of the final exam. Class presentations of final projects will be during the last week of class.


Paper Presentation Schedule

Please sign up for your presentations. You must sign up for at least two presentations, one before April 30th and the other before the end of May. Each presentation slot can have up to three people, who will work together to organize our discussion of the relevant paper. The presenters need to take the initiative and coordinate with the other presenters for that day to avoid miscommunication about which papers you're presenting and who is doing what. Presentations should use formal slides prepared with PowerPoint or other slide presentation software and be of a high enough standard to elicit class discussion. You may use or incorporate the authors' slides if you can find them. In many cases they are available online, or you can write to the authors and ask them for the presentation they used at the conference to present the paper, and then modify the focus of the presentation to fit better with the goals of this class. Consider looking over references or works which cite the paper(s) you are presenting. Please don't wait until the day before to prepare your presentation; it is unlikely to be of a high standard if you leave it until the last minute. Paper presentations count for 25% of your grade.

Reading Responses

Please use the form in the list at the bottom of the page. LINK HERE. Reading responses are due immediately before class from 4/4 onward. Use Ecommons for turning them in.

Schedule with reading assignments

  • Week 1. Dialogue and Narrative Corpora.  Computational Dialogue Structure and Dialogue Models.
    • Tuesday 4/2 (Prof. Walker presenter)
      • Outline of the class. Goals for the quarter.
      • Methods & Data. 
      • Background Reading: Jurafsky & Martin. Speech and Language Processing Textbook. Chapter 24. Dialogue and Conversational Agents.
      • An Annotated Corpus of Film Dialogue for Learning and Characterizing Character Style. Marilyn Walker, Grace Lin, and Jennifer Sawyer. LREC 2012.
      • IMSDb: Internet Movie Script Database. Cross-reference to IMDB (Internet Movie Database) for additional information.
      • HOMEWORK 1: DUE Tuesday 4/9. Find 3 examples of scenes in a film from the IMSDb corpus that include both scene descriptions and dialogue, and where you think the interaction between dialogue and scene is interesting. For example, the 'interesting interactions' might arise from trying to model the character's emotions, or because they induce some kind of inference about character or plot, or in cases where it seems that the plot depends on contextual and emotional interactions that are captured only by the relationship between the scene descriptions and what is said in the dialogue. Write up your three selected scenes in a format that can be used to support discussion in class next Tuesday when the homework is due (i.e. you could use it to present to the class using the projector). Describe why you think the scenes are interesting from the perspective of trying to computationally model what is going on in them. Write two paragraphs describing how it might be possible to computationally model this interaction in such a way as to support an interactive story, i.e. one of the participants in the dialogue would be a computational agent and one of the participants would be a human. Turn this in on Ecommons.
    • Thursday 4/4 (presenters need to sign up for each of the two papers)


  • Week 2. Narrative Structure I. Computational Models of Dialogue.
    • Tuesday 4/9
    • Discussion of Homework 1. What did we learn, what was interesting?
    • Presenters to sign up for L&W paper
      • Labov, Waletzky. Narrative Analysis: Oral Versions of Personal Experience. Journal of Narrative and Life History. 1967
      • HOMEWORK 2: DUE Tuesday 4/16. THIS HAS TWO PARTS.
      • PART 1: Using the Homework 2 file and the annotation guidelines from Avril Thorne's (Psych) lab, label the dialogue excerpt in Homework 2 to practice coding data using Labov & Waletzky's theory of narrative analysis. Useful Sources: HERE and HERE. This uses one utterance from an interactive dialogue conversation (courtesy of Avril Thorne) which is in the Homework 2 file at the end of this syllabus. This is from a true story about Avril's sister's parakeet Chuck and her bad cat, Frank.
      • PART 2: Having completed PART 1 to get warmed up, choose a segment of a story that interests you from the weblog corpus of "personal narratives" and annotate it using Labov & Waletzky's theory of narrative analysis. In order to do this properly you need a story that consists of at least 3 events, with emotional reactions to those events and other components (orientation (scene, setting), coda) of L&W's theory. Write up your selected story and your analysis in a format that can be used to support discussion in class next Tuesday (i.e. you could use it to present to the class using the projector) when the homework is due. Describe why you think the scenes are interesting from the perspective of trying to computationally model what is going on in them. Write two paragraphs describing how it might be possible to computationally model the story structure in such a way as to support an interactive story, i.e. what kinds of choices might the human participant have if the system was the storyteller? What could the system do if the human was the storyteller? Turn this in on Ecommons.


    • Thursday 4/11. Guest Speaker Prof. María Inés Torres. University of Bilbao, Spain. Statistical Spoken Dialogue Systems.
    • THIS IS NOT IN REGULAR CLASSROOM BUT IN E2 475 down at the end of the hall on the 4th floor of E2 so that other people can come if they want. I DO expect you there.
      • INVITED TALK: Statistical Spoken Dialog Systems and the Interactive Pattern Recognition framework.
        Professor María Inés Torres. University of Bilbao.  
      • Abstract: Interacting with machines has proved to help many human activities. But machines can also take advantage of human feedback to improve their performance. In this context, the new Interactive Pattern Recognition (IPR) framework has recently been proposed. This proposal lets a human interact with a Pattern Recognition (PR) system, allowing the system to learn from the interaction as well as adapt to the human's behavior. This talk addresses the design of a Spoken Dialog System (SDS) under the IPR framework. We first briefly introduce the IPR paradigm and its main challenges. We then turn to a new formulation to present SDS as an IPR problem, which includes some extensions to the IPR approach as well as a user model definition. Its relationship with classical statistical SDS based on Markov Decision Processes is also explored. We finally apply the proposed formulation to compose a graphical model that has been preliminarily evaluated by our group in a dialog generation task on the Dihana and LetsGo corpora.

        Bio: María Inés Torres received her PhD in Physics from the University of the Basque Country in 1990, including an internship at the Centre National d’Études des Télécommunications in Lannion (France) in 1988. She was also a visiting researcher at the Polytechnic University of Valencia (Spain) during the years 1991 and 1992. She was a member of the board of the Spanish Association of Pattern Recognition and Image, which is a member of the International Association of Pattern Recognition (IAPR), from 1995 to 2008. She is currently a Full Professor of Computer Science at the University of the Basque Country, where she founded the Pattern Recognition and Speech Technology research group in 1990, which she has been leading since then, and has held several academic management positions. She was also a visiting faculty member at the Language Technologies Institute at Carnegie Mellon University for five months in 2012. She has published numerous papers in journals and international conferences and edited three books.

      • Reading: Jurafsky & Martin. Speech and Language Processing Textbook. Chapter 24. Dialogue and Conversational Agents. Your reading response for 4/11 should be based on Prof. Inés Torres's talk and not on the reading itself. BUT I think you will get more out of her talk if you try to read this chapter before you go to the talk.


  • Week 3. Automatic processing of stories using current NLP tools.
    • Tuesday 4/16
      • Stanford Toolkit and how you can use it to process stories/data/dialogue you may want to work with.
      • Part of speech tagging, Named Entity Recognition, Parsing, Pattern matching on parse structures, Coreference resolution
      • The idea of this class and the homework is to familiarize yourself with this commonly used NLP toolchain to see how you might be able to use it in your own research.
      • Homework 3: Apply the Stanford toolkit to some dialogue data and examine the output. Write up your observations. See link in Assignments page. DUE date extended to April 29th. 
    • Thursday 4/18




  • Week 6. More Story structures, with a language generation twist.
    • Tuesday 5/7
    • Project PROPOSALS are DUE today.
      • Building a Bank of Semantically Encoded Narratives. David K. Elson and Kathleen R. McKeown. Describes the Scheherazade Story Annotation representation, the Scheherazade Tool and how to use it.
      • David K. Elson. 2012. DramaBank: Annotating Agency in Narrative Discourse. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC 2012), Istanbul, Turkey.
      • We will read these two papers and explore the Scheherazade annotation tool, both in class and by doing HW4, so that when David comes on the 16th we have all our questions ready!
        • Scheherazade download page, which now includes a Scheherazade Tutorial by David Elson.
        • HOMEWORK 4. DUE Tuesday 5/14. Take a scene from the IMSDb corpus or one of the Aesop's Fables and annotate it using Elson & McKeown's story annotation tool Scheherazade. Examine the representation that Scheherazade produces. Regenerate the story you have annotated from the underlying representation that Scheherazade produces. How does this representation differ from what you got with hand-annotation using Labov & Waletzky? Could this representation support an interactive story? How would it have to be extended? Write up your selected story and your analysis in a format that can be used to support discussion in class next Tuesday (i.e. you could use it to present to the class using the projector) when the homework is due. Turn this in on Ecommons.
    • Thursday 5/9. Narrative Structure. Discourse Relations & Causality in Narrative


  • Week 7. More Dialogue and Narrative Representations.
    • Tuesday 5/14
    • Thursday 5/16. David Elson (Scheherazade!) will be here. WE WILL MEET IN E2 475!!!
      • David will be giving a talk on his thesis work on story encoding and regeneration using Scheherazade and existing off-the-shelf NLP resources such as WordNet and VerbNet. You should have already READ three of his papers, and used Scheherazade on some story data, so you should be totally ready to ask thoughtful questions.
    • TALK Title: Mapping Out Story Logic: From Social to "Scheherazade"

      Abstract: When we say that an everyday experience reminds us of a story, or feel anticipation for an event that we think will be dramatic, how exactly are we finding an interesting narrative in what would otherwise be just a series of events? Can we build intelligent systems that can speak the language of storytelling, so that they become good at listening and communicating with us? Perhaps one day soon -- but it will take new approaches to representing narrative discourse symbolically. In this talk, I'll discuss our two recent approaches to the problem. The first uses spoken dialog as a key signal for understanding the social networks underlying novels; the second introduces a set of discourse relations which identify a host of dramatic situations that emerge from a text. I'll also discuss our progress toward a collection of linguistically grounded story annotations, and experiments toward one application: finding analogical connections between stories.


  • Week 8. Interactive Story Systems. Recent Work.
    • Tuesday 5/21.
      • Sali, S., Wardrip-Fruin, N., Dow, S., Mateas, M., Kurniawan, S., Reed, A.A. and Liu, R. 2010. Playing with words: from intuition to evaluation of game dialogue interfaces. Proceedings of the Fifth International Conference on the Foundations of Digital Games (New York, NY, USA, 2010), 179–186.
      • DETAILED discussion of L&W coding from HW2 and difficult cases, how to resolve them, whether they can be resolved.
      • Avril Thorne's notes on using Labov for annotation.
      • Labov 1997 paper with some useful definitions of features of 'narrative clauses'. Reflects on the 30 years of research following the publication of the original article by W. Labov and J. Waletzky on narrative analysis by discussing some further steps in narrative analysis. The presentation is in the form of definitions; implications from those definitions; empirical findings from the study of a larger body of narrative; and theorems, which propose relations with empirical content that are more problematic. Topics discussed include narratives of personal experience, the temporal organization of narrative, temporal types of narrative clauses, structural types of narrative clauses, evaluation, reportability, credibility, causality, the assignment of praise and blame, viewpoint, objectivity, and resolution. (PsycINFO Database Record (c) 2012 APA, all rights reserved)
      • Another Labov paper 1994: Oral narratives of personal experience


  • Week 9
    • Tuesday 5/28.  MEET IN PROJECT TEAMS. OK to use classroom.
    • Thursday 5/30. WORK on your projects. MEET IN PROJECT TEAMS. OKAY TO USE CLASSROOM. 
  • Week 10. Project Presentations.
    We have 9 teams: 4 presentations on Tuesday and 5 on Thursday.

    Tuesday: June 4th:
    Rishes-Lukin 12:10 to 12:30
    Rahimtoroghi 12:30 to 12:50
    Pedelty-Montalvo 12:50 to 1:10
    Parrish-Munishkina-Misra 1:10 to 1:35 (since it's a three-person project I have allowed 5 more minutes in the slot)

    Thursday: June 6th. Please try to be on time so that we have enough time for all five presentations without going over.

    Rubin-Yang (Rubin out of the country on June 4th)  12:00 to 12:20
    Maddaloni-Lebron 12:20 to 12:40
    Hu 12:40 to 1:00
    Harmon 1:00 to 1:20
    Corcoran-Albrecht  1:30 to 1:40

    Project presentations should be roughly structured like this:

    Motivation, Statement of the Problem
    Previous Work
    Description of your Method and Approach. Data, Programs etc.
    Discussion and Future work

    There is a link for submitting these in the assignments in Ecommons. 

    If you do a good job on your presentation, you should be able to convert the structure directly to an ACL style conference paper like the ones we've been reading.

  • Finals Week. Final paper due Tuesday 6/11. There is a link for submitting these in the assignments in Ecommons. 


Seek feedback early if your proposed project differs from the following description: Your project will involve selecting a dimension of narrative or dialogue or both that you find of interest, such as tools for interactive agents, automatic analysis of plot units, narrative schemas, interaction of dialogue and scene descriptions, etc. You may use either computational methods (supported by the Stanford Toolkit or other off-the-shelf tools, e.g. Scheherazade or the WEKA machine learning toolkit) or standard corpus-based methods, using one of the available sources of story structures or your own. The various corpora (Aesop's Fables, IMSDb screen plays, WebBlogs) which are available for your use are linked from the corpus page.

Duplicating existing work from one of the papers that we read would be a clearly acceptable project.  The goal should be something that could lead to a publication with additional work.


The project proposal deadline is Tuesday 5/7, but feel free to seek feedback before then! All team members should turn the proposal in on Ecommons and bring one hard copy to class on Tuesday. The project proposal should be around 2 pages and address the following:

  1. Who is involved in this project?
  2. What research issue are you investigating? What problem are you trying to solve? Be specific.
  3. What data will you use?
  4. What methods do you plan to use? Elaborate to the extent you are able.
  5. How will you evaluate your results?
  6. Will you build a functioning system?
  7. What existing tools, if any, do you hope to leverage?
  8. What is the current state of the art for this problem? Cite papers. You should be able to cite at least a couple related papers.
  9. Provide samples/examples or exploratory work.
  10. Is this something you have worked on outside of this class? If so, how much is new?

Project Deliverables

  1. 8-page paper in ACL format presenting the project as a potential submission to the ACL or other relevant conference, complete with bibliography, literature review, quantitative and qualitative results. ACL style files. Google Scholar is great for BibTeX entries.
  2. Zip file with the data analyzed, source code if any was developed, annotations etc. (including source for any external libraries)
  3. Readme with instructions for exploring the analysis aspects of your project (including how to run software, if any)
  4. Projects will be presented during the last week of class, in reverse alphabetical order, using as each team's key the last name of the team member that falls closest to the end of the alphabet. Time allocations will depend on the size and number of teams but we should be able to have 20 minute presentations. Everyone on the team should talk during the presentation.
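
The ordering rule in item 4 can be sketched in a few lines of Python. This is only an illustration: the function names `team_key` and `presentation_order` are made up for this example, and the team names are taken from the Week 10 schedule above.

```python
def team_key(team):
    """Key for a team: the last name closest to the end of the alphabet."""
    return max(name.lower() for name in team)


def presentation_order(teams):
    """Sort teams in reverse alphabetical order of their keys."""
    return sorted(teams, key=team_key, reverse=True)


# Last names as listed in the Week 10 presentation schedule.
teams = [
    ["Corcoran", "Albrecht"],
    ["Harmon"],
    ["Hu"],
    ["Maddaloni", "Lebron"],
    ["Parrish", "Munishkina", "Misra"],
    ["Pedelty", "Montalvo"],
    ["Rahimtoroghi"],
    ["Rishes", "Lukin"],
    ["Rubin", "Yang"],
]

if __name__ == "__main__":
    for position, team in enumerate(presentation_order(teams), start=1):
        print(position, "-".join(team))
```

Applied strictly, the rule would place Rubin-Yang first; the posted schedule moves that team to Thursday because of a travel conflict, and otherwise follows this order.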


  • Short Written essays on readings, and use of these reflections in class discussion group participation (10%)
  • Homeworks (5) and discussion of what we learned from the homeworks in class: 25%
  • In-class Presentations (2 or more per person). These must be semi-formal. These should be 20 minute presentations given using a projector with PowerPoint or other similar software. They may include a handout to accompany the lecture if desired. (25%)
  • Final Project (40%): Divided into Proposal (10%) which is due Tues MAY 7th, Final Presentation and Written Report (30%) due Tuesday June 11th. Midnight.
  • Homework Delivery: Turn it in on Ecommons assignments page using MS Word or PDF formats. Okay to scan.