CMPS290S, Fall 2011, Section 01: Syllabus

Syllabus

CMPS 290S, Fall 2011: Archival Storage and Digital Preservation

Course Objectives

This course is a graduate-level study of the issues surrounding archival storage and digital preservation.  The course is structured around readings from the current research literature, with a smattering of "classic" papers from the field.  The topics include:

  • Preserving bits for the long term
    • Reliability
    • Security and authoritativeness
    • Storage system design issues
  • Preserving information for the long term
    • Semantics
    • Context
    • Fidelity
  • Managing information in long-term archives
    • Object preservation
    • Search and indexing
  • Social issues in long-term preservation
    • Economics
    • Security and privacy

We may cover additional topics based on the interests of the students and professor.

Preparation

Students are expected to have a good background in computer systems: either a strong undergraduate background in operating systems or a graduate-level course in computer systems.  Because the reading load in the class isn't too heavy, students who need to brush up on the background for a particular paper can do so as needed.  Background material will not be assigned explicitly, but you will be expected to have sufficient background to understand each paper.  If you have questions about background, please ask the professor via email or office hours.

Class Information

All class information will be distributed via the class Web pages, including the class schedule and announcements as well as links to papers.  Note that many of the papers are available via the ACM Digital Library or other online repositories.  All of these repositories are freely accessible from the ucsc.edu domain, but may not be free from off-campus.  You can download the papers off-campus using the OCA (Off-Campus Access) proxy, or you can simply download the papers you need while you're on campus.

Readings

A major component of this course will be the in-class discussion of research papers on archival storage and digital preservation. Typically, you will need to read one or two papers per class.  The reading list is available online from the course schedule, and all of the papers are available online as links from the entry for a particular class day.  As noted above, you may need to be in the ucsc.edu domain to download the papers for free.

In addition, we will be using readings from the book Preserving Digital Information (Gladney) and from Advanced Digital Preservation (Giaretta).  The books are both available online from Springer, and are free from the ucsc.edu domain.  These readings will be noted in the course schedule, though there won't be links to them individually (e.g., the schedule will say "read chapter X in Gladney").  You should read each assigned reading (paper or book chapter) carefully, taking notes if that helps you organize your thoughts.

Course Requirements

Because this is a seminar, the majority of the course work will involve the readings, class discussions, reading presentations, and the final project.

Reading Summaries

For each reading, you'll need to write a brief summary and submit it online by 9 AM on the day of class (3 hours before the start of class).  Late summaries will not be accepted, because the summaries will be used as a starting point for discussions during that class day. Each reading summary consists of brief answers to the following questions, including three or more comments or questions you'd like to bring up during class discussion.

  1. What is the problem or issue addressed by the reading, and why is the problem important?
  2. What approaches are discussed in the reading, and how do they help address the problem or issue?
  3. What other approaches (if any) are mentioned in the reading, and how are the approaches in the reading better than the alternatives?
  4. Three or more comments/questions about the reading.

Summaries will be graded on a 2-point scale: 2 points for a good summary, 1 point for a minimal summary, and 0 points for a summary not turned in.

In-class discussions

Every student in the class will present about 2 readings during the quarter; the exact number of papers assigned to each student will depend on the number of students in the class. Needless to say, you need to be in class on the day that "your" paper is being discussed. While you may look at presentation materials from anywhere you like, including the original authors, you must write your own presentation.  Keep in mind that your presentation goal may be different from that of the original authors, and that you might have a slightly different audience.  One of the goals of the class is to give you experience giving talks, and writing your own talk based on someone else's material is a good way to practice.

You should complete your talk at least 24 hours before you have to give it, which will allow time for you to practice your presentation at least once.  Your presentation should be about 30-40 minutes long, not counting questions.  We'll discuss good presentation techniques in the first class.

After the presentation(s), we'll have an in-depth discussion that starts with the reading and goes beyond it.  Our goal is not just to analyze and discuss what's in the reading, but to take the topics and run with them.  We'll use the comments from your summaries to catalyze further discussion.

Everyone is expected to participate in class discussions.  Part of being a researcher is exchanging ideas with others, and this is an excellent opportunity to gain experience.  In most cases, you're on equal footing with everyone else in the class, since the material will be new for most of you.

Final Project

Students in the class must complete a research project in the general area of archival storage and digital preservation, considered broadly.  Both a paper describing the project and a poster presentation will be required. This project should be the result of somewhat original research (strongly preferred) or a strong and detailed survey of prior art in a very focused area. Reporting work done for another course is not acceptable. If you wish to get an A- (or better) in the class, you must do a research project, which can include "original" research or a project that verifies earlier research reported elsewhere. A student who does a survey will receive a maximum course grade of B+.

While you're encouraged to use resources available on the Web and elsewhere (see below on how to cite material), I expect you to put a "reasonable" amount of effort into your class project. A project that requires 5 hours of time to compile and run already-existing software isn't much of a project, and will be graded accordingly. Your project should take approximately 60-80 hours over the course of the quarter, including time to read background material, build and run your experiments, and write up your results.

If you want to work with someone else in the class on your project, you may do so with prior approval (i.e., please see me before doing this). If you work with a partner, the expectations for the scope of your project will be adjusted accordingly.

There will be checkpoints about every two weeks during the quarter to keep you on schedule to complete your project.  The project schedule will be posted in the first week of classes.

Attendance

Class attendance is mandatory. Because this is a graduate class, I expect students to participate actively in class, and that's hard to do if you're not actually there. I won't take attendance at class (except as necessary to make the registrar happy), but you cannot pass if you miss too many classes. If you need to miss a class for a good reason, such as a conference or other research-oriented commitment, please see me in advance if possible.

Grades

Your grades will be determined as follows:

  • Final project: 50%
  • Class participation and summaries: 30%
  • In-class presentations: 20%

You must turn in a final project to pass the class.

Collaboration vs. Cheating

This is a graduate seminar, so I expect that you will discuss the material with other students in the class and perhaps others outside the class. This is encouraged.  However, all paper summaries must be your own. If you turn in a summary written by someone else, you will receive a -2 for that summary. If this happens more than once, you will fail the class and formal cheating procedures will be initiated. I'd rather see you miss a summary than copy someone else's.

You're encouraged to use any resources (code, traces, etc.) you can get for your projects, as long as you properly attribute them in your paper. Science is a collaborative enterprise; as Newton said, "If I have seen a little further it is by standing on the shoulders of Giants" [Wikipedia]. By making use of what others have already done, you can accomplish a great deal more in a quarter than you could otherwise. Science is built on properly crediting those whose work you use; failure to do this will not be tolerated. Improper use of others' work in your project will result in a failing grade in the class, but it's not as bad as improperly using others' work in your own (independent) research, which can earn you (relatively) permanent disgrace. In short, attribute everything! If you're not sure whether something is ethical, feel free to talk to me before you do it.