# Computational Biology Tools

Winter 2014

Tu/Th  4-5:45 PM   Physical Sciences 114

Instructor

Prof. Todd Lowe   tmjlowe@ucsc.edu

Ph: 9-1511

Office: PSB 316

Office Hours: Wed 2pm-4pm (or by appointment)

TA

Office Hours: Wed  1-2pm, Thur 10-11am   PSB 319

Discussion Sections

A DIS 42858      Wed      09:30AM-10:40AM      Soc Sci 2 141

B DIS 42859      Wed      11:00AM-12:10PM      Soc Sci 2 141

C DIS 42861      Thur      08:30AM-09:40AM       Soc Sci 2 141

Lectures are supplemented weekly with 70-minute sections led by the teaching assistant, Chad Townsend (the instructor may assist some weeks). Sections give you the opportunity to explore the course material in a more intimate environment, work through assigned problem sets, and dive into hands-on activities.

Course Description

This course provides an introduction to sequence informatics and workflows relevant to high-throughput computational genomics. In addition to introducing the core topics of sequence alignment and database searching (historically central to BME110), the curriculum this quarter is designed to give students experience with advanced topics and applications of next-generation sequencing technologies, including genome-wide alignment of experimental datasets (RNA sequencing, DNA sequencing for SNP detection, and chromatin immunoprecipitation sequencing (ChIP-seq)), among a variety of other biological applications. The biological concepts and online tools presented in this course should be highly valuable to any student interested in developing the skills expected of contemporary computational biologists.

No prior experience with scripting languages or knowledge of the Unix environment is required. Class problem sets will use the Galaxy web-based platform (http://galaxyproject.org/), an online analysis framework common in life science research, to analyze and integrate publicly available software and genomic datasets. Additionally, the use of Galaxy is intended to introduce the importance of reproducibility and transparency in data management through the use of published workflows.

The course is open to all science students. The prerequisite is basic biochemistry and/or genetics, or permission of the instructor.

Expectations

You must bring your own laptop to class every day.   If you do not have access to a laptop computer that you can use for this class, please contact the instructors as soon as possible.

You are expected to participate and to submit weekly problem sets. There will be weekly online quizzes (scored for participation, not performance), a midterm, and a final exam.

Required Text

The class textbook is available from CSHL Press or from Amazon.com (in both hardcover ($55-$60) and Kindle ($30) versions). I recommend the electronic Kindle version, which you can use both as a supplement for topics discussed in class and as a quick reference.

#### Next-Generation DNA Sequencing Informatics

Edited by Stuart M. Brown, New York University School of Medicine

http://www.amazon.com/Next-Generation-Sequencing-Informatics-Stuart-Brown-ebook/dp/B00GXFPGMS/

http://www.cshlpress.com/default.tpl?action=full&--eqskudatarq=948&typ2=hpl

http://seqinformatics.com/  (this also has a blog with comments relevant to the text)

1. Introduction to DNA Sequencing

Stuart M. Brown

2. History of Sequencing Informatics

3. Visualization of Next-Generation Sequencing Data

Phillip Ross Smith, Kranti Konganti, and Stuart M. Brown

4. DNA Sequence Alignment

5. Genome Assembly Using Generalized de Bruijn Digraphs

D. Frank Hsu

6. De Novo Assembly of Bacterial Genomes from Short Sequence Reads

Silvia Argimón and Stuart M. Brown

7. Genome Annotation

Steven Shen

8. Using NGS to Detect Sequence Variants

Jinhua Wang, Zuojian Tang, and Stuart M. Brown

9. ChIP-seq

Zuojian Tang, Christina Schweikert, D. Frank Hsu, and Stuart M. Brown

10. RNA Sequencing with NGS

Stuart M. Brown, Jeremy Goecks, and James Taylor

11. Metagenomics

Alexander Alekseyenko and Stuart M. Brown

12. High-Performance Computing in DNA Sequencing Informatics

Efstratios Efstathiadis and Eric R. Peskin

Galaxy Screencast and Vimeo tutorials

http://galaxyproject.org/

http://wiki.galaxyproject.org/Learn

http://wiki.galaxyproject.org/Learn/Screencasts

http://vimeo.com/channels/581769

Additional reading material relevant to the topics presented each week will be provided throughout the class.

Problem Sets/Homework: 35%

Midterm: 25%

Final Exam: 30%

Participation (includes online quizzes and other in-class exercises): 10%

In recent years, there has been an increased number of cheating incidents at many UC campuses, and unfortunately, UCSC is no exception. The School of Engineering has a zero-tolerance policy for any incident of academic dishonesty. If cheating occurs, there may be consequences within the context of the course, and in addition, every case of academic dishonesty is referred to the student's college Provost, who then sets the disciplinary process in motion. Cheating in any part of the course may lead to failing the course and to suspension or dismissal from the university.

What is cheating? In short, it is presenting someone else's work as your own. Examples would include copying another student's written or electronic homework assignment, or allowing your own work to be copied. Although you may discuss problems with fellow students, your collaboration must be at the level of ideas only. Legitimate collaboration ends when you "lend", "borrow", or "trade" written or electronic solutions to problems, or in any way share in the act of writing or electronically sharing your answers. If you do collaborate (legitimately) or receive help from anyone, you must credit them by placing their name(s) at the top of your paper.

What is Academic Integrity? This question is best answered by describing how academic integrity is violated. One prime example is fabrication.

Fabrication:

• In any academic exercise, submitting falsified data, including bibliographic resources and experimental data, or altering graded coursework or exams and resubmitting them to the instructor for a higher score.