Data Mining and Business Analytics in Knowledge Services

Location

  • UCSC Main Campus: Baskin Engineering 156
  • UCSC Silicon Valley Center: Room 303

Time

  • Lecture: Thursdays 6:00 p.m. - 9:30 p.m.
  • Office Hours: 5-6 p.m. (Tentative; by appointment)

Introduction

This course focuses on the use of statistical data mining and data analytics in several knowledge service areas of business management. It describes the critical challenges and issues in these areas and develops fundamental techniques and methods to address them.

This introductory-level course progressively develops the fundamental statistical and machine learning models and techniques for data mining and business analytics, in the context of real-world applications and examples. We will develop the techniques systematically and sequentially, but will move back and forth among the different real-world applications and domain areas. The course covers methods applicable to real-world problems, including modern non-linear methods such as Decision Trees, Boosting, Bagging and Support Vector Machines, as well as more classical approaches such as Logistic Regression, Linear Discriminant Analysis, K-Means Clustering and Nearest Neighbors. We will cover all of these approaches in the context of Marketing, Finance and other important business decisions. By the end of this course you should have a basic understanding of how all of these methods work and be able to apply them in real business situations that arise in Silicon Valley firms.

This is a data-driven, project-oriented, R-based Data Mining course that can be taken on its own or as a companion to CMPS 242 Machine Learning. Subsequent courses, such as TIM 250 and TIM 251, will expand on the techniques and on domain areas such as web mining, computational advertising and online marketing, social networks and relational learning, reinforcement learning, and constrained optimization. They will also explore significant projects, possibly including projects with industry.

Textbook: An Introduction to Statistical Learning, with Applications in R by James, Witten, Hastie and Tibshirani (Springer, 2013). The PDF of this book is available for free, with the consent of the publisher, on the book website.

Prerequisites: Students are expected to be mathematically mature and to have had prior exposure to undergraduate linear algebra at the level of MATH 21 or AMS 10, and to probability/statistics at the level of AMS 131 or MPE 107. We will provide a refresher in the form of a “boot camp” early in the course to help students relearn the basics required for the course.

Grading

Grading will be based on the following weighting scheme:

  • Assignments: 20%
  • Midterm: 15%
  • Course project: 50% (proposal and report 40% + presentation 10%)
  • Final exam: 15%
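
As a rough illustration of how the weighting scheme above combines into a single course score, here is a minimal Python sketch. The component names, the 0-100 score scale, and the example scores are assumptions made for illustration only; the syllabus specifies just the weights.

```python
# Weights are taken from the grading scheme above.
# Component names are hypothetical labels chosen for this sketch.
WEIGHTS = {
    "assignments": 0.20,
    "midterm": 0.15,
    "project_proposal_report": 0.40,
    "project_presentation": 0.10,
    "final_exam": 0.15,
}


def course_score(scores):
    """Return the weighted course score from per-component scores (assumed 0-100)."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores, not real data.
example_scores = {
    "assignments": 90,
    "midterm": 80,
    "project_proposal_report": 85,
    "project_presentation": 95,
    "final_exam": 70,
}

print(round(course_score(example_scores), 2))
```

Because the course project carries half of the total weight, the two project components dominate the result in this sketch.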

Instructors and Assistants