Large-Scale Web Analytics and Machine Learning


Dr James G. Shanahan

Location Campus (JB 156); SVC (Room 303)

6:00PM-9:00PM Tuesdays

From April 1, 2014 to June 3, 2014

with a final exam during week 11 of the quarter (Week of June 8, 2014)

Recorded Lectures

Click here 

Instructor Office Hours:

By Appointment only

James.Shanahan_AT_ and

In your emails, please use the following subject line format; otherwise responses might be late or overlooked:

"TIM 251 Winter 2014: topic of email" E.g., "TIM 251 Winter 2014: location of final exam"



Welcome to TIM 251 (formerly known as ISM 251). TIM 251 is scheduled for Thursday  6-9:30pm (First class on Tuesday, April 1, 2014). 

Have you ever wondered what big data science is, or about the following big data analytics problems?

  • How does a web search engine index 100 billion pages?
  • How does Facebook recommend news items for 1 billion people?
  • How do social media platforms recognize people's faces?
  • How does online advertising (a $130 billion industry worldwide) work?
  • How can a helicopter be flown upside down automatically?
  • How does Facebook suggest new friends? Which items get posted to your wall?


This course is for people interested in automatically extracting knowledge from petabytes of data (a petabyte is the equivalent of about 1,000 laptop hard drives). Students should have some prior knowledge of, or experience with, basic machine learning methods.

You must have taken a machine learning course at the undergraduate or graduate level prior to taking this course, or have industry experience with machine learning.

The following are skills that a student should possess. That said, we will cover most of these core concepts as a refresher and build on them in terms of theory and application at scale:

  • knowledge of basic methods in machine learning such as linear classifiers, logistic regression, K-Means clustering, and principal components analysis.
  • although many of the assignments will use dynamic/scripting programming languages, some proficiency in R and Python programming will be assumed
  • knowledge of basic concepts in probability and statistics: probability distributions and probability density functions, conditional probabilities, marginalization, Bayes' theorem
  • basic knowledge of linear algebra and multivariate calculus: linear system solving, eigenvalues/eigenvectors, least square minimization, gradient, Jacobian, and Hessian.

The goal of the course is to train graduate students and industry personnel in advanced techniques for web mining, including advanced machine/reinforcement learning. By the end of the course, students will have developed capabilities in web mining with applications in online marketing and click stream data analytics, advanced (vertical) search engine design and analysis, and Bayesian network inference and smart alerts, with applications including blog mining and health.

One important focus of this course is to develop theoretical and practical problem-solving skills based on state-of-the-art theory, algorithms, and systems in information and web sciences [cloud computing, Hadoop, R] to tackle real-world, web-scale problems [web-search ranking, social networks, log-file processing, online advertising]. This will position students for success in major firms such as Yahoo, Google, Microsoft, and Amazon, enterprise firms such as IBM, HP, Cisco, and Wal-Mart, as well as leading-edge startups in data/business analytics and smart search. Theory will be supplemented with hands-on examples and, in addition, course projects will focus on applying the developed skill sets to real-world datasets, primarily on Hadoop clusters (made available through the Amazon EC2 cloud).

The course format will be mainly lectures, supplemented with external and internal speakers including industry personnel, faculty, and students. Significant time will be devoted to project modeling and analysis. The key output of the course will be the development of research papers using public or other data to solve key industrial problems.

The objective of this course is to go deep on a number of representative techniques in each of the above areas, some of which are workhorses of industries such as online advertising and healthcare. Each lecture will combine theory, geometry, code (primarily R), and example problems. You will end up coding many of the algorithms that we cover (as homework and as class projects).

You will learn some of the following skills:

  • Learn the core subjects of optimization theory: gradient descent; convex optimization. See how they are used every day in machine learning and in online advertising.
  • Learn core workhorse techniques in supervised and unsupervised machine learning that are used every day in online advertising, marketing, and healthcare
  • Analyze intelligent support systems for marketing decisions as well as develop mathematical models for optimizing sales, marketing, and pricing decisions in high tech
  • Stochastic Recommenders: review classical approaches to collaborative filtering while also looking at recent developments in the field of stochastic recommenders with applications to ecommerce and online advertising
  • Learn basics of graph mining
  • Learn basics of dynamic programming and Markov decision processes (MDPs). Look at applications in setting policies for online advertising to optimize various business objectives. Time permitting!
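
As a taste of the first skill in the list above, here is a minimal sketch of batch gradient descent fitting a line by least squares. It is written in plain Python rather than R to stay dependency-free; the function name, learning rate, step count, and toy data are illustrative assumptions, not course material.

```python
def gradient_descent(xs, ys, lr=0.05, steps=2000):
    """Fit y ~ w*x + b by minimizing mean squared error with batch gradient descent.

    lr and steps are assumed values chosen for this toy data set.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of (1/n) * sum((w*x + b - y)^2) with respect to w and b.
        grad_w = (2.0 / n) * sum((w * x + b - y) * x for x, y in zip(xs, ys))
        grad_b = (2.0 / n) * sum((w * x + b - y) for x, y in zip(xs, ys))
        # Step against the gradient to reduce the error.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (hypothetical, noise-free).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = gradient_descent(xs, ys)
print(f"slope={w:.4f}, intercept={b:.4f}")
```

Because mean squared error for a linear model is convex, a small enough learning rate is sufficient for the iterates to approach the least-squares solution; too large a rate makes the updates diverge.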

The course emphasis will be tuned to the class composition and interest. 

Recommended Prerequisites

  • CMPE 107 or AMS 131 or permission of instructor.
  • Enrollment restricted to graduate students.
  • Applied Mathematics and Statistics 203, 205, 230, and TIM courses 209, 250 recommended.

Course Logistics

The UCSC campus classroom is Engineering 2, room 156. 
Directions to UCSC campus:

The class will be telecast between the UCSC campus classroom and the new UCSC Silicon Valley location, 2505 Augustine Drive, Santa Clara (Conference Room 303). Yahoo Maps can be found here

UCSC (Engineering 2, room 156); UCSC Silicon Valley (Room 303)


Instructors and Assistants