CMPS140, Winter 2012, Section 01: Lecture 17

Surprise of an outcome with probability P = log2(1/P) bits
Entropy = Diversity = Expected Surprise = sum over i of Pi*log2(1/Pi) bits
Diversity is at its maximum, log2(N), when all N outcomes are equally likely (fully random).

A fair die has diversity = log2(6) bits, about 2.585 bits.

log2(N) - Diversity  gives you the amount of "structure" in a system.
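
A minimal Python sketch of the three quantities above, to make the arithmetic concrete.
The function names and the loaded-die probabilities are made up for illustration.

```python
import math

def surprise(p):
    """Surprise, in bits, of an outcome with probability p (0 < p <= 1)."""
    return math.log2(1.0 / p)

def entropy(probs):
    """Expected surprise in bits; zero-probability outcomes contribute nothing."""
    return sum(p * surprise(p) for p in probs if p > 0)

def structure(probs):
    """Maximum possible diversity log2(N) minus the actual diversity, in bits."""
    return math.log2(len(probs)) - entropy(probs)

fair_die = [1 / 6] * 6
loaded_die = [0.5, 0.1, 0.1, 0.1, 0.1, 0.1]  # hypothetical probabilities

print(entropy(fair_die), structure(fair_die))
print(entropy(loaded_die), structure(loaded_die))
```

The fair die should show log2(6) bits of diversity and essentially zero structure;
the loaded die has less diversity and therefore some structure.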

Perceptrons are simple neural nets that can learn linearly separable functions.
They cannot learn XOR or parity without help (multiple layers or additional input features).

Anything a perceptron can represent, it can learn using the
simple perceptron learning algorithm:
  loop through training examples:
      if the output is too low, add the training example's inputs to the weights
      if the output is too high, subtract the training example's inputs from the weights
  until all examples are classified correctly or the learner is stuck in a loop.
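
The learning loop above can be sketched in Python as follows. This is one reading of
the notes, not a definitive implementation: the 0/1 threshold unit, the constant bias
input, and the max_epochs cap (standing in for "stuck in a loop") are all assumptions,
and the product feature used for XOR is just one hand-picked example of an additional
feature.

```python
def predict(weights, x):
    """Threshold unit: 1 if the weighted sum of inputs is positive, else 0."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) > 0 else 0

def train_perceptron(examples, max_epochs=100):
    """examples: list of (inputs, target) pairs with target 0 or 1.
    Returns (weights, converged). max_epochs is an assumed stand-in
    for detecting that the learner is stuck in a loop."""
    # Append a constant 1 to every input so the last weight acts as a bias.
    data = [(list(x) + [1], t) for x, t in examples]
    weights = [0] * len(data[0][0])
    for _ in range(max_epochs):
        mistakes = 0
        for x, t in data:
            y = predict(weights, x)
            if y < t:    # too low: add the example into the weights
                weights = [w + xi for w, xi in zip(weights, x)]
                mistakes += 1
            elif y > t:  # too high: subtract the example from the weights
                weights = [w - xi for w, xi in zip(weights, x)]
                mistakes += 1
        if mistakes == 0:
            return weights, True
    return weights, False

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
# Same XOR data with one extra, hand-picked feature: the product x1*x2.
XOR_PLUS = [((a, b, a * b), t) for (a, b), t in XOR]

print(train_perceptron(AND)[1])       # True
print(train_perceptron(XOR)[1])       # False
print(train_perceptron(XOR_PLUS)[1])  # True
```

AND is linearly separable, so the loop finishes. Plain XOR never does, but the same
data with the extra feature becomes separable and is learned.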

Gradient descent neural nets are also iterative learners, but they use a more
sophisticated update: each weight is moved a small step in the direction that
reduces the error.

Linear regression arrives at the same line, but computes it from all the training
examples at once (batch processing) rather than by repeated updates.
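
A small sketch comparing the two approaches on a single-input line fit. It assumes
"the same line" means the least-squares line, and that the gradient descent learner
minimizes mean squared error. The data points, learning rate, and step count are
made up for illustration.

```python
def fit_line_closed_form(points):
    """Batch least squares: compute slope and intercept from sums over all data."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxy = sum(x * y for x, y in points)
    sxx = sum(x * x for x, _ in points)
    m = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - m * sx) / n
    return m, b

def fit_line_gradient_descent(points, rate=0.05, steps=5000):
    """Iterative learner: repeatedly step against the gradient of the mean squared error."""
    m, b = 0.0, 0.0
    n = len(points)
    for _ in range(steps):
        grad_m = sum(2 * (m * x + b - y) * x for x, y in points) / n
        grad_b = sum(2 * (m * x + b - y) for x, y in points) / n
        m -= rate * grad_m
        b -= rate * grad_b
    return m, b

points = [(0, 1.1), (1, 2.9), (2, 5.2), (3, 6.8), (4, 9.0)]  # hypothetical data
print(fit_line_closed_form(points))
print(fit_line_gradient_descent(points))
```

Both calls should print nearly the same slope and intercept. A learning rate that is
too large for the data would make the iterative version diverge instead.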

Kernel regression adds nonlinear terms to linear regression in order to learn
more complicated (nonlinear) functions.
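
A minimal sketch of the idea as the notes describe it: transform the input with a
nonlinear term, then run ordinary linear regression on the transformed value. The
choice of x*x as the added term and the data (generated from y = 3*x*x + 2) are
assumptions for illustration.

```python
def fit_line(points):
    """Ordinary least-squares line fit: returns (slope, intercept)."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxy = sum(x * y for x, y in points)
    sxx = sum(x * x for x, _ in points)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    return slope, (sy - slope * sx) / n

# Hypothetical data from a curve a straight line in x cannot follow.
data = [(x, 3 * x * x + 2) for x in (-2, -1, 0, 1, 2)]

# Plain linear regression on x: the best line is flat and misses the curve.
flat_slope, flat_intercept = fit_line(data)

# Replace x by the nonlinear term x*x, then fit a line in that new variable.
a, b = fit_line([(x * x, y) for x, y in data])

print(flat_slope, flat_intercept)
print(a, b)
```

The second fit is still linear regression; only the input was changed, which is what
lets it recover the curved relationship.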

Nearest neighbor is a machine learning technique based on table lookup against
past examples. If you take the K (say 5) closest past examples and average their results,
this is called K-Nearest Neighbor.
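
A minimal sketch of K-Nearest Neighbor as described above, assuming Euclidean
distance and numeric results that can be averaged. The function name and the sample
table are hypothetical. (math.dist needs Python 3.8 or later.)

```python
import math

def knn_predict(examples, query, k=5):
    """examples: list of (point, value) pairs, the table of past examples.
    Returns the average value of the k examples closest to query."""
    nearest = sorted(examples, key=lambda ex: math.dist(ex[0], query))[:k]
    return sum(value for _, value in nearest) / len(nearest)

# Hypothetical table of past examples: (input point, observed result).
table = [((0.0,), 0.0), ((1.0,), 1.0), ((2.0,), 2.0), ((10.0,), 10.0)]

print(knn_predict(table, (1.2,), k=1))  # value of the single closest example
print(knn_predict(table, (1.2,), k=3))  # average over the three closest
```

Sorting the whole table on every query is the simple table-lookup version; it costs
time proportional to the table size per lookup.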

There are efficient data structures for implementing nearest neighbor, called K-D Trees.
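
A compact sketch of a k-d tree with a single-nearest-neighbor search, assuming
Euclidean distance and points stored as equal-length tuples. The function names and
sample points are made up, and this is an illustration of the idea rather than a
tuned implementation.

```python
import math

def build_kdtree(points, depth=0):
    """Return nested (point, left, right) tuples, or None for an empty list."""
    if not points:
        return None
    axis = depth % len(points[0])          # cycle through the coordinates
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2                 # median point splits the space
    return (points[mid],
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))

def nearest(node, query, depth=0, best=None):
    """Return the stored point closest to query."""
    if node is None:
        return best
    point, left, right = node
    if best is None or math.dist(point, query) < math.dist(best, query):
        best = point
    axis = depth % len(query)
    diff = query[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, query, depth + 1, best)
    # Search the far side only if the splitting plane is closer than the best so far.
    if abs(diff) < math.dist(best, query):
        best = nearest(far, query, depth + 1, best)
    return best

pts = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]  # hypothetical examples
tree = build_kdtree(pts)
print(nearest(tree, (9, 2)))  # (8, 1)
```

The saving comes from the pruning test: whole subtrees are skipped when they lie
farther away than the best example found so far.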