you want to apply further organization (k-d trees) to speed retrieval.

Building optimal decision trees is NP-hard, so there is no way to easily build the smallest tree.

So what we do is use a greedy algorithm which attempts to minimize entropy by choosing the question which is expected to be most informative at each step. Assuming there are P positive examples and N negative examples:

    Entropy = sum_i p_i * log2(1/p_i) = E(P, N)

    E(P, N) = P/(P+N) * log2((P+N)/P) + N/(P+N) * log2((P+N)/N)
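This entropy formula can be sketched directly in code; the function below is a minimal illustration (the name `entropy` and the two-count interface are my own choices, not from the notes):

```python
import math

def entropy(p, n):
    """E(P, N): entropy of a set with p positive and n negative examples."""
    total = p + n
    e = 0.0
    for count in (p, n):
        if count > 0:  # a zero count contributes 0 (lim x*log(1/x) = 0)
            frac = count / total
            e += frac * math.log2(total / count)  # frac * log2(1/frac)
    return e
```

For example, an evenly split set gives `entropy(1, 1) == 1.0` (maximum uncertainty), while a pure set gives `entropy(0, 4) == 0.0`.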

Once we choose that attribute, we recursively call this algorithm on the remaining subtrees.
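The greedy recursion described above can be sketched as follows. This is an illustration under assumed representations (examples as dicts with a boolean "label" key, attributes as dict keys), not the notes' own implementation; at each step it picks the attribute whose split has the lowest expected entropy, then recurses on each subset:

```python
import math

def entropy(p, n):
    """E(P, N) for p positive and n negative examples."""
    total = p + n
    if total == 0 or p == 0 or n == 0:
        return 0.0
    return (p / total) * math.log2(total / p) + (n / total) * math.log2(total / n)

def counts(examples):
    """Return (positive, negative) counts for a list of examples."""
    p = sum(1 for ex in examples if ex["label"])
    return p, len(examples) - p

def expected_entropy(examples, attr):
    """Weighted average entropy of the subsets produced by splitting on attr."""
    total = len(examples)
    result = 0.0
    for value in {ex[attr] for ex in examples}:
        subset = [ex for ex in examples if ex[attr] == value]
        result += len(subset) / total * entropy(*counts(subset))
    return result

def build_tree(examples, attrs):
    """Greedy tree construction: leaves are booleans, internal nodes are dicts."""
    p, n = counts(examples)
    if p == 0 or n == 0 or not attrs:   # pure node, or no attributes left
        return p >= n                   # leaf: majority label
    best = min(attrs, key=lambda a: expected_entropy(examples, a))
    return {
        "attr": best,
        "branches": {
            value: build_tree([ex for ex in examples if ex[best] == value],
                              [a for a in attrs if a != best])
            for value in {ex[best] for ex in examples}
        },
    }
```

Because the choice at each node is greedy, the result is not guaranteed to be the smallest tree, only a locally good one, which is the trade-off the NP-hardness result forces.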