Monday, September 7, 2009

Activity 15 - Probabilistic Classification


In the previous activity, we classified objects into classes based on a set of features describing each object, using the Euclidean distance as the classifier for our dataset. In this activity, on the other hand, we use another classifier called Linear Discriminant Analysis (LDA). Discriminant analysis is a statistical approach to pattern recognition. LDA assumes that the classes of objects are linearly separable (as the term linear implies) -- meaning, the classes of objects can be separated by a linear combination of the features that describe the objects [1]. The LDA discriminant function is given by:

fi = ui C^(-1) xk^T - (1/2) ui C^(-1) ui^T + ln(pi) ,

where ui is the mean feature vector of class i, C is the covariance matrix obtained from the training set, xk is the feature vector of the kth test object, and pi is the prior probability of class i.

Mean Features of a Class
Given the features of a set of training objects from classes 1 and 2 (in this case, class 1 has 3 objects with 2 features each, and class 2 has 2 objects),

x1 = [a0 b0; a1 b1; a2 b2] and x2 = [c0 d0; c1 d1] ,

the mean feature vector of class 1 is calculated as:

u1 = [mean(a0,a1,a2) mean(b0,b1,b2)]
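This column-wise mean can be sketched in NumPy as follows, with made-up feature values standing in for a0...b2:

```python
import numpy as np

# Hypothetical feature matrix for class 1: rows are objects, columns are features.
x1 = np.array([[1.0, 2.0],   # [a0, b0]
               [1.2, 1.8],   # [a1, b1]
               [0.9, 2.2]])  # [a2, b2]

# Mean feature vector of class 1: the column-wise mean.
u1 = x1.mean(axis=0)
print(u1)  # approximately [1.033, 2.0]
```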

Covariance Matrix
From the features of the whole training set (containing all training samples of all classes), one can calculate the global mean vector. This is essentially the mean feature vector of the whole training set.
For example, given a data set

x = [a0 b0; a1 b1; a2 b2; c0 d0; c1 d1] ,

the global mean vector is given by

u = [mean(a0,a1,a2,c0,c1) mean(b0,b1,b2,d0,d1)]

From u, we can solve for the mean-corrected data xio for each class i:

xio = xi - u .

The covariance matrix of a class i is:

ci = [(xio)^T xio] / ni

where ni is the number of training samples used for class i.
The covariance matrix of the whole training set is then solved using:

C = (1/n) Σ ni ci

where n is the number of samples in the whole data set.
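The mean-correction and covariance-pooling steps above can be sketched in NumPy as follows, again with made-up feature values. Note that the post's 1/ni normalization is kept here, rather than the unbiased 1/(ni - 1):

```python
import numpy as np

# Hypothetical training features: rows are samples, columns are features.
x1 = np.array([[1.0, 2.0], [1.2, 1.8], [0.9, 2.2]])  # class 1, n1 = 3
x2 = np.array([[3.0, 2.5], [3.2, 2.4]])              # class 2, n2 = 2

x = np.vstack([x1, x2])   # whole training set
u = x.mean(axis=0)        # global mean vector

# Mean-corrected data: subtract the global mean from each class.
x1o = x1 - u
x2o = x2 - u

# Per-class covariance matrices, ci = (xio^T xio) / ni as in the text.
c1 = x1o.T @ x1o / len(x1)
c2 = x2o.T @ x2o / len(x2)

# Pooled covariance of the whole set: C = (1/n) * sum(ni * ci).
n = len(x)
C = (len(x1) * c1 + len(x2) * c2) / n
```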

Prior Probability
The prior probability of class i is given by:

pi = ni/n .
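As a quick sketch with hypothetical class counts:

```python
# Prior probability of each class: pi = ni / n, using made-up sample counts.
n_i = [3, 2]                # samples per class
n = sum(n_i)                # total samples
p = [ni / n for ni in n_i]
print(p)  # [0.6, 0.4]
```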


Results

We used the LDA formula to classify the objects in our data set. Object k is assigned to class i if fi is the highest discriminant value obtained for that object. Below is the result of our classification: highlighted values are the maximum fi obtained per sample, and each sample is classified according to the class i where the highlighted value occurs.
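The whole pipeline, from training to applying the classification rule above, can be sketched end to end as follows. The feature values here are made-up placeholders, not the actual coin and leaf measurements:

```python
import numpy as np

def lda_train(class_data):
    """class_data: list of (ni x d) arrays, one per class."""
    x = np.vstack(class_data)            # whole training set
    u = x.mean(axis=0)                   # global mean vector
    n = len(x)
    # Pooled covariance from globally mean-corrected data (1/n normalization).
    C = sum((xi - u).T @ (xi - u) for xi in class_data) / n
    Cinv = np.linalg.inv(C)
    means = [xi.mean(axis=0) for xi in class_data]   # ui per class
    priors = [len(xi) / n for xi in class_data]      # pi = ni / n
    return means, Cinv, priors

def lda_classify(xk, means, Cinv, priors):
    # fi = ui C^(-1) xk^T - (1/2) ui C^(-1) ui^T + ln(pi)
    f = [ui @ Cinv @ xk - 0.5 * ui @ Cinv @ ui + np.log(pi)
         for ui, pi in zip(means, priors)]
    return int(np.argmax(f))             # class with the highest fi

# Hypothetical two-class, two-feature training data.
x1 = np.array([[1.0, 2.0], [1.2, 1.8], [0.9, 2.2]])
x2 = np.array([[3.0, 2.5], [3.2, 2.4]])
means, Cinv, priors = lda_train([x1, x2])
print(lda_classify(np.array([1.1, 1.9]), means, Cinv, priors))  # near class 1 -> 0
```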

Table 1. LDA results of 5 test samples each of 1-peso coins, 25-cent coins, long leaves, short leaves, and flowers.


Based on the table, we were able to achieve a 92% classification accuracy.

In this activity, I give myself a grade of 10 for understanding and properly implementing LDA.

I thank Mark Jayson and Luis for happy conversations while doing the activity in the IPL lab.

References:
[1]http://people.revoledu.com/kardi/tutorial/LDA/Numerical%20Example.html
