Monday, September 7, 2009

Activity 15 - Probabilistic Classification


In the previous activity, we classified objects into classes based on a set of features describing each object, using the Euclidean distance as the classifier for our dataset. In this activity, on the other hand, we use another classifier called Linear Discriminant Analysis (LDA). Discriminant analysis is a statistical approach to pattern recognition. LDA assumes that the classes of objects are linearly separable (as the term linear implies) -- meaning, the classes of objects can be separated by a linear combination of the features that describe the objects [1]. The LDA discriminant function is given by:

fi = ui C^(-1) xk^T - (1/2) ui C^(-1) ui^T + ln(pi) ,

where ui is the mean feature vector of class i, C is the covariance matrix obtained from the training set, xk is the feature vector of the kth test object, and pi is the prior probability of class i.

Mean Features of a Class
Given the features of a set of training objects from classes 1 and 2 (in this case, class 1 has 3 objects with 2 features each, and class 2 has 2 objects),

x1 = [a0 b0; a1 b1; a2 b2] and x2 = [c0 d0; c1 d1] ,

the mean feature vector of class 1 is calculated as:

u1 = [mean(a0,a1,a2) mean(b0,b1,b2)]
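This column-wise mean can be sketched in NumPy as follows, with made-up feature values standing in for a0...b2:

```python
import numpy as np

# Hypothetical feature matrix for class 1: rows are objects, columns are features.
x1 = np.array([[1.0, 2.0],   # [a0, b0]
               [1.2, 1.8],   # [a1, b1]
               [0.9, 2.2]])  # [a2, b2]

# Mean feature vector of class 1: the column-wise mean.
u1 = x1.mean(axis=0)
print(u1)  # approximately [1.033, 2.0]
```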

Covariance Matrix
From the features of the whole training set (containing all training samples of all classes), one can calculate the global mean vector. This is essentially the mean feature vector of the whole training set.
For example, given a data set

x = [a0 b0; a1 b1; a2 b2; c0 d0; c1 d1] ,

the global mean vector is given by

u = [mean(a0,a1,a2,c0,c1) mean(b0,b1,b2,d0,d1)]

From u, we can solve for the mean-corrected data xio for each class i:

xio = xi - u .

The covariance matrix of a class i is:

ci = [(xio)^T xio] / ni

where ni is the number of training samples used for class i.
The covariance matrix of the whole training set is then solved using:

C = (1/n) Σ ni ci

where n is the number of samples in the whole data set.
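The mean-correction and covariance-pooling steps above can be sketched in NumPy as follows, again with made-up feature values. Note that the post's 1/ni normalization is kept here, rather than the unbiased 1/(ni - 1):

```python
import numpy as np

# Hypothetical training features: rows are samples, columns are features.
x1 = np.array([[1.0, 2.0], [1.2, 1.8], [0.9, 2.2]])  # class 1, n1 = 3
x2 = np.array([[3.0, 2.5], [3.2, 2.4]])              # class 2, n2 = 2

x = np.vstack([x1, x2])   # whole training set
u = x.mean(axis=0)        # global mean vector

# Mean-corrected data: subtract the global mean from each class.
x1o = x1 - u
x2o = x2 - u

# Per-class covariance matrices, ci = (xio^T xio) / ni as in the text.
c1 = x1o.T @ x1o / len(x1)
c2 = x2o.T @ x2o / len(x2)

# Pooled covariance of the whole set: C = (1/n) * sum(ni * ci).
n = len(x)
C = (len(x1) * c1 + len(x2) * c2) / n
```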

Prior Probability
The prior probability of class i is given by:

pi = ni/n .
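As a quick sketch with hypothetical class counts:

```python
# Prior probability of each class: pi = ni / n, using made-up sample counts.
n_i = [3, 2]                # samples per class
n = sum(n_i)                # total samples
p = [ni / n for ni in n_i]
print(p)  # [0.6, 0.4]
```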


Results

We used the LDA formula to classify the objects in our data set. Object k is assigned to class i if fi is the highest discriminant value obtained for that object. Below is the result of our classification: highlighted values are the maximum fi obtained per sample, and each sample is classified according to the class i where the highlighted value occurs.
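The whole pipeline, from training to applying the classification rule above, can be sketched end to end as follows. The feature values here are made-up placeholders, not the actual coin and leaf measurements:

```python
import numpy as np

def lda_train(class_data):
    """class_data: list of (ni x d) arrays, one per class."""
    x = np.vstack(class_data)            # whole training set
    u = x.mean(axis=0)                   # global mean vector
    n = len(x)
    # Pooled covariance from globally mean-corrected data (1/n normalization).
    C = sum((xi - u).T @ (xi - u) for xi in class_data) / n
    Cinv = np.linalg.inv(C)
    means = [xi.mean(axis=0) for xi in class_data]   # ui per class
    priors = [len(xi) / n for xi in class_data]      # pi = ni / n
    return means, Cinv, priors

def lda_classify(xk, means, Cinv, priors):
    # fi = ui C^(-1) xk^T - (1/2) ui C^(-1) ui^T + ln(pi)
    f = [ui @ Cinv @ xk - 0.5 * ui @ Cinv @ ui + np.log(pi)
         for ui, pi in zip(means, priors)]
    return int(np.argmax(f))             # class with the highest fi

# Hypothetical two-class, two-feature training data.
x1 = np.array([[1.0, 2.0], [1.2, 1.8], [0.9, 2.2]])
x2 = np.array([[3.0, 2.5], [3.2, 2.4]])
means, Cinv, priors = lda_train([x1, x2])
print(lda_classify(np.array([1.1, 1.9]), means, Cinv, priors))  # near class 1 -> 0
```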

Table 1. LDA results of 5 test samples each of 1-peso coins, 25-cent coins, long leaves, short leaves, and flowers.


Based on the table, we were able to achieve a 92% classification accuracy.

In this activity, I give myself a grade of 10 for understanding and properly implementing LDA.

I thank Mark Jayson and Luis for happy conversations while doing the activity in the IPL lab.

References:
[1]http://people.revoledu.com/kardi/tutorial/LDA/Numerical%20Example.html
