Data Science Blog: Support Vector Machines in MATLAB I

Handwritten Digit Recognition

Data: downloaded from http://www-stat.stanford.edu/~tibs/ElemStatLearn/

Description:
Normalized handwritten digits, automatically scanned from envelopes by the U.S. Postal Service. The original scanned digits are binary and of different sizes and orientations; the images here have been deslanted and size normalized, resulting in 16 x 16 grayscale images (Le Cun et al., 1990). The data are in two gzipped files, and each line consists of the digit id (0-9) followed by the 256 grayscale values. There are 7291 training observations and 2007 test observations, distributed as follows:
           0     1     2     3     4     5     6     7     8     9   Total
Train   1194  1005   731   658   652   556   664   645   542   644    7291
Test     359   264   198   166   200   160   170   147   166   177    2007
or as proportions:
           0     1     2     3     4     5     6     7     8     9
Train   0.16  0.14  0.10  0.09  0.09  0.08  0.09  0.09  0.07  0.09
Test    0.18  0.13  0.10  0.08  0.10  0.08  0.08  0.07  0.08  0.09

Alternatively, the training data are available as separate files per digit (and hence without the digit identifier in each row). The test set is notoriously "difficult", and a 2.5% error rate is excellent. These data were kindly made available by the neural network group at AT&T research labs (thanks to Yann Le Cun).
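The per-digit counts and proportions in the table above can be recomputed from the label column of either file. The following is a minimal sketch on a made-up label vector; the names labelsToy, counts and props are illustrative and not part of the original post. With the real data, labelsToy would be the first column of the matrix returned by dlmread.

```matlab
% Toy label vector standing in for the first column of zip.train
% (values are made up for illustration).
labelsToy = [0 0 1 5 8 8 8 9]';

% Count occurrences of each digit 0..9; accumarray needs 1-based
% subscripts, hence the +1 shift.
counts = accumarray(labelsToy + 1, 1, [10 1]);

% Convert the counts to proportions of the total.
props = counts / numel(labelsToy);

% One row per digit: digit, count, proportion.
disp([(0:9)' counts props])
```

Fixing the output size to [10 1] keeps a row for every digit, including digits that never occur in the sample.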

Code:

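% Pick the two digits to discriminate.
zip1 = 5; zip2 = 8;

% Training set (decompressed from the gzipped file).
% Column 1 is the digit label, the remaining 256 columns are the pixels.
trdat = dlmread('zip.train');
ind = trdat(:,1)==zip1 | trdat(:,1)==zip2;
X = trdat(ind,2:end);
Y = trdat(ind,1);
[n,p] = size(X);

% Test set, filtered the same way.
tsdat = dlmread('zip.test');
ind = tsdat(:,1)==zip1 | tsdat(:,1)==zip2;
Xs = tsdat(ind,2:end);
Ys = tsdat(ind,1);

% Linear SVM (the svmtrain default kernel) with box constraint C = 1.
fitsvm = svmtrain(X,Y,'boxconstraint',1);

% Training and test misclassification rates.
% The predictions go in pred rather than class, which is a built-in function name.
pred = svmclassify(fitsvm,X);
trerr = sum(pred~=Y)/length(Y);
pred = svmclassify(fitsvm,Xs);
tserr = sum(pred~=Ys)/length(Ys);
[trerr tserr]
ans =
         0    0.0491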
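% Display the first nine support vectors as 16 x 16 images.
% With svmtrain's default autoscaling, SupportVectors holds rescaled pixel values.
colormap(gray)
for i=1:9
    subplot(3,3,i)
    % flipud(rot90(.)) transposes the column-major reshape, so each
    % digit is drawn upright.
    imagesc(flipud(rot90(reshape(fitsvm.SupportVectors(i,:),16,16))))
    axis off
end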
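svmtrain and svmclassify have been removed from later releases of the Statistics and Machine Learning Toolbox, where fitcsvm and predict take their place. The sketch below shows the same fit-and-score pattern with the newer functions on synthetic data. Xtoy, Ytoy and the other variable names are made up for illustration, and this is not the code that produced the numbers in this post. One known difference: svmtrain rescaled the inputs by default, while fitcsvm does so only with 'Standardize' set to true, so results on the digit data may not match exactly.

```matlab
% Sketch only: synthetic two-class data standing in for the digit features.
rng(0);                                      % reproducible toy data
Xtoy = [randn(20,2) + 5; randn(20,2) - 5];   % two well-separated clusters
Ytoy = [5*ones(20,1); 8*ones(20,1)];         % labels mimic zip1 = 5, zip2 = 8

% Linear-kernel SVM with box constraint 1 (linear is the two-class default).
mdl = fitcsvm(Xtoy, Ytoy, 'BoxConstraint', 1);

% Predicted labels and the resulting training misclassification rate.
predToy  = predict(mdl, Xtoy);
trainErr = mean(predToy ~= Ytoy);

% Support vectors are stored one per row, as in the svmtrain struct.
nSV = size(mdl.SupportVectors, 1);
```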

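% Baselines: naive Bayes ('diaglinear') and LDA ('linear').
% The second output of classify is the apparent error rate on the training
% data, so errNB and errLDA are not test errors like tserr.
[classNB,errNB] = classify(Xs,X,Y,'diaglinear');
[classLDA,errLDA] = classify(Xs,X,Y,'linear');
[errNB errLDA tserr]
ans =
    0.0555    0.0027    0.0491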
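The second output of classify estimates the error rate from the training data, so the first two numbers above are not measured on the test set the way tserr is. A like-for-like comparison would score the predicted test labels against Ys. The sketch below shows that calculation on made-up vectors. misclassRate, truthToy and predToy are hypothetical names, and no real test errors for naive Bayes or LDA are claimed here.

```matlab
% Hypothetical helper: fraction of predictions that differ from the truth.
misclassRate = @(pred, truth) mean(pred(:) ~= truth(:));

% Toy vectors standing in for Ys and classNB / classLDA (values are made up).
truthToy = [5 5 8 8 5 8 8 5]';
predToy  = [5 8 8 8 5 8 5 5]';

% Two of the eight predictions are wrong.
errToy = misclassRate(predToy, truthToy);
```

With the variables from the listing, the corresponding calls would be misclassRate(classNB, Ys) and misclassRate(classLDA, Ys), which can then be placed next to tserr.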


Monday, March 4, 2013
