Archive for February, 2012

Train / Test Mahout SGD Classifier

Mahout 0.6 has been released with bug fixes and new implementations. I experimented with Mahout for my previous project, but that time I used the Naive Bayes classifier.

But now my data set is too small to run with Naive Bayes. I also came across a topic in the Mahout in Action MEAP, “Choosing an algorithm to train the classifier”, which points towards stochastic gradient descent (SGD). From there it is clear that SGD is the advisable choice for “Small to medium (less than tens of millions of training examples)” data sets.

So I decided to choose SGD because it is the best fit for my data set. OK, the algorithm is selected, so what's next? I need to dig into the Mahout source to find some hints on how to run SGD.

Yes, there are examples that show how to train and test using SGD.

So let's search for the command-line options to run SGD. In examples/bin of the Mahout binary distribution there is a file that shows this.
I went through that file and got an idea of how to train and test SGD from the command line.

I found out how it works, so let's start the game. :)

Before that, please make sure that you have a sample data set to train on. You can download the 20 Newsgroups data set.

The input data set is ready, so now we are going to train on it using SGD.

You can download mahout-distribution-0.6 from any of the mirrors. Extract it and cd into the Mahout folder.
There is a class TrainNewsGroups in org.apache.mahout.classifier.sgd, and it accepts the path of the input data set as its argument.

./bin/mahout org.apache.mahout.classifier.sgd.TrainNewsGroups /home/sree/Desktop/20news-bydate/20news-bydate-train/

If you forgot to set JAVA_HOME, an error may occur. Set JAVA_HOME if needed.
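A quick check before launching can save a confusing failure. This is only a minimal sketch: check_java_home is a hypothetical helper name of my own, not something shipped with Mahout, and it simply assumes a usable JDK has an executable bin/java under JAVA_HOME.

```shell
# check_java_home: hypothetical helper, not part of Mahout.
# Returns 0 if JAVA_HOME is set and contains an executable bin/java,
# otherwise prints a message to stderr and returns 1.
check_java_home() {
    if [ -z "${JAVA_HOME:-}" ]; then
        echo "JAVA_HOME is not set" >&2
        return 1
    fi
    if [ ! -x "$JAVA_HOME/bin/java" ]; then
        echo "No executable java found under $JAVA_HOME/bin" >&2
        return 1
    fi
    return 0
}
```

You could call it as `check_java_home && ./bin/mahout ...` so that the training command only runs when the check passes.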

Currently the SGD implementation in Mahout supports sequential / online / incremental execution. It will not run in parallel like Naive Bayes (which I experimented with before).

By default, TrainNewsGroups creates model files (with .model as the file type) in the /tmp directory. While training, you can check whether these files are being created. If you can see files with names like “news-group-{a number}.model”, then SGD has definitely started training over our newsgroup data set.

Once training has completed, you will find a set of .model files generated in the /tmp directory. You can simply choose “news-group.model” or “news-group-{MAX NUMBER}.model” as the model.
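If you want the “news-group-{MAX NUMBER}.model” file without eyeballing the directory listing, something like the following could pick it out. It is a rough sketch: latest_model is a hypothetical helper of my own, and it relies only on the file naming pattern described above.

```shell
# latest_model: hypothetical helper, not part of Mahout.
# Prints the news-group-<number>.model file with the largest number
# in the given directory (default: /tmp). Returns 1 if none is found.
latest_model() {
    dir="${1:-/tmp}"
    best=""
    best_n=-1
    for f in "$dir"/news-group-*.model; do
        [ -e "$f" ] || continue          # the glob matched nothing
        n="${f##*/news-group-}"          # strip directory and name prefix
        n="${n%.model}"                  # strip suffix, leaving the number
        case "$n" in
            ''|*[!0-9]*) continue ;;     # skip names that are not purely numeric
        esac
        if [ "$n" -gt "$best_n" ]; then  # numeric comparison, not alphabetical
            best_n="$n"
            best="$f"
        fi
    done
    [ -n "$best" ] || return 1
    printf '%s\n' "$best"
}
```

For example, `model=$(latest_model /tmp)` stores the path for later use. The comparison is numeric on purpose, because a plain alphabetical sort would rank a file numbered 9 above one numbered 100.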

If you completed the training without any errors, a model has been created from the input data set. :) It's time to test it. This part determines how much accuracy we can get on the test data using SGD.

To test the data against a model, we can use the TestNewsGroups class in org.apache.mahout.classifier.sgd.

TestNewsGroups has two mandatory arguments:
--input : path of the test data
--model : path of the model file

Finally, it's time to test it.

./bin/mahout org.apache.mahout.classifier.sgd.TestNewsGroups --input /home/sree/Desktop/20news-bydate/20news-bydate-test/ --model /tmp/news-group.model
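Since both arguments are mandatory, it may be worth confirming that the paths exist before starting the JVM. The following is just a sketch under that assumption: check_sgd_inputs is a hypothetical helper name, not part of Mahout, and it only looks at the file system.

```shell
# check_sgd_inputs: hypothetical helper, not part of Mahout.
# $1: directory holding the test data, $2: path of the model file.
# Returns 0 only if the directory exists and the model file is
# readable and non-empty.
check_sgd_inputs() {
    input_dir="$1"
    model_file="$2"
    if [ ! -d "$input_dir" ]; then
        echo "Test data directory not found: $input_dir" >&2
        return 1
    fi
    if [ ! -r "$model_file" ] || [ ! -s "$model_file" ]; then
        echo "Model file missing or empty: $model_file" >&2
        return 1
    fi
    return 0
}
```

A typical call would be `check_sgd_inputs /home/sree/Desktop/20news-bydate/20news-bydate-test/ /tmp/news-group.model` followed by `&&` and the TestNewsGroups command shown above.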

You will then see a confusion matrix and the classified instances.

I got 73.513% accuracy. If you want to see the confusion matrix, just follow this link.

I discussed the accuracy of SGD on the Mahout mailing list and there were some follow-ups.
You can search the Mahout mailing list archive for Dec 2011 for the topic “Mahout SGD / Bayes prediction results over 20newsgroups”.

I think the SGD accuracy is not satisfactory when compared with Naive Bayes on the same 20 Newsgroups data set :( especially since SGD is the algorithm advised for small / medium sized data sets. Let's hope the Mahout developers will work on it. 🙂


Installing ImageMagick in Ubuntu 11.04

I am working on a project that includes some image manipulation functions. After some quick googling I found that ImageMagick is the best fit for my needs.

I installed ImageMagick using sudo apt-get install imagemagick.

After installation I tested some of the commands provided by ImageMagick (convert, identify). But they did not work and failed with the error “No Delegates for this image”.

Then I realised that ImageMagick might have some missing dependencies, so I searched for delegates.

I tried the wiki page for setting up ImageMagick, but it failed again 😦

So I decided to install it from source.

You’ll need to install a number of dependencies in addition to ImageMagick in order to have a fully functional ImageMagick installation. It’s important that these dependencies are installed before you start configuring and compiling ImageMagick, because the configure script for ImageMagick will disable functionality that isn’t available because of missing dependencies at compile time.

The dependencies that I found useful for my use case are:

 sudo apt-get install libjpeg8-dev libpng12-dev libglib2.0-dev libfontconfig1-dev zlib1g-dev libtiff4-dev

After installing the above dependencies, I started to compile ImageMagick from source.

You can download the ImageMagick source from any of the mirrors.

ImageMagick-6.7.5-6 is the latest version.

After downloading the source, extract it.

tar xvfz ImageMagick-6.7.5-6.tar.gz

cd ImageMagick-6.7.5-6


./configure

If you need to do some advanced configuration, follow this link.

sudo make

sudo make install

Installation completed, hooray.. :) I checked the ImageMagick commands and found the same delegate problem again. :(

So the compile had to restart from the ./configure step, and this time I added the --disable-shared option.

./configure --disable-shared

Installation completed again.. No emotion.

I checked whether all the delegates and dependencies were configured properly with ImageMagick.

You can check it using convert -list configure

🙂 again. The delegates are configured properly:

DELEGATES fontconfig freetype jpeg jng mpeg png x11 zlib
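To check for one particular delegate in a script rather than by eye, a small matcher over that DELEGATES line could look like this. It is a minimal sketch: has_delegate is a hypothetical helper of my own, and it assumes the line is a space-separated list like the one shown above.

```shell
# has_delegate: hypothetical helper, not part of ImageMagick.
# $1: a DELEGATES line as printed by `convert -list configure`
# $2: the delegate name to look for (for example jpeg)
# Returns 0 if the name appears as a whole word in the line, else 1.
has_delegate() {
    case " $1 " in
        *" $2 "*) return 0 ;;   # padded with spaces so "jp" does not match "jpeg"
        *)        return 1 ;;
    esac
}
```

One way to use it would be `has_delegate "$(convert -list configure | grep DELEGATES)" jpeg || echo "jpeg delegate missing"`.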

So it was time to run some ImageMagick commands.

I started with the identify command.

identify a.jpg

It's working.. WOWWW..

a.jpg JPEG 321×400 321×400+0+0 8-bit DirectClass 28.9KB 0.000u 0:00.000

Then I tried converting an image to another format.

convert a.jpg a.gif

WOOOWWW again, it's working.. 🙂

So happy emotions again. 😀 😛 ImageMagick is set up.

Now I am going to get my hands dirty with ImageMagick. Courtesy: my guru Jaganadhg, who usually says so whenever he starts learning new things.