Configure Maven with Proxy Settings

I am quite familiar with Maven and have worked with it for at least a couple of years. With that confidence and experience I set it up on a Windows machine. It was actually my first attempt on Windows.

After downloading it and setting it up on the system and in the Eclipse IDE, I hit a weird issue that I had never faced before.

I was not able to run “mvn install” or any other build command. The execution failed at the very first step, saying “Connection refused”.
“Connection refused”? What the hell is that??

After a bit of googling and some hints from my colleagues, I understood that my system sits behind a firewall or some other network access restriction. That is why Maven could not connect to the remote repository server.

To get out of this situation we need to configure “Proxy Settings” for Maven as well as Eclipse.

Yeah, I understood the problem space and got the solution. It’s time to work it out and get it done.

I assume that you have already installed Maven and set JAVA_HOME and MAVEN_HOME in your environment variables. If not, please install and configure Maven first. Okies.. game begins..

Go to your Maven installation folder, where you will see a “conf” folder at the top level. Inside that “conf” folder there is an XML file named “settings.xml”.

Open the “settings.xml” file and locate the part where it specifies “<proxies>” or “<proxy>”. It looks like this:

<!-- proxy
 | Specification for one proxy, to be used in
 connecting to the network.

By default this proxy setting is commented out. Just uncomment it and fill in the details. If you don’t know your <host> and <port>, open Internet Explorer –> Internet Options –> Connections –> LAN Settings to find them.

Make the changes in “settings.xml” and save it. Yes.. that’s it for Maven on the command prompt. Now you should be able to run “mvn” commands successfully from the command prompt.

That does not mean Maven is working inside the Eclipse IDE yet. We need to do something in Eclipse to set Maven up there too.

I assume that you have opened the Eclipse IDE. Go to Window –> Preferences –> Maven and select “User Settings”.

You will see a field asking for a “User Settings” file. Most probably that file does not exist yet. The default location is the “.m2” folder. Don’t worry, you can create a new XML file with the following settings.

<settings xmlns=""

Once the file is created, just browse to it and set it in “User Settings” (the path is mentioned above). Yeah.. it’s done..

Now you can run Maven commands both in the command prompt and in the Eclipse IDE. There should not be any “Connection refused” issues any more.

Enjoy coding.. Enjoy Maven..


URL Encoding

In my project I had to deal with URLs that need some canonicalization. I don’t know why these guys are not keeping to the standards of URL formation 😦
The W3C has a specific set of rules for URL canonicalization, and there are unsafe characters that we must not use in URLs. But in the real world no one follows these simple standards.

In a sample URL from business-standard you can see that the ‘\’ character is unsafe. To make this URL work with Java’s URL class or Apache Http you need to replace ‘\’ with ‘%5C’.

The domain name is Business-Standard, but they cannot keep their URLs in standard form. How do they keep their “Business” “Standard”? 🙂

These URLs cause problems with Java’s URL class and Apache Http. So to get out of these situations you need to normalise the URLs. Java has a URLEncoder to encode URLs, but it is not advisable to encode the URL as a whole. It creates other problems: I found that it makes the URL clumsier and it stops working. The output of URLEncoder for the whole URL will not work even in a browser.
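To see why encoding the whole URL goes wrong, here is a small sketch using the JDK’s URLEncoder. The URL below is a made-up placeholder on example.com, not the one from my project, and the class and method names are only for illustration. URLEncoder is meant for form values, so it also escapes the ‘:’, ‘/’, ‘?’ and ‘=’ that give a URL its structure.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class WholeUrlEncodingDemo {
    // Encodes the entire string as if it were a single form value.
    static String encodeWhole(String url) {
        try {
            return URLEncoder.encode(url, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a JVM, so this should never happen
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical URL, used only for illustration
        String url = "http://example.com/a b?x=1|2";
        System.out.println(encodeWhole(url));
        // prints http%3A%2F%2Fexample.com%2Fa+b%3Fx%3D1%7C2
    }
}
```

The result no longer has a scheme or a host that a browser or an HTTP library can recognise, which matches what I saw.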

So URL encoding is a real, long-standing problem. One solution is, instead of encoding the whole URL, to encode only the unsafe characters.
This is an example malformed URL.|ltr

The ‘|’ symbol is considered unsafe, but it is included in the URL. A browser can easily encode and understand it, but when you need to call the URL using some HTTP library it may cause problems.

So to get over this situation you can simply replace the ‘|’ character with ‘%7C’, its hex equivalent, and the problem is solved.

This is not a neat workaround, because there are many unsafe characters and we need to check the URL for every occurrence of an unsafe character and replace it with its hex equivalent.

Then I found a Stack Overflow thread discussing the same problem, with a neat workaround. I found it quite useful and it is working fine with my present set of malformed URLs. 🙂 Thanks to scott
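The manual replacement idea can be sketched roughly like this. The list of unsafe characters is my own assumption for illustration (extend it as needed), the class and method names are hypothetical, and the URL is a made-up placeholder. It only handles ASCII characters and leaves ‘%’ alone so that existing escapes are not encoded twice.

```java
class UnsafeCharEscaper {
    // Characters treated as unsafe here: an assumed list for illustration only
    private static final String UNSAFE = " \"<>\\^`{|}";

    // Replaces each unsafe ASCII character with its %HH hex escape
    // and copies every other character unchanged.
    static String escapeUnsafe(String url) {
        StringBuilder out = new StringBuilder(url.length() + 16);
        for (int i = 0; i < url.length(); i++) {
            char c = url.charAt(i);
            if (UNSAFE.indexOf(c) >= 0) {
                out.append('%').append(String.format("%02X", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical URL, used only for illustration
        System.out.println(escapeUnsafe("http://example.com/wiki?dir=a|ltr&path=x\\y"));
        // prints http://example.com/wiki?dir=a%7Cltr&path=x%5Cy
    }
}
```

It works, but you have to maintain the character list yourself, which is exactly why it does not feel like a neat solution.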

import java.net.URI;
import java.net.URL;
import java.net.URLDecoder;

public class CanonicalizeURL {
    // Decode first, then let the multi-argument URI constructor quote illegal characters
    public static String escapeIllegalURLCharacters(String url) throws Exception {
        String decodeUrl = URLDecoder.decode(url, "UTF-8");
        URL urlString = new URL(decodeUrl);
        URI uri = new URI(urlString.getProtocol(), urlString.getUserInfo(), urlString.getHost(), urlString.getPort(), urlString.getPath(), urlString.getQuery(), urlString.getRef());
        return uri.toString();
    }
}


Train / Test Mahout SGD Classifier

Mahout 0.6 has been released with bug fixes and new implementations. I experimented with Mahout for my previous project, but there I used the naive Bayes classifier.

But now my data set is too small to run with Naive Bayes, and I also came across a section in the Mahout in Action MEAP, “Choosing an algorithm to train the classifier”, which points to stochastic gradient descent (SGD). It says SGD is advisable for “Small to medium (less than tens of millions of training examples)” data sets.

So I decided to choose SGD because it is the best fit for my data set. OK, the algorithm is selected, so what’s next?? I needed to dig into the Mahout source to find some hints on how to run SGD.

Yes.. there are examples that show how to train and test using SGD.

So let’s search for the command line options to run SGD. In examples/bin of the Mahout binary distribution there is an example script for this.
I went through that file and got an idea of how to train and test SGD from the command line.

Found out how it works, so let’s start the game. :)

Before that, please make sure that you have a sample data set to train on. You can download the 20 newsgroups data set.

The input data set is ready, so now we are going to train on it using SGD.

You can download mahout-distribution-0.6 from any of the mirrors. Extract it and cd into the mahout folder.
There is a class TrainNewsGroups in org.apache.mahout.classifier.sgd that accepts the path of the input data set as its argument.

./bin/mahout org.apache.mahout.classifier.sgd.TrainNewsGroups /home/sree/Desktop/20news-bydate/20news-bydate-train/

If you forgot to set JAVA_HOME, an error may occur. Set JAVA_HOME if needed.

Currently the SGD implementation in Mahout supports sequential / online / incremental execution. It does not run in parallel like naive Bayes (which I experimented with before).

By default TrainNewsGroups creates model files (with .model as the file type) in the /tmp directory. While training you can check whether these files are being created. If you see files named “news-group-{a number}.model”, then SGD has surely started training on our newsgroup data set.

Once training has completed you will find a set of .model files in the /tmp directory. You can simply choose “news-group.model” or “news-group-{MAX NUMBER}.model” as the model.

If the training completed without any errors, the model has been created from the input data set. :) It’s time to test it. This part determines how much accuracy we can get on test data using SGD.

To test the data against a model, we can use the TestNewsGroups class in org.apache.mahout.classifier.sgd.

TestNewsGroups has two mandatory arguments:
--input : path of the test data
--model : path of the model file

Finally it’s time to test it.

./bin/mahout org.apache.mahout.classifier.sgd.TestNewsGroups --input /home/sree/Desktop/20news-bydate/20news-bydate-test/ --model /tmp/news-group.model

and you will see a confusion matrix and the classified instances.

I got 73.513% accuracy.

I discussed the accuracy of SGD on the Mahout mailing list and there were some follow-ups.
You can search the Mahout mailing list archive of Dec 2011 for the topic “Mahout SGD / Bayes prediction results over 20newsgroups”.

I think the SGD accuracy is not satisfactory when compared with Naive Bayes on the same 20 newsgroups data set, :( even though SGD is said to be the best choice for small / medium sized data sets. Let’s hope the Mahout developers will work on it. 🙂

Installing ImageMagick in Ubuntu 11.04

I am working on a project which includes some image manipulation functions. After a quick googling I found that ImageMagick best satisfies my needs.

I installed ImageMagick using sudo apt-get install imagemagick.

After installation I tested some of the commands provided by ImageMagick (convert, identify). But they did not work and failed with the error “No Delegates for this image”.

Then I realised that ImageMagick may have some missing dependencies, and I searched for delegates.

I tried the wiki page for setting up ImageMagick but failed again 😦

So I decided to install it from source.

You’ll need to install a number of dependencies in addition to ImageMagick in order to have a fully functional ImageMagick installation. It’s important that these dependencies are installed before you start configuring and compiling ImageMagick, because the configure script for ImageMagick will disable functionality that isn’t available because of missing dependencies at compile time.

 The list of dependencies which I found useful for my use case is

 sudo apt-get install libjpeg8-dev libpng12-dev libglib2.0-dev libfontconfig1-dev zlib1g-dev libtiff4-dev

 After installing the above dependencies I started to compile ImageMagick from source.

 You can download the ImageMagick source from any of the mirrors.

ImageMagick-6.7.5-6 is the latest version at the time of writing.

After downloading the source, untar it.

tar xvfz ImageMagick-6.7.5-6.tar.gz

cd ImageMagick-6.7.5-6

./configure

If you need to do some advanced configuration, check the options listed by ./configure --help.

sudo make

sudo make install

Installation completed, hooray.. :) I checked the ImageMagick commands and found the same delegate problem again. :( 😦

So compiling had to restart from the ./configure step, and this time I added the --disable-shared option

./configure --disable-shared

Installation completed again.. No emotion.

I checked whether all the delegates and dependencies were configured properly with ImageMagick.

You can check it using convert -list configure

🙂 again. The delegates are configured properly.

DELEGATES fontconfig freetype jpeg jng mpeg png x11 zlib

So, time to run some ImageMagick commands.

I started with the identify command.

identify a.jpg

It’s working.. WOWWW..

a.jpg JPEG 321×400 321×400+0+0 8-bit DirectClass 28.9KB 0.000u 0:00.000

Then I tried converting an image to another format

convert a.jpg a.gif

WOOOWWW again, it’s working.. :) 🙂

So happy emotions again. :) 😀 😛 ImageMagick is set up.

Now I am going to get my hands dirty with ImageMagick. Courtesy: my guru Jaganadhg, who usually says so whenever he starts learning new things.

UIMA SDK & Plugin installation

I just started digging into the UIMA core and found it somewhat difficult to set up. UIMA has good documentation, but it is not the best pointer for a newbie. :( So I thought of writing a blog post with some pointers on the installation and initial setup of UIMA.


1) JDK

I hope you have already set up Java.

2) Eclipse IDE

3) Eclipse EMF Plugin

You can get the UIMA SDK here

Here is a good pointer

Or you can directly go through the uima docs

Hoppity Hop! Facebook Puzzle in Java

This is a very simple Facebook puzzle. I have just started solving the Facebook puzzles, so I started with the simple problems..

Write a program that takes as input a single argument on the command line. This argument must be a file name, which contains a single positive integer. The program should read this file and obtain the integer within, and then output a sequence of strings based upon the number (details below).

Input specifications
The input file will contain a single positive integer (in base 10) expressed as a string using standard ASCII text (e.g. for example, the number “15” but without the double quotes). This number may or may not be padded on either side with white space. There will be no commas, periods, or any other non-numeric characters present within the number. The file may or may not terminate in a single new line character (“\n”).
An example input file is below:

Output specifications
The program should iterate over all integers (inclusive) from 1 to the number expressed by the input file. For example, if the file contained the number 10, the submission should iterate over 1 through 10. At each integer value in this range, the program may possibly (based upon the following rules) output a single string terminating with a newline.

* For integers that are evenly divisible by three, output the exact string Hoppity, followed by a newline.
* For integers that are evenly divisible by five, output the exact string Hophop, followed by a newline.
* For integers that are evenly divisible by both three and five, do not do any of the above, but instead output the exact string Hop, followed by a newline.

Example output (newline at end of every line):

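import java.io.File;
import java.util.Scanner;

public class HoppityHop {
    public static void main(String[] args) throws Exception {
        String fileName = args[0]; // input file
        // nextInt() skips the optional padding whitespace and trailing newline
        try (Scanner fileScan = new Scanner(new File(fileName))) {
            hoppityHop(fileScan.nextInt());
        }
    }

    // Iterate over all integers from 1 to number, inclusive
    public static void hoppityHop(int number) {
        for (int i = 1; i <= number; i++) {
            evenlyDivision(i);
        }
    }

    // Print the word that matches the divisibility rules, or nothing at all
    public static void evenlyDivision(int number) {
        int[] divident = {3, 5};
        if (number % divident[0] == 0) {
            if (number % divident[1] == 0) {
                System.out.println("Hop");      // divisible by both three and five
            } else {
                System.out.println("Hoppity");  // divisible by three only
            }
        } else if (number % divident[1] == 0) {
            System.out.println("Hophop");       // divisible by five only
        }
    }
}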

FOSSMeet NIT Calicut

Fortunately my colleague Biju could join me at FOSSMeet at NIT Calicut. Both of us delivered talks. We reached the NIT campus on Sunday morning. Mr. Karthik met me at the main gate and directed us to Bhaskara Hall, where all the open talks happened.

My talk was an introduction to NLTK and it started at 12.05, as requested by Mr. Anil. After the introductory part about NLP and NLTK I went through some basic practical exercises. I think that is the best way to learn a practical toolkit.

Many questions came from the audience, and I tried to answer them. Biju also helped me clarify some of them. That actually made the session interactive. After my talk I got a memento from the FOSSMeet team.

Biju delivered his talk on Apache Mahout at 3.00 pm. He gave a demo of document classification and recommendation systems. Much of the audience were students. I think they got a bit confused because Biju’s talk was on recent technologies like Mahout and Hadoop. He tried his best to explain things as simply as possible. I also tried to clarify the map-reduce concept. Questions came up about some of the algorithms he discussed.

We met many students after our talks. It was very nice to interact with them. Some of them are doing great work.

Thank you, FOSSMeet team.. Bye NITC