Wednesday, April 13, 2011

Running Hadoop on a Windows machine:
Although Hadoop supports Windows as a development platform, some tweaks are necessary to start the Hadoop services successfully. Hadoop supports the following modes:
Local (Standalone) Mode
Pseudo-Distributed Mode
Fully-Distributed Mode

Now we will see the steps to start Hadoop in standalone mode on Windows. You will most likely encounter one or more of the issues described below when starting Hadoop via Cygwin.

Required Software:
1) Java 1.6.x, preferably from Sun
2) ssh must be installed and sshd must be running to use the Hadoop scripts that manage remote Hadoop daemons (needed for cluster mode)
3) Cygwin

Download:
Download a stable Hadoop distribution from the Apache Download Mirrors.

Prepare to Start the Hadoop Cluster:
Unpack the downloaded Hadoop distribution. Define JAVA_HOME as an environment variable, or set it in the conf/hadoop-env.sh file.
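As a minimal sketch of the second option, the JAVA_HOME line in conf/hadoop-env.sh might look like the following. The install path is a hypothetical example, not one taken from this setup; substitute the location of your own JDK or JRE.

```shell
# conf/hadoop-env.sh (fragment)
# Hypothetical install location, written as a Cygwin-style path; adjust to your machine.
export JAVA_HOME="/cygdrive/c/Java/jdk1.6.0"
```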
Try the following command:
$ sh bin/hadoop
This should display the usage documentation for the hadoop script without any start-up errors.


However, you will get the error "C:\program command not found" if the JRE is installed in the default path ("c:\program files\jre*").

How to fix this issue?
Open the file hadoop-config.sh. Search for the text JAVA_PLATFORM=`CLASSPATH=${CLASSPATH} ${JAVA} and replace ${JAVA} with "${JAVA}", so that spaces in the file path are handled correctly when running Hadoop via Cygwin.
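To see why the quotes matter, here is a small self-contained sketch of how the shell splits an unquoted variable at spaces. The JAVA value and the two helper functions are hypothetical and only illustrate the behaviour; they are not part of hadoop-config.sh.

```shell
# Hypothetical JRE location containing a space, as with the default Windows install path.
JAVA="/cygdrive/c/Program Files/Java/jre6/bin/java"

# Helper functions for illustration only.
count_args() { echo "$#"; }
first_arg() { echo "$1"; }

# Unquoted: the value is split at the space into two words,
# so the shell would try to run the first word as the command.
unquoted_count=$(count_args ${JAVA})
unquoted_first=$(first_arg ${JAVA})

# Quoted: the whole path stays together as a single word.
quoted_count=$(count_args "${JAVA}")

echo "unquoted: $unquoted_count words, command would be $unquoted_first"
echo "quoted:   $quoted_count word"
```

The unquoted form hands the shell a command name that ends at "Program", which matches the "command not found" error described above.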

This error will now disappear, and another error will appear: "java.lang.NoClassDefFoundError: org/apache/hadoop/util/PlatformName".

The class org.apache.hadoop.util.PlatformName exists in hadoop-common-*.jar.
The hadoop script automatically adds all necessary files to CLASSPATH, but it seems that Cygwin-style paths (/cygdrive/c/apps/hadoop-0.21.0) on the classpath are not recognised properly when the Java runtime starts.

Note: the set -x option can be used to debug the scripts.
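As a small illustration of this note, the sketch below turns on tracing inside a subshell and captures what the shell prints. The variable assignment being traced is just a stand-in for a line of the real script.

```shell
# Run two commands in a subshell with tracing enabled and capture the trace,
# which the shell writes to standard error.
trace=$( ( set -x; JAVA_LIBRARY_PATH=''; echo "finished" >/dev/null ) 2>&1 )

# Each traced command appears on its own line, prefixed with "+" by default.
echo "$trace"
```

To trace the real script in the same way, one option is to start it with sh -x bin/hadoop.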

How to fix this issue now?

Open the hadoop-config.sh file and add the line below before the CLASSPATH variable is used to define JAVA_PLATFORM (add it after the line JAVA_LIBRARY_PATH=''):
CLASSPATH=`cygpath -p -w "$CLASSPATH"`
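The effect of that line can be tried out in isolation. The sketch below assumes a Cygwin shell, where cygpath is available, and uses a hypothetical two-entry classpath; on other systems it leaves the value unchanged.

```shell
# Hypothetical Cygwin-style classpath with two colon-separated entries.
CLASSPATH="/cygdrive/c/apps/hadoop-0.21.0/conf:/cygdrive/c/apps/hadoop-0.21.0/lib/example.jar"
original_classpath="$CLASSPATH"

# -p treats the argument as a path list, -w converts it to Windows form.
# Only convert when cygpath exists, i.e. when running under Cygwin.
if command -v cygpath >/dev/null 2>&1; then
  CLASSPATH=$(cygpath -p -w "$CLASSPATH")
fi

echo "before: $original_classpath"
echo "after:  $CLASSPATH"
```

Under Cygwin the converted value should use Windows drive letters and semicolon separators, which is the form the Windows Java runtime expects.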


Now run $ sh bin/hadoop, which will display the usage documentation as shown below:

Usage: hadoop [--config confdir] COMMAND
where COMMAND is one of:
  fs                              run a generic filesystem user client
  version                         print the version
  jar                             run a jar file
  distcp                          copy file or directories recursively
  archive -archiveName NAME -p *  create a hadoop archive
  classpath                       prints the class path needed to get the

Comments:

Unknown said...

I configured it your way and it is working. But when I run my code, the map phase completes at 100% while reduce stays at 0%. I have been facing this issue for a long time; please help me.
