A quick side note about installing Hadoop on Windows. My Windows username is “Ben Hall”, which Hadoop picks up in two places: as ${user.name} in the Java configuration and as $USER in the shell scripts.
Although Hadoop is meant to be cross-platform, it doesn’t expect spaces in path names, and the space in my username produced some seemingly random errors.
The first one I received was while creating the DFS directory.
11/01/16 17:54:04 WARN common.Util: Path file:///tmp/hadoop-Ben Hall/dfs/name should be specified as a URI in configuration files. Please update hdfs configuration.
11/01/16 17:54:04 ERROR common.Util: Error while processing URI: file:///tmp/hadoop-Ben Hall/dfs/name
java.io.IOException: The filename, directory name, or volume label syntax is incorrect
    at java.io.WinNTFileSystem.canonicalize0(Native Method)
    at java.io.Win32FileSystem.canonicalize(Win32FileSystem.java:396)
    at java.io.File.getCanonicalPath(File.java:559)
    at java.io.File.getCanonicalFile(File.java:583)
    at org.apache.hadoop.hdfs.server.common.Util.fileAsURI(Util.java:78)
    at org.apache.hadoop.hdfs.server.common.Util.stringAsURI(Util.java:65)
    at org.apache.hadoop.hdfs.server.common.Util.stringCollectionAsURIs(Util.java:91)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getStorageDirs(FSNamesystem.java:378)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNamespaceDirs(FSNamesystem.java:349)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:1223)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1348)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1368)
Not the most helpful error message. After some pondering and digging through the Hadoop code-base, I realised it was due to the space in my username. To solve the problem, I overrode the default tmp directory with a path that doesn’t contain a space. Within core-site.xml, I added the following property node:
<property>
  <name>hadoop.tmp.dir</name>
  <value>/tmp/hadoop-BenHall</value>
</property>
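With the override in place, you can re-run the format step so the DFS directory is created under the new space-free path. On 0.21 the command should be roughly this, run from the Hadoop directory under Cygwin:

bin/hdfs namenode -format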
This allowed me to proceed until I hit the following error when starting the nodes:
C:\hadoop\hadoop-0.21.0/bin/hadoop-daemon.sh: line 111: [: /tmp/hadoop-Ben: binary operator expected
C:\hadoop\hadoop-0.21.0/bin/hadoop-daemon.sh: line 67: [: Hall-namenode-BigBlue7.out: integer expression expected
starting namenode, logging to C:\hadoop\hadoop-0.21.0\logs/hadoop-Ben
C:\hadoop\hadoop-0.21.0/bin/hadoop-daemon.sh: line 127: $pid: ambiguous redirect
localhost: /cygdrive/c/hadoop/hadoop-0.21.0/bin/hadoop-daemon.sh: line 111: [: /tmp/hadoop-Ben: binary operator expected
Again, not the most helpful output. Looking inside the hadoop-daemon.sh script, I found that it uses $USER to build up the log output path, so once again the space in my username was breaking things. To fix it, I replaced the $USER variable with the hard-coded value BENH.
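For reference, the change amounts to something like the following. This is a sketch from memory of the 0.21 script, where hadoop-daemon.sh derives the log file name from an identity string; the exact line may differ in your copy:

# Before: the daemon identity defaults to the shell user, which on
# Windows/Cygwin can contain a space ("Ben Hall"), breaking the
# generated log path hadoop-$HADOOP_IDENT_STRING-$command-$HOSTNAME.out
export HADOOP_IDENT_STRING="$USER"

# After: hard-code a space-free value
export HADOOP_IDENT_STRING="BENH"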
If you would prefer not to hard-code the value, you could instead use sed to strip the spaces dynamically, as described here: http://mydebian.blogdns.org/?p=132
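A minimal sketch of that approach (my paraphrase, not the exact snippet from the linked post) is to normalise the variable near the top of the script, before it is used in any paths:

# Remove any spaces from the username, so "Ben Hall" becomes "BenHall"
USER=$(echo "$USER" | sed 's/ //g')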
After fixing those two issues, I was able to run Hadoop without any further errors.