One area I’m planning to spend time investigating is Hadoop, a MapReduce and distributed computing framework under the Apache Software Foundation banner. Over the next few blog posts I’ll explain how I installed Hadoop 0.21.0 on Windows. This blog post covers the first part – the SSH server.
1) Install Cygwin
Hadoop uses SSH to control jobs running on nodes (machines capable of running Hadoop jobs). Each node, including the master, needs to have an SSH server running. Given the lack of SSH support in Windows, you will need to install Cygwin.
Pitfall one: The Cygwin bash prompt installed as part of msysgit doesn’t include everything. You will need to run the main Cygwin setup.exe.
Once you have downloaded setup.exe, install the ssh package and its dependencies. I also recommend you install vim, as you will need to make config changes at some point.
With the package installed, you can execute the following command to configure the host server.
This step is explained in more detail at http://pages.cs.brandeis.edu/~cs147a/lab/hadoop-windows/
During this config stage, I was asked two questions.
*** Query: Should privilege separation be used? (yes/no)
*** Query: (Say “no” if it is already installed as a service) (yes/no)
I answered no to both. On my first attempt I answered yes to the first question, which is what I would do on a production system. As this is just for development and learning, I kept it simple.
Once installed, you need to start the sshd server with the following command:
This doesn’t give any output, so in order to find out if it’s running, execute:
$ ps | grep sshd
This should result in output similar to the line below.
7416 1 7416 7416 ? 1001 16:08:57 /usr/sbin/sshd
Pitfall two: If you already have a server running on port 22, sshd won’t error but nothing will be listed in the ps output. Check for other servers on the port if this happens.
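To check for pitfall two, it can help to look at what is already listening on port 22. The following is a minimal sketch, not something I ran as part of the original setup: port_listening is a hypothetical helper name, and it assumes netstat -an style output in which listening sockets are marked LISTEN or LISTENING.

```shell
# Hypothetical helper (the name is illustrative, not part of Cygwin or Hadoop).
# Reads `netstat -an` style output on stdin and succeeds if some socket
# is listening on the given port.
port_listening() {
    grep -E ":$1[[:space:]].*LISTEN" >/dev/null
}

# Usage (assumes netstat is on the PATH):
#   netstat -an | port_listening 22 && echo "something is already on port 22"
```

If this reports a listener while ps shows no sshd, another server is probably holding the port.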
4) Logging in
Once installed, it’s time to log in.
$ ssh localhost
If it asks you for a passphrase, then you need to run the following commands. Once this is done, Hadoop can execute without interruption and without requiring you to enter a password.
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
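Once the key is in place, one way to confirm that login no longer prompts is to run ssh in batch mode, which fails rather than asking for a password or passphrase. This is a sketch under assumptions: can_login_without_prompt and the SSH_CMD override are hypothetical names introduced here for illustration, and the host key for localhost is assumed to have been accepted already by the earlier ssh localhost.

```shell
# Hypothetical helper: succeeds only if ssh can log in to the given host
# without prompting. BatchMode=yes makes ssh fail instead of asking for a
# password or passphrase; ConnectTimeout limits how long it waits.
# SSH_CMD is an illustrative override for the ssh binary, not a standard variable.
can_login_without_prompt() {
    "${SSH_CMD:-ssh}" -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}

# Usage:
#   can_login_without_prompt localhost && echo "key-based login works"
```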
Note that these SSH keys are separate from your normal Windows set, because ~/ points to a different location under Cygwin. For example, when using Cygwin bash my home directory is:
This physically maps to:
Using msysgit, my home directory is:
As such, generating SSH keys shouldn’t affect other keys relating to systems such as GitHub. If you already have a key set up, you will only need to execute the command to copy your .pub key into authorized_keys.
With this done, we can move on to the more interesting task of installing Hadoop.
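If you expect to repeat the copy step, it can be written so that running it twice does not add the same key twice. This is only a sketch: authorize_key is a hypothetical helper name, and the chmod 600 at the end is the usual convention for authorized_keys rather than something taken from the setup above.

```shell
# Hypothetical helper: append a public key to an authorized_keys file
# only if that exact line is not already present.
authorize_key() {
    pub_file=$1
    auth_file=$2
    touch "$auth_file"
    # -F fixed string, -x whole-line match, -f read the pattern from the key file
    if ! grep -qxFf "$pub_file" "$auth_file"; then
        cat "$pub_file" >> "$auth_file"
    fi
    # Conventional permissions: readable and writable by the owner only
    chmod 600 "$auth_file"
}

# Usage:
#   authorize_key ~/.ssh/id_dsa.pub ~/.ssh/authorized_keys
```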