Friday, May 22, 2015

Set up Hadoop using the Hortonworks Sandbox

I am learning Hadoop, and I found that the easiest way to set up the environment is the Hortonworks Sandbox. (I also tried installing everything with Homebrew, and it took much longer to get everything working.) I will list the steps here in the hope that they are useful to you too.

Step 1 - Install the Sandbox and start it:
Download the Sandbox from the Hortonworks website; I downloaded the VirtualBox version (Mac & Windows). It is essentially a Red Hat-based Linux server running in a virtual machine, with all the needed packages preinstalled, e.g. Hadoop, Hive, Pig, etc. Follow the instructions in the install guide PDF to start the virtual machine.

Step 2 - Log in to the Linux server:
You can use either the provided GUI or SSH. I prefer working in the terminal, so I use SSH.
$ ssh root@127.0.0.1 -p 2222    # password: hadoop
To transfer files between your local machine and the Sandbox, use sftp:
$ sftp -P 2222 root@127.0.0.1    # note: sftp's port flag is an uppercase -P
To make things easier, add aliases to your ~/.bashrc; then you only need to type 'hssh' to log in to the server. (On a Mac, Terminal starts login shells, which read ~/.bash_profile rather than ~/.bashrc, so make sure ~/.bash_profile sources ~/.bashrc.)
alias hssh='ssh root@127.0.0.1 -p 2222'
alias hsftp='sftp -P 2222 root@127.0.0.1'
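As an alternative to the aliases, OpenSSH can read per-host settings from ~/.ssh/config, and ssh, sftp and scp all honor them. The snippet below is a sketch that appends such an entry; the host name `sandbox` is my own (hypothetical) choice, while the address, port and user are the ones used above. Run it only once, since running it again appends a duplicate entry.

```shell
# Create ~/.ssh if it is missing and keep it private
mkdir -p "$HOME/.ssh"
chmod 700 "$HOME/.ssh"

# Append a Host entry; "sandbox" is an arbitrary (hypothetical) alias name
cat >> "$HOME/.ssh/config" <<'EOF'

Host sandbox
    HostName 127.0.0.1
    Port 2222
    User root
EOF

# ssh requires this file not to be writable by other users
chmod 600 "$HOME/.ssh/config"
```

After that, `ssh sandbox` and `sftp sandbox` should connect without repeating the port or user name.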


Step 3 - SSH login without a password:
If you don't want to type the password every time you log in, do the following:
(1) Check whether you already have a key pair (id_rsa and id_rsa.pub) in the .ssh folder in your home directory:
     $ ls -a ~/.ssh
(2) If you don't, generate one:
     $ ssh-keygen
(3) Then copy the public key to the server (this replaces any existing authorized_keys file there, and /root/.ssh must already exist on the Sandbox):
    $ scp -P 2222 ~/.ssh/id_rsa.pub root@127.0.0.1:~/.ssh/authorized_keys
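Step (3) overwrites the server's authorized_keys file. If you would rather append your key, so that any keys already on the server keep working, a small helper like the one below could go into .bashrc next to the aliases. It is a sketch: the function name `hpushkey` is hypothetical, and it assumes the default key path ~/.ssh/id_rsa.pub. Where `ssh-copy-id` is installed, `ssh-copy-id -p 2222 root@127.0.0.1` should do the same job.

```shell
# Hypothetical helper for ~/.bashrc: append the local public key to the
# Sandbox's authorized_keys instead of overwriting the file.
hpushkey() {
    local key="$HOME/.ssh/id_rsa.pub"

    # Generate a key pair only if one does not exist yet
    if [ ! -f "$key" ]; then
        ssh-keygen -t rsa -f "$HOME/.ssh/id_rsa" || return 1
    fi

    # The key is sent on stdin; the remote shell creates ~/.ssh if needed,
    # appends the key, and tightens permissions (sshd checks them by default)
    ssh -p 2222 root@127.0.0.1 \
        'mkdir -p ~/.ssh && chmod 700 ~/.ssh && cat >> ~/.ssh/authorized_keys && chmod 600 ~/.ssh/authorized_keys' \
        < "$key"
}
```

Run `hpushkey` once (it asks for the password one last time); afterwards `hssh` should log in without a prompt.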

OK, now everything is set up, and you can follow the Hortonworks tutorials.
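Before starting on the tutorials, a quick check that Hadoop responds inside the VM can save some confusion. The helper below is hypothetical (the name `hcheck` is my own); it runs `hadoop version` and lists the HDFS root directory over the same SSH connection used above.

```shell
# Hypothetical helper: run two sanity checks inside the Sandbox over SSH
hcheck() {
    # Print the installed Hadoop version, then list the top level of HDFS
    ssh -p 2222 root@127.0.0.1 'hadoop version && hdfs dfs -ls /'
}
```

If the services are still starting after the VM boots, the HDFS listing may fail for a while; waiting and retrying should help.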

