Issue
I know that ssh key connection should be required for the hadoop operation.
Suppose that there are five clusters consisting of one namenode and four data nodes.
By setting the ssh key connection, we can connect from namenode to datanode and vice versa.
Note that two-way connection should be required for hadoop operation, which means that only one side (namenode to datanode, but not connect to from datanode to namenode) is not possible to operate hadoop as far as I know.
For above scenario, if we have 50 nodes or 100 nodes, it is very laborious jobs to configure all the ssh-key command by connecting the machine and typing same commands ssh-keygen -t ...
For these reasons, I have tried to script the shell code and but failed to do it in an automatic way.
my code is as below.
list.txt
namenode1
datanode1
datanode2
datanode3
datanode4
datanode5
...
cat list.txt | while read server
do
ssh $server 'ssh-keygen' < /dev/null
while read otherserver
do
ssh $server 'ssh-copy-id $otherserver' < /dev/null
done
done
However, it didn't work. As you can understand, the code means that it iterates over all the nodes and creates the key and then copy the generated key into other server using the ssh-copy-id
command. But the code didn't work.
So my question is that how to script the codes which enables ssh connection (bothways) using shell scripts...It takes a lot of time for me to achieve it and I cannot find any document describing the ssh connection for multi nodes for avoiding laborious tasks.
Solution
You only need to create a public/private key pair at the master node, then use ssh-copy-id -i ~/.ssh/id_rsa.pub $server in the loop. And the master should be in the loop. And there is no need to do this in reverse at the namenodes. The keys have to belong and installed by the user that is running the hadoop cluster. After running the script, you should be able to ssh to all namenodes, as the hadoop user, without using a password.
Answered By - Simon Schiff