
Enabling Passwordless SSH in an AWS cluster


This is the third part of an eight-part series on how to set up an HPC cluster on AWS. This document explains how to enable passwordless SSH across all the nodes in the cluster.


The cluster will have seven virtual machines (VMs):

  • One head / control node (node1)

  • One login node (node2)

  • Three compute nodes (node3, node4, node5)

  • Two storage nodes (node6, node7)

  • All the VMs will run Rocky Linux 9.6 (Blue Onyx)


Passwordless SSH is required in HPC clusters so that nodes can communicate and execute tasks automatically, such as job scheduling, data transfers, and parallel computations, without repeated password prompts.


Before proceeding, follow the steps outlined in the second part of this series for installing the required packages. That document also explains how to disable SELinux.


Create the SSH keys


On each node, ensure the ~/.ssh directory exists, creating it if necessary:

mkdir -p ~/.ssh     
chmod 700 ~/.ssh     
chown rocky:rocky ~/.ssh

Then, generate an SSH key pair on each node:

ssh-keygen -t rsa -b 4096 -f ~/.ssh/id_rsa -N ""

This creates a new key pair (id_rsa, the private key, and id_rsa.pub, the public key). The -N option of ssh-keygen specifies the passphrase for the private key; -N "" sets an empty passphrase, which makes it possible to use the key pair without a password prompt.


Once the keys are created, copy each node's public key back to your local system. Since the local system’s public key (terraform-user.pub) was already added to the VMs during their creation, transferring files from the VMs is straightforward. On the local system:

mkdir keys
cd keys
scp -i ~/.ssh/terraform-user rocky@<node1 ip>:~/.ssh/id_rsa.pub node1.pub

This scp command copies node1's public key to the local system as node1.pub. Repeat this for all the other nodes.
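
If you would rather not run scp once per node by hand, a short loop on the local system collects all seven keys in one pass. This is only a sketch: the <nodeN ip> values are placeholders for your instances' actual private addresses.

i=1
for ip in <node1 ip> <node2 ip> <node3 ip> <node4 ip> <node5 ip> <node6 ip> <node7 ip>; do
  # Fetch each node's public key and save it locally as node1.pub, node2.pub, ...
  scp -i ~/.ssh/terraform-user rocky@${ip}:~/.ssh/id_rsa.pub node${i}.pub
  i=$((i+1))
done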



Set the hostname


Once the keys are created and copied to the local system, we can set the hostname for each node in the cluster. The hostname is the unique name assigned to a system on a network. It is used to identify the machine in communications with other nodes, for logging, and for network management. To set the hostname on node1, run:

sudo hostnamectl set-hostname node1

This command sets the hostname of node1 to 'node1'. hostnamectl writes the name to /etc/hostname, so it persists across reboots; you can also set or verify the file explicitly:

    echo "node1" | sudo tee /etc/hostname

Repeat the same on the other nodes, using their respective hostnames (node2 through node7).
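
If you prefer to set all seven hostnames from the local system in one pass, a loop like the following works. It assumes the terraform-user key still grants access and that the rocky user has passwordless sudo (the default on AWS images); the IPs are placeholders.

i=1
for ip in <node1 ip> <node2 ip> <node3 ip> <node4 ip> <node5 ip> <node6 ip> <node7 ip>; do
  # Assign node1, node2, ... to the corresponding instance
  ssh -i ~/.ssh/terraform-user rocky@${ip} "sudo hostnamectl set-hostname node${i}"
  i=$((i+1))
done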


Hosts File Setup


Next, we configure the /etc/hosts file. The /etc/hosts file is a local mapping of hostnames to IP addresses. It allows systems to resolve names without using DNS, making it easy to connect to other machines by name (like node1) instead of by their IP addresses.


We need to make sure that the localhost and IPv6 entries are correctly set in /etc/hosts.

Ensuring correct entries for localhost and IPv6 addresses allows the system to properly resolve its own name and handle network communications internally, which is essential for many services and applications. Add these lines to /etc/hosts:


127.0.0.1 localhost
# The following lines are desirable for IPv6 capable hosts

::1 ip6-localhost ip6-loopback
fe00::0    ip6-localnet      # IPv6 local network address
ff00::0    ip6-mcastprefix   # IPv6 multicast prefix
ff02::1    ip6-allnodes      # Multicast address for all IPv6 nodes on the local network
ff02::2    ip6-allrouters    # Multicast address for all IPv6 routers on the local network
ff02::3    ip6-allhosts      # Multicast address for all IPv6 hosts


Finally, add the details of all nodes in the cluster along with their local IP addresses to /etc/hosts. To find the local (private) IP address on RHEL-based systems, you can run the following command:

    ip addr show

This displays the IP addresses assigned to the network interfaces on the node.
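
The full ip addr show output can be verbose; the same tool has a brief mode that prints one line per interface, which makes the private IPv4 address easier to spot:

    ip -4 -brief addr show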

An example hostname-to-IP mapping will look like this:

10.0.1.20 node1
10.0.1.19 node2
10.0.1.56 node3
10.0.1.60 node4
10.0.1.35 node5
10.0.1.12 node6
10.0.1.22 node7

In the end, the /etc/hosts file should look like this:

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
127.0.0.1 localhost

# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts

10.0.1.20 node1
10.0.1.19 node2
10.0.1.56 node3
10.0.1.60 node4
10.0.1.35 node5
10.0.1.12 node6
10.0.1.22 node7
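
A quick way to confirm the new entries are being used is to query the resolver directly; getent consults /etc/hosts through the name service switch, so it shows exactly what other programs will see:

getent hosts node3

This should print the IP you added for node3 (10.0.1.56 in the example above).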


Append Keys


Earlier, we copied each node's public key to the local system. Now combine these keys into a single authorized_keys file. On the local system, run:

 cat *.pub >> authorized_keys

Next, copy this authorized_keys file to each node in the cluster using scp. This ensures that all nodes have the same set of authorized keys. For example, to copy the authorized_keys file to node1:

scp -i ~/.ssh/terraform-user authorized_keys rocky@<node1 ip>:~/.ssh/authorized_keys

Do this for all the nodes in the cluster.
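
As with collecting the public keys, pushing the combined file to every node can be scripted from the local system; the IPs below are placeholders for your nodes' actual private addresses.

for ip in <node1 ip> <node2 ip> <node3 ip> <node4 ip> <node5 ip> <node6 ip> <node7 ip>; do
  # Install the identical authorized_keys file on every node
  scp -i ~/.ssh/terraform-user authorized_keys rocky@${ip}:~/.ssh/authorized_keys
done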


The authorized_keys file in a user's ~/.ssh/ directory tells the SSH server which public keys are allowed to log in as that user. When someone tries to SSH in:

  1. The client proves that it holds the matching private key (the key itself is never sent).

  2. The server checks if the corresponding public key exists in authorized_keys.

  3. If it matches, the login is allowed without a password.

This enables passwordless, secure SSH access from multiple users or machines, while keeping control centralized.


Once the authorized_keys file is set on all nodes, set the correct permissions for the authorized_keys file to ensure SSH works properly:

sudo chmod 600 /home/rocky/.ssh/authorized_keys
sudo chown rocky:rocky /home/rocky/.ssh/authorized_keys

In some cases a node will have stale entries left over from earlier connections: a hostname-to-IP mapping in /etc/hosts that is no longer valid, or an old host key recorded in ~/.ssh/known_hosts (for example, when a server's IP address has changed but the old value still exists in the file). Stale entries can cause issues like:

  • SSH or other network connections trying to reach the wrong IP.

  • Applications resolving hostnames incorrectly.

  • Confusion during cluster or multi-node setups where consistent name resolution is critical.


So it is good practice to remove outdated entries and ensure all hostnames point to the correct IP addresses. Stale /etc/hosts lines can simply be edited out; to clear a stale SSH host key, run the following command on all nodes:

  ssh-keygen -R <other-node-ip>

The ssh-keygen -R command removes a host’s old SSH key from your ~/.ssh/known_hosts file. This file keeps a record of the public keys of all remote hosts your system has previously connected to via SSH. Each time you connect to a server, the SSH client checks this file to verify that the server’s key matches what was stored from earlier connections.

If the key matches, the connection proceeds smoothly. If it doesn’t, SSH warns of a potential security risk.


Running ssh-keygen -R clears the outdated entry for a specific IP (e.g., <other-node-ip>). In a cluster, you need to do this for the IPs of all nodes, and you need to do it on all cluster nodes, to ensure smooth, passwordless SSH connectivity across the entire cluster.


Now that you have cleared any stale entries in the known_hosts file, retrieve the current host key from the remote machine (<other-node-ip>) and append it to your local known_hosts file. This allows SSH connections without interactive host-key prompts:

ssh-keyscan -H <other-node-ip> >> ~/.ssh/known_hosts

Again, you need to do this for the IPs of all nodes, on every node in the cluster, to ensure smooth, passwordless SSH connectivity across the entire cluster.
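
Since every node now resolves its peers by name through /etc/hosts, one way to script both steps on each node is a small loop over the hostnames (ssh-keygen -R and ssh-keyscan accept hostnames as well as IPs; if your known_hosts also contains entries keyed by IP, run the same loop over the IPs):

for host in node1 node2 node3 node4 node5 node6 node7; do
  # Remove any stale recorded key, then fetch and append the current one
  ssh-keygen -R "$host" 2>/dev/null
  ssh-keyscan -H "$host" >> ~/.ssh/known_hosts
done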


Now that the known_hosts file is populated on each node, edit the ~/.ssh/config file to include the following settings:

    Host *
      StrictHostKeyChecking no
      UserKnownHostsFile /home/rocky/.ssh/known_hosts
      LogLevel ERROR

  • StrictHostKeyChecking no – ensures the SSH client does not prompt you when connecting to a host whose key is new or has changed.

  • UserKnownHostsFile specifies the file where known host keys are stored.

  • LogLevel ERROR reduces unnecessary SSH log messages.
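
One caveat worth noting: the OpenSSH client refuses to use a per-user config file that other users can write to, so make sure the file is owned by the rocky user and has tight permissions:

chmod 600 ~/.ssh/config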


After this, edit /etc/ssh/sshd_config on each node and ensure the following:


  • PasswordAuthentication yes – This allows users to log in using a password instead of an SSH key. It’s useful as a fallback, but enabling it can be less secure than key-based authentication.

  • ChallengeResponseAuthentication no – This disables challenge-response authentication, a method where the server sends a challenge (like a one-time code) and the client must respond correctly. Turning it off simplifies login and avoids unnecessary prompts.

  • UsePAM yes – This enables Pluggable Authentication Modules (PAM), which provide a flexible way to handle authentication. PAM can support extra security features like account limits, two-factor authentication, or logging, enhancing the SSH login process.
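
For reference, the relevant lines in /etc/ssh/sshd_config would look like the following (existing lines with these keywords may need to be uncommented or edited rather than duplicated):

PasswordAuthentication yes
ChallengeResponseAuthentication no
UsePAM yes

You can check the file for syntax errors with sudo sshd -t before restarting the service; it prints nothing if the configuration is valid.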



Disable cloud configurations


On a bare-metal system, the previous SSH configuration steps would usually be sufficient. However, since we are working with AWS instances, additional steps are required to prevent cloud-specific settings from interfering.


On every node, disable cloud-init SSH management:

 sudo chmod 644 /etc/cloud/cloud.cfg.d/99-disable-ssh-password.cfg
 sudo rm -f /etc/ssh/sshd_config.d/60-cloudimg-settings.conf

Deleting the SSH configuration file is important because cloud images often include default SSH settings that can disable password login or override your manual SSH configuration.

Also, in /etc/cloud/cloud.cfg (create the file if it doesn't exist), enable password authentication to ensure cloud-init does not overwrite your SSH settings on reboot or redeployment:

ssh_pwauth: 1

Setting ssh_pwauth to 1 allows SSH password login and prevents cloud-init from resetting or disabling password authentication during instance reboots or redeployments, so your manual SSH configuration remains intact.


Finally, restart the SSH service:

sudo systemctl restart sshd
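
As an optional sanity check, you can ask the daemon to print its effective configuration and confirm that the settings above were picked up:

sudo sshd -T | grep -iE 'passwordauthentication|usepam'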

To verify that passwordless SSH is working, test SSH from each node to every other node. For example, from node1, run:

    ssh node2

You should be able to log in to node2 without being prompted for a password. This confirms that the SSH key setup was successful.
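
To exercise every connection in one pass, you can loop over the hostnames with BatchMode enabled, which makes ssh fail instead of falling back to a password prompt. Run this from each node:

for host in node1 node2 node3 node4 node5 node6 node7; do
  # BatchMode=yes disables password prompts, so a failure here means key auth is broken
  ssh -o BatchMode=yes "$host" hostname && echo "OK: $host" || echo "FAILED: $host"
done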


The next part outlines the steps to set up a shared BeeGFS file system across the cluster. The main GitHub repo for this series is available here.

 
 
 
