

Using AWS CLI, Terraform CLI and Ansible to automate HPC cluster creation
This is the final part of an eight-part series on how to set up an HPC cluster on AWS. This document explains how to use the AWS CLI, Terraform CLI, and Ansible to automate HPC cluster creation. Terraform is an Infrastructure-as-Code (IaC) tool used to provision and manage cloud infrastructure, such as servers, networks, and storage, across platforms like AWS. It defines resources in declarative .tf files, allowing you to create or destroy environments consistently and repeatably. Ansible complements...

Joseph
Nov 13, 2025 · 5 min read
Setting up Prometheus and Grafana in an AWS Cluster
This is the seventh part of an eight-part series on how to set up an HPC cluster on AWS. This document explains how to set up Prometheus and Grafana in an AWS cluster. The cluster has seven virtual machines (VMs): one head/control node (node1), one login node (node2), three compute nodes (node3, node4, node5), and two storage nodes (node6, node7). All the VMs run Rocky Linux 9.6 (Blue Onyx). Prometheus is an open-source monitoring and alerting toolkit designed for r...

Joseph
Nov 13, 2025 · 3 min read
Setting up an LDAP System in an AWS Cluster
This is the sixth part of an eight-part series on how to set up an HPC cluster on AWS. This document explains how to set up LDAP in the cluster. The cluster has seven virtual machines (VMs): one head/control node (node1), one login node (node2), three compute nodes (node3, node4, node5), and two storage nodes (node6, node7). All the VMs run Rocky Linux 9.6 (Blue Onyx). Lightweight Directory Access Protocol (LDAP) is a protocol used for accessing and managing directory...

Joseph
Nov 11, 2025 · 8 min read
Setting up a PBS Scheduler in an AWS Cluster
This is the fifth part of an eight-part series on how to set up an HPC cluster on AWS. This document explains how to set up an OpenPBS job scheduler in an AWS cluster. The cluster has seven virtual machines (VMs): one head/control node (node1), one login node (node2), three compute nodes (node3, node4, node5), and two storage nodes (node6, node7). All the VMs run Rocky Linux 9.6 (Blue Onyx). OpenPBS (Portable Batch System) is an open-source workload management and job...

Joseph
Nov 11, 2025 · 4 min read
Setting up a BeeGFS File System in an AWS Cluster
This is the fourth part of an eight-part series on how to set up an HPC cluster on AWS. This document explains how to set up a BeeGFS file system in the AWS cluster. The cluster has seven virtual machines (VMs): one head/control node (node1), one login node (node2), three compute nodes (node3, node4, node5), and two storage nodes (node6, node7). All the VMs run Rocky Linux 9.6 (Blue Onyx). BeeGFS is a high-performance parallel file system designed for scalable storage...

Joseph
Nov 11, 2025 · 7 min read
Enabling Passwordless SSH in an AWS cluster
This is the third part of an eight-part series on how to set up an HPC cluster on AWS. This document explains how to set up passwordless SSH across all the nodes in the cluster. The cluster will have seven virtual machines (VMs): one head/control node (node1), one login node (node2), three compute nodes (node3, node4, node5), and two storage nodes (node6, node7). All the VMs will run Rocky Linux 9.6 (Blue Onyx). Passwordless SSH is required in HPC clusters to allow nodes to communic...

Joseph
Nov 10, 2025 · 6 min read
Preparing AWS instances for HPC Cluster
This is the second part of an eight-part series on how to set up an HPC cluster on AWS. The cluster has seven virtual machines (VMs): one head/control node (node1), one login node (node2), three compute nodes (node3, node4, node5), and two storage nodes (node6, node7). All the VMs run Rocky Linux 9.6 (Blue Onyx). This part explains the initial setup and installs the packages needed for the different components of the HPC cluster. The first thing we have to do is disa...

Joseph
Nov 10, 2025 · 3 min read


Virtual Machine Setup for an HPC Cluster in AWS
This is the first part of an eight-part series on how to set up an HPC cluster on AWS. This document outlines the different aspects of the VMs deployed in the cluster and describes how each component fits into the design. The cluster will have seven virtual machines (VMs): one head/control node, one login node, three compute nodes, and two storage nodes. All the VMs will run Rocky Linux 9.6 (Blue Onyx). AMI: ami-0f2425d4cce4e97dd. Instance type: t3.2xlarge. When the VMs are cre...

Joseph
Nov 10, 2025 · 5 min read


Setting up an HPC cluster on AWS
This is an eight-part series on setting up an HPC cluster on AWS. The main design elements of the cluster are as follows. The cluster will have seven virtual machines (VMs): one head/control node, one login node, three compute nodes, and two storage nodes. All the VMs will run Rocky Linux 9.6 (Blue Onyx). AMI: ami-0f2425d4cce4e97dd. Instance type: t3.2xlarge. BeeGFS is used as the shared filesystem. OpenPBS is used as the HPC job scheduler. OpenLDAP is used for user managem...

Joseph
Nov 10, 2025 · 1 min read


Using LLaMa with VSCode
Download and install Ollama. There are multiple LLMs available for Ollama. In this case, we will be using Codellama, which can use text...

Joseph
Jul 6, 2024 · 1 min read


Creating a Singularity Container for a Linux Machine with GPU Support on an Apple Mac with Apple Silicon
Some HPC machines today use Singularity containers for their machine learning workflows. Once configured, Singularity containers can be...

Joseph
Jun 7, 2024 · 4 min read


Distributed-Dask with PBS
Dask is a popular Python library designed for scalable computing with dynamic task scheduling. A key strength of Dask lies in its...

Joseph
Aug 30, 2023 · 3 min read
Using Vim and Ctags to Manage Large Projects
The usual workflow in developing an HPC application is to develop the code on local machines and then run the completed application in an...

Joseph
Nov 9, 2022 · 2 min read
Automate Workflow Using VSCode
VSCode is a very popular tool for managing large projects in C/C++. One of the main advantages of VSCode is that we can automate workflows that...

Joseph
Jul 29, 2022 · 2 min read


Debugging MPI Programs Using Valgrind and GDB
Debugging a parallel program is not as straightforward as debugging a sequential program because it involves multiple processes with...

Joseph
Sep 25, 2020 · 4 min read