A walk through High Performance Computing

A simple interpretation 

I’ve often been curious about High-Performance Computing (HPC), especially the clusters, nodes, and GPUs that come up whenever it is mentioned. Recently I dove deep into the world of HPC and went down quite a rabbit hole. Let me share what I discovered along the way. 

If we give it some thought, putting more computers (or people) on a task could solve a problem faster. If one computer could solve a problem, 5,000 could solve it even faster, right? Let’s think about HPC in terms of weather forecasting. There are two elements to forecasting the weather: 

  1. Getting the forecast right -> writing a piece of software that takes all of the inputs and produces an output that is accurate enough. 
  2. Being able to run the model and get the result back in a meaningful time frame. A forecast that is 100% accurate but takes five days to compute is useless; the goal is to get that accurate forecast fast enough that we can plan for the next day. 

HPC is about taking 5, 15 or 500 computers and then using software that lets us pool their resources together so that we can tackle problems faster and more efficiently. 
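To make that intuition concrete, here is a tiny Python sketch of the *ideal* case, where the work splits perfectly and runtime shrinks linearly with the number of machines. In practice, communication overhead and the serial portions of a program limit this, but the ideal case captures why we pool computers at all:

```python
def ideal_runtime(total_hours: float, n_workers: int) -> float:
    """Ideal runtime when the work splits evenly across workers."""
    return total_hours / n_workers

# A forecast that needs 100 hours of compute on one machine:
print(ideal_runtime(100, 1))    # 100.0 -> the forecast arrives too late to be useful
print(ideal_runtime(100, 500))  # 0.2   -> fast enough to plan for tomorrow
```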


Figure 1: Simple HPC architecture

Exploring the Power of Parallel Computing

Parallel computing is a paradigm in which tasks are broken down into independent or semi-independent sub-tasks for simultaneous execution. 

Let’s take the example of a Formula 1 pit crew. The team needs to perform several tasks, such as refueling, changing tires, and conducting repairs, and must complete them as fast as possible. While one person could do these tasks sequentially, each task is independent of the others. The team takes advantage of this by using a large group of specialized workers operating at the same time, completing all the tasks much more quickly. 


Figure 2: A Formula 1 pit crew

Bringing this analogy into the world of computing: clusters operate in a similar fashion. In a computing cluster, individual nodes act like specialized workers, each focusing on a specific task. By distributing workloads across multiple nodes, the system can process data much faster.
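The pit-crew idea maps directly onto code. Below is a minimal sketch using Python’s standard `multiprocessing` module, with worker processes standing in for crew members (the task names are, of course, just for illustration):

```python
from multiprocessing import Pool

def service_task(task: str) -> str:
    # Each worker process handles one independent task, like one crew member.
    return f"{task}: done"

if __name__ == "__main__":
    tasks = ["front-left tire", "front-right tire",
             "rear-left tire", "rear-right tire", "front wing"]
    # Five workers tackle the five independent tasks simultaneously.
    with Pool(processes=5) as pool:
        results = pool.map(service_task, tasks)
    print(results)
```

On a real cluster the same idea is scaled up: instead of processes on one machine, frameworks such as MPI distribute the sub-tasks across many nodes.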


Figure 3: Workflow of Parallel computing

How Nodes Team Up in a Cluster: Node Layout

Computing clusters consist of many specialized nodes. Let’s take a look at some of the important ones:

The Head Node:

  • This is the node through which the system is accessed
  • Users must first log in to the head node to gain access to the rest of the system

Compute Nodes:

  • These provide the vast majority of the system’s computing power
  • They cannot be accessed directly by users 
  • Access is only possible through a queue system managed by the head node

Interactive Nodes:

  • Some computing clusters provide a small number of interactive nodes
  • Users can log in to these directly and use them like a typical remotely accessed computer.
  • They are shared by the whole community of the cluster and are intended for small tasks such as text editing, code manipulation and small test runs. 


Where does all the data go? Understanding cluster storage

When it comes to storage, there are a few things to keep in mind:

Home Directory Storage

  • Users are given a small but persistent allocation in their home directory 
  • It is intended for configuration files, source code and the like, and shouldn’t be used for the large datasets that form the input or output of computational projects.

Project Storage

  • Allocated for exactly those large datasets.
  • This is where the large input and output datasets of a project should be stored.
  • However, large project datasets are generally not backed up. Many compute clusters have finite data-longevity policies whereby project storage spaces are cleared on certain dates, so it is important to understand the storage policies of the cluster being used. 
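As a concrete illustration, a small helper like the following could stage a large result file into project storage rather than the home directory. The directory layout in the comments is hypothetical; real mount points (and purge policies) differ from cluster to cluster:

```python
import os
import shutil

def stage_results(local_output: str, project_dir: str) -> str:
    """Copy a large result file into project storage (not the home directory)."""
    os.makedirs(project_dir, exist_ok=True)
    dest = os.path.join(project_dir, os.path.basename(local_output))
    shutil.copy2(local_output, dest)  # copy2 preserves timestamps/metadata
    return dest

# Hypothetical layout -- check your own cluster's documentation:
#   home    -> ~/            small, persistent: configs and source code
#   project -> /project/...  large datasets; often purged on a schedule
```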


Figure 4: HPC architecture

Waiting in Line: How the Queue System Works

To manage different use cases or hardware configurations, the queue system is configured to have several queues. These queues help prioritize tasks based on factors like resource requirements, job urgency, and scheduling policies. When users submit a job, they do so using a submission script, which defines the commands to be executed, the required resources (such as CPU cores, memory, GPUs, and runtime limits), and any dependencies. The queue system then processes this script, assigns it a job ID for tracking, and places it in the appropriate queue based on priority, resource availability, and scheduling policies. Once resources become available, the job is dispatched to a suitable node for execution, ensuring efficient workload distribution across the cluster.
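The submit -> queue -> dispatch cycle described above can be sketched as a toy scheduler in Python. This is a deliberately simplified model, not the algorithm of any real scheduler such as Slurm or PBS: jobs receive an ID on submission, wait in a priority queue, and are dispatched only when enough CPU cores are free:

```python
import heapq
import itertools

class QueueSystem:
    """Toy model of a cluster queue system."""

    def __init__(self, total_cores: int):
        self.free_cores = total_cores
        self.waiting = []            # min-heap of (priority, job_id, cores)
        self.ids = itertools.count(1)

    def submit(self, cores: int, priority: int = 10) -> int:
        """Queue a job and return its tracking ID (lower priority value runs first)."""
        job_id = next(self.ids)
        heapq.heappush(self.waiting, (priority, job_id, cores))
        return job_id

    def dispatch(self) -> list:
        """Start every waiting job that currently fits on the free cores."""
        started, skipped = [], []
        while self.waiting:
            priority, job_id, cores = heapq.heappop(self.waiting)
            if cores <= self.free_cores:
                self.free_cores -= cores
                started.append(job_id)
            else:
                skipped.append((priority, job_id, cores))  # keep waiting
        for item in skipped:
            heapq.heappush(self.waiting, item)
        return started

cluster = QueueSystem(total_cores=16)
small = cluster.submit(cores=8, priority=1)
big = cluster.submit(cores=12, priority=5)
print(cluster.dispatch())  # only the small job fits; the big one keeps waiting
```

A real queue system adds fair-share accounting, backfilling, runtime limits and much more, but the basic cycle is the same.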


Figure 5: Queue system for a job script

How it all comes together

The queue system is the final piece that brings everything together, ensuring that jobs are efficiently scheduled and executed across the cluster. With nodes working in parallel, data stored and managed effectively, and tasks queued for optimal performance, HPC systems can tackle massive computational challenges with precision.

At first glance, HPC might seem overwhelming, but breaking it down reveals a well-coordinated system. Every component, from parallel processing to storage and scheduling, plays a crucial role in keeping the system running smoothly. As technology advances, HPC will continue to drive innovation in AI, scientific research, and beyond. Whether you're analyzing complex data, training deep learning models, or running large-scale simulations, understanding these building blocks gives you the power to harness the true potential of high-performance computing.

So, next time you hear about clusters, nodes, and queues, you’ll know they’re all part of a well-oiled machine. As HPC continues to evolve, it will open new doors to solving some of the world’s most complex problems, pushing the boundaries of what’s possible, one job at a time.


Figure 6: Applications of HPC Systems

This article is my humble attempt to break down the complexities of HPC, and I hope it helps you see just how interconnected and powerful these systems can be :)
