SIMD and MIMD

SIMD (Single Instruction, Multiple Data) is a type of parallel processing where a single instruction operates on multiple data points simultaneously. Imagine an assembly line where one station performs the same task on multiple items at once.

How it works:

  1. Vector Processing: SIMD processors have specialized registers that can hold multiple data elements (like integers, floating-point numbers, etc.).
  2. Single Instruction, Multiple Operations: A single instruction is fetched and decoded, but it is then applied to all the data elements within the SIMD registers in parallel.
  3. Data Parallelism: SIMD is most effective when dealing with data-parallel tasks, where the same operation needs to be performed on a large set of data, such as in image processing, multimedia, and scientific simulations.

Example: Consider adding two arrays of numbers:

array1 = [1, 2, 3, 4]
array2 = [5, 6, 7, 8]
result = [0, 0, 0, 0]        

A traditional processor would perform the addition element by element:

result[0] = array1[0] + array2[0]
result[1] = array1[1] + array2[1]
result[2] = array1[2] + array2[2]
result[3] = array1[3] + array2[3]        

A SIMD processor could load multiple elements from array1 and array2 into its vector registers and perform the addition in a single instruction, potentially processing all four additions at once.

What is AVX?

AVX (Advanced Vector Extensions) is an x86 SIMD (Single Instruction, Multiple Data) instruction set introduced by Intel (and also supported by AMD) to accelerate data-parallel computations. It allows a single CPU instruction to process multiple data elements simultaneously, leveraging wide vector registers (256-bit in AVX/AVX2, and 512-bit with AVX-512). AVX is commonly used for tasks like:

  • Matrix/vector operations (e.g., linear algebra),
  • Image/video processing,
  • Machine learning inference.


Example of AVX code (for illustration only; I did not run this anywhere):

The code below adds two float arrays eight elements at a time, which can be up to 8x faster than scalar code (assuming the CPU supports AVX).

#include <immintrin.h> // AVX intrinsics header

void add_arrays(const float* a, const float* b, float* c, int N) {
    int i = 0;
    for (; i + 8 <= N; i += 8) {                  // Process 8 floats at once (256-bit AVX)
        __m256 vecA = _mm256_loadu_ps(&a[i]);     // Load 8 floats from a (unaligned load)
        __m256 vecB = _mm256_loadu_ps(&b[i]);     // Load 8 floats from b
        __m256 vecC = _mm256_add_ps(vecA, vecB);  // Add all 8 pairs in one instruction
        _mm256_storeu_ps(&c[i], vecC);            // Store 8 results to c
    }
    for (; i < N; ++i)                            // Scalar tail when N is not a multiple of 8
        c[i] = a[i] + b[i];
}        
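
Note that compiling this requires enabling AVX code generation (e.g., -mavx with GCC/Clang, /arch:AVX with MSVC), and running it on a CPU without AVX support raises an illegal-instruction fault.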


The C++17 std::execution::par_unseq policy allows the compiler/runtime to use both multi-threading and SIMD (e.g., AVX). However:

  • Auto-Vectorization: The compiler may generate AVX instructions automatically for simple loops (e.g., std::transform on floats).
  • No Guarantee: Using par_unseq does not guarantee AVX usage. It depends on:

  1. Compiler optimizations (e.g., -O3, -mavx flags),
  2. Data alignment and loop structure,
  3. Hardware support.
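
As a concrete illustration, here is a minimal sketch of the same array addition written with std::transform and par_unseq. It assumes a C++17 toolchain with parallel-algorithm support (with GCC's libstdc++, this additionally means linking against TBB); whether the compiler actually emits AVX here still depends on the factors listed above.

#include <algorithm>
#include <execution>
#include <vector>

int main() {
    std::vector<float> a(1024, 1.0f), b(1024, 2.0f), c(1024);
    // par_unseq permits both multi-threading (MIMD) and vectorization (SIMD);
    // the implementation is free to use either, both, or neither.
    std::transform(std::execution::par_unseq,
                   a.begin(), a.end(), b.begin(), c.begin(),
                   [](float x, float y) { return x + y; });
}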


MIMD (Multiple Instruction, Multiple Data)

MIMD is a more flexible form of parallel processing where multiple processors can execute different instructions on different data simultaneously. Think of it as having multiple independent assembly lines, each working on different tasks.

How it works:

  1. Multiple Processors: MIMD systems have multiple independent processing units (cores, CPUs, or even entire computers).
  2. Independent Execution: Each processor can fetch and execute its own sequence of instructions.
  3. Task Parallelism: MIMD is well-suited for task parallelism, where different parts of a larger task are broken down and executed independently on different processors. It's also effective for data parallelism, where the data is divided among processors and each processor works on its portion.


Example:

Consider rendering a complex 3D scene:

  • One processor could handle the geometry calculations.
  • Another processor could handle the texturing.
  • A third processor could handle the lighting and shading.

Each processor executes different instructions on different parts of the scene data concurrently.
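
Here is a minimal sketch of this idea with std::thread (compute_geometry and compute_lighting are hypothetical stand-ins for real rendering stages): each thread runs its own instruction stream on its own data, which is exactly the MIMD model.

#include <cmath>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-ins for the rendering stages described above.
void compute_geometry(std::vector<float>& vertices) {
    for (float& v : vertices) v *= 2.0f;       // transform vertices
}
void compute_lighting(std::vector<float>& pixels) {
    for (float& p : pixels) p = std::sqrt(p);  // shade pixels
}

int main() {
    std::vector<float> vertices(1000, 1.0f), pixels(1000, 4.0f);
    // Two different instruction streams on two different data sets: MIMD.
    std::thread t1(compute_geometry, std::ref(vertices));
    std::thread t2(compute_lighting, std::ref(pixels));
    t1.join();
    t2.join();
}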


Other Parallel Architectures

While SIMD and MIMD are the two primary classifications, other architectures exist:

  • SISD (Single Instruction, Single Data): This is the traditional sequential processing model, where one instruction operates on one data element at a time. Most single-core CPUs are SISD.
  • MISD (Multiple Instruction, Single Data): This architecture is less common in practice. It involves multiple processors executing different instructions on the same data stream. One potential example is a fault-tolerant system where multiple processors perform the same calculation and their results are compared for correctness.
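
To make the MISD idea concrete, here is a toy sketch of that fault-tolerant pattern (the unit_* functions are hypothetical; a real system would use independent hardware units): several units process the same input and a voter accepts the majority result.

// Toy majority-voting sketch: three "units" compute on the same datum.
int unit_a(int x) { return x * x; }
int unit_b(int x) { return x * x; }
int unit_c(int x) { return x * x; }

int vote(int x) {
    int r1 = unit_a(x), r2 = unit_b(x), r3 = unit_c(x);
    // Accept any value that at least two units agree on.
    return (r1 == r2 || r1 == r3) ? r1 : r2;
}

int main() { return vote(7) == 49 ? 0 : 1; }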


How C++17 Utilizes SIMD and MIMD

The C++17 parallel algorithms, particularly when used with the std::execution::par_unseq policy, can leverage both SIMD and MIMD parallelism:

  • MIMD: The par and par_unseq policies allow the implementation to divide the work of the algorithm among multiple threads, which can run on different cores (MIMD).
  • SIMD: The par_unseq policy explicitly allows the implementation to use vectorization (SIMD) within each thread, if the hardware and the operations are suitable. This means that within a single thread, the processor might be executing a single instruction on multiple data elements.
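
For instance, here is a minimal sketch combining both: a parallel reduction whose work may be partitioned across threads (MIMD) while each thread's chunk may be vectorized (SIMD). par_unseq permits this reordering, which is why the operation must be safe to regroup (addition of doubles is, up to rounding).

#include <execution>
#include <numeric>
#include <vector>

int main() {
    std::vector<double> v(1'000'000, 0.5);
    // std::reduce may split the sum across threads (MIMD) and
    // vectorize within each thread (SIMD) under par_unseq.
    double sum = std::reduce(std::execution::par_unseq,
                             v.begin(), v.end(), 0.0);
    return sum > 0 ? 0 : 1;
}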


Finally

  • SIMD focuses on data parallelism by applying the same operation to multiple data points simultaneously.
  • MIMD focuses on task parallelism and data parallelism by allowing multiple processors to execute different or the same instructions on different data.
  • C++17 parallel algorithms can utilize both SIMD and MIMD to achieve higher performance by leveraging the capabilities of modern multi-core processors with SIMD units.



