Why the Fast Inverse Square Root Matters

Why the Fast Inverse Square Root Matters

The Fast Inverse Square Root algorithm is among the most famous algorithms in computer science history. In this post, I want to go over what it is and why its so important in words anyone can understand.

The Background

The late 90s and early 2000s saw the world move into 3D computer graphics at a scale never before seen. This effected industries ranging from architecture, mechanical engineering, physics, and — you guessed it — video games.

3D computer graphics explains the world in terms of vectors. A vector is a point in 3d space that has a given direction and magnitude. For example, if I was trying to explain the vector of a wind current, I might describe its source (say a fan on the table), the direction its blowing, and its wind speed. The location of the source would be its point in xyz coordinates — if it were in the middle of my 6 foot room on a table 2 feet high (please stay with me I don’t know anything about lengths) we might say it is located at (3,3,2).

In general, when we’re simulating actions in a 3D computer space, there are two types of calculations we can make: direction calculations and distance calculations. Distance calculations are pretty simple — if I wanted to find the magnitude, I could do it by taking the square root of x^2 + y^2 + z^2 and working from there. But if I want to do a direction calculation, that means I need to normalize magnitude in order to solve for angle. In other words, I need divide x, y, and z by the magnitude… which just happens to be a square root.

As to why dividing by a square root is so hard for a computer to do… its complicated. Square rooting by itself is rather trivial, and in fact we’ll see how it works when we get to the FISR algorithm proper. Inverse square rooting, on the other hand, requires a lot of background calculations to do properly and thus takes up a lot of compute time and power. Considering that 3D engines were performing these calculations not once, but millions of times a second… it led to major slow down. The progress in the 3D industry had become bricked, and the choice was between waiting for better hardware to come out to perform these calculations faster, or to rebuild the calculations from the ground up.

Id Software chose the latter.

The Algorithm

Id Software is a company that represents what this blog — and most of my career — is all about: the intersection between art and technology.

Id was, first and foremost, a game company. It knew how to make games fun — titles like Doom and Quake still have thousands of players and get regular reboots/releases every year. This is something that is purely built into the design of these games: in other words, the art.

Yet at the same time, Id Software has a prestigious history of technological advances. Id’s cofounder John Carmack is considered by many to be one of the greatest programmers of all time, let alone within the gaming space. And while I (don’t believe) Carmack made FISR, it stands that the culture he created at Id incentivized making breakthrough technological innovations, of which FISR is no exception.

The genius of FISR is in its simplicity. I call this an algorithm, but in reality it is barely one — really its just a formula that occurs at the bit level.

We of course know that the problem with existing inverse square root calculations is the amount of time it takes a typical programming language to do the necessary prerequisite functions. So, theoretically — if we could limit that process to one or two simple functions — it could speed things up dramatically.

Let’s start with just a normal square root. You might be surprised by how fast this is given the issues with an inverse square, but this should go away once you look at how a number is calculated on most computers. Most of them, in bit form, have the following structure:

{sign}{exponent}{mantissa}

A sign is pretty obvious — it is either positive, or negative. As for the exponent and mantissa, it may be important to mention that all processes on a computer are calculated as a power of 2 (since binary is either 1 or 0, two choices, and we are calculating combinations of these numbers, thus the exponent). So an exponent gets us most of the way to the number, and the mantissa (fancy word for saying decimals) provide us the exact float, up to a certain limit. It might help you to see a real number calculated in binary to signify what this segmentation looks like:

{0}{10000011}{00000000000000000000000}

This is “16.0” in binary, segmented into our float structure. 0 says its positive (1 is negative), 10000011 happens to translate into 4 (so 2^4)[1] and the mantissa section is all zeroes to say that we have no decimal places we are using.

Now, back to square rooting. Since we have a dedicated exponent section, you can now see why it’s so easy to square root. All we’d have to do is divide the exponent section by two, which can be done by shifting the entire number one bit to the right (which happens to halve the exponent section of the bit). But the inverse square root requires division — more specifically, it requires floating point division. The problem with floating point division on computers is that performs a lot of back-end iterations to make sure that the answer is exactly right. But if we’re performing vector normalization, we don’t always have to be exact — we can be good enough, and choose to iterate if needed. Unfortunately, the built-in math of computers doesn’t give us that choice.

This is where we need to rebuild the calculation.

Now, theoretically, if we know how a float is calculated in bit-terms, we could jerry-rig a solution that always changes the correct segments of the float to simulate an inverse square root. Finding this magical bit which simulates an ISR was no doubt the source of much pain for many an Id software engineer, and probably took them months to figure out. I won’t go in depth here, since I think you have the important details of this down and don’t want to get too in the weeds for a single article. Instead, we’ll skip the blood sweat and tears and go right to the ending:

0x5f3759df. It’s 0x5f3759df.

If you subtract 0x5f3759df off of a bit-shifted right number, it will give you an inverse square root. And that inverse square root is good enough for graphics programming. If you choose, you can run a Newton-Raphson iterator on it once or twice to get more exact (funnily enough this is the same iterator that happened way too often on floating point division and caused the slow down) but we have succeeded in now finding the FISR algorithm!

The Legacy

You might wonder if FISR is now the de-facto standard for how computers compute inverse square roots. The truth is, it never needed to be — about a year later we got CPUs good enough to do floating point division quickly and accurately. Some might look at this as a tragic result for FISR as it essentially became a waste of time, but this is not so. The graphical fidelity of Quake III Arena made it leagues better than its peers, and that combined with its fast-paced addictive gameplay (gameplay that would’ve not been nearly as fast if FISR wasn’t used) led it to sell gangbusters and become a staple of gaming history. Had Id simply waited a year, they would’ve been on the same level as their competitors and therefore might have released Quake III to an already saturated market. Things moved fast back then!

But FISR also found a home somewhere else: Computer Science courses. Virtually every Computer Science degree teaches FISR at some point, as an example of low-level code being used to improve the speed of processing. And while FISR specifically might not be used, similar hacks are occurring each and every day at large corporations. For example, the fundamentals of FISR are how companies like Deepseek were able to reduce the cost of AI training on GPUs versus competitors like OpenAI or Meta. And this was just last week! Because of this I hope to return to low-level programming someday, but its most famous algorithm is a great place to start.

[1] – Some more astute of you might notice this does not actually translate to 4 in raw binary. It has to do with the IEEE standardization for floats, which is outside the scope of this article — but you can look it up if you wish.

Subscribe to the newsletter!

Hadi Ali

Exploring Future Designs of the American Research University | Always a Changemaker | Relentlessly Transdisciplinary

2mo

It's remarkable how sometimes going back to the basics proves to be the best way into the future!

To view or add a comment, sign in

More articles by Jacob Robinson

  • 8 Pitfalls of Agile and How to Avoid Them

    In this article, we’ll explore the eight most common mistakes teams make when adopting Agile and provide practical…

  • So, How Do Computer Graphics Work?

    One of the open threads left in our article on the fast inverse square root algorithm was how exactly all those…

  • How To Make A (Super) Hard Drive From Scratch

    This week’s article is a bit all over the place, but it ended up being a lot of fun: we’re going to dive into the world…

  • Converting a $100k+ Trading Algorithm from Excel to Python

    When I was young, I made the mistake of getting a degree from a business school instead of an engineering school. It…

    1 Comment
  • A Deep Dive on Cybersecurity

    This is a direct continuation off our last article, A Deep Dive on Networking. More specifically, we’ll be moving into…

  • A Deep Dive on Networking

    This past week, I’ve had to refresh myself on my networking and cybersecurity knowledge. I figure that this makes for a…

  • Porting A 250k+ Item Database from Notion to Obsidian

    It was only two weeks ago where I ended a blog post by saying that I wish I could move to Obsidian, but certain…

  • I used Obsidian for 90 days. Here’s what happened.

    For years, I was begging note-taking apps to start using a wiki-like hypertext system. At that point I wanted it more…

  • Markov Chains and You!

    This is an introductory article to a new series I’ll be trying out. I love asking ChatGPT conversations about stuff I…

    1 Comment
  • Time Travel is Real. Here’s How to Use It.

    What if I told you that you could slow down time to get more done, or quicken time during boring sections of the day?…

Insights from the community

Others also viewed

Explore topics