SQL Server COUNT vs. SUM on a Million rows

SQL Server COUNT vs. SUM on a Million rows

The number one question when showing multiple ways to return the same result set in SQL Server is, "which one performs better?" For this post I'm going to compare COUNT vs. SUM on 1 million rows, using both INT and BIT for the representative "isActive" column. Your boss wants to know how many active accounts there are in the database, and you need to decide if you should use COUNT or SUM to find the answer.

The setup:

CREATE TABLE #SpeedRacer (isActiveBit BIT NOT NULL, isActiveInt INT NOT NULL);
DECLARE @i INT = 0;
WHILE @i <= 1000000
    BEGIN
        INSERT INTO #SpeedRacer (isActiveBit, isActiveInt)
        VALUES (CONVERT(BIT, ROUND(1 * RAND(), 0)), ROUND(1 * RAND(), 0));
        SET @i = @i + 1;
    END;        

Let's try counting BITs first. Note that COUNT <expression> counts the number of nonnull rows, so we need to change 0 to NULL for COUNT to work. Also, the SUM operator does not work on BITs, so we need to convert the BIT to an INT for SUM to work. We'll turn on statistics to see how long it takes for each.

SET STATISTICS TIME ON;

SELECT COUNT(NULLIF(isActiveBit,0)) AS Ct FROM #SpeedRacer;
SELECT SUM(CONVERT(INT, isActiveBit)) AS Ct FROM #SpeedRacer;
--
 SQL Server Execution Times:
   CPU time = 140 ms,  elapsed time = 128 ms.

 SQL Server Execution Times:
   CPU time = 125 ms,  elapsed time = 125 ms.        

I ran the process several times and saw no significant difference between the two. How about with INTs?

SET STATISTICS TIME ON;

SELECT COUNT(NULLIF(isActiveInt,0)) AS Ct FROM #SpeedRacer;
SELECT SUM(isActiveInt) AS Ct FROM #SpeedRacer;
--
 SQL Server Execution Times:
   CPU time = 109 ms,  elapsed time = 118 ms.

 SQL Server Execution Times:
   CPU time = 109 ms,  elapsed time = 108 ms.        

Similar to counting BITs, there isn't much difference between the two. In a previous post I was getting a 5x performance boost by using COUNT vs SUM. In this instance that isn't the case. Is there anything in the execution plans that might give us any clues as to what is going on behind the scenes?


Article content

Looks like the same query plans for all but the last SUM, since no additional computations had to be taken on the "isActiveInt" column to SUM 1s and 0s. That also happens to be our winner at 108 ms to complete. Does changing our bit field to allow NULLs and then changing all the 0s to 1s make any difference when COUNTing?

I'll prep the table and try COUNTing BITs and see what happens.

ALTER TABLE #SpeedRacer ALTER COLUMN isActiveBit BIT NULL;
UPDATE #SpeedRacer SET isActiveBit = NULL WHERE isActiveBit = 0;

SELECT COUNT(isActiveBit) FROM #SpeedRacer;
--
 SQL Server Execution Times:
   CPU time = 110 ms,  elapsed time = 112 ms.        

We get a slight speed boost. Not quite as fast as the SUM operation on INT, but there's also a different number of rows. To compare apples to apples, we'll do one last test.

ALTER TABLE #SpeedRacer ALTER COLUMN isActiveint BIT NULL;
UPDATE #SpeedRacer SET isActiveInt = NULL WHERE isActiveInt = 0;

SELECT COUNT(isActiveInt) FROM #SpeedRacer;
--
 SQL Server Execution Times:
   CPU time = 109 ms,  elapsed time = 112 ms.        

Today our winner is SUM. Last week the winner was COUNT. What I gather from these tests is that while there may not be a significant difference between SUM and COUNT for this particular operation, there is a measurable performance gain by avoiding additional computations within the COUNT and SUM expressions. Is the additional storage required by an INT field worth saving a few milliseconds? How will your application respond to NULL and 1 instead of 0 and 1 on a boolean result set?

To view or add a comment, sign in

More articles by Ricardo Oliva

  • SSMS 21 Preview 5 is out

    I just installed it, and haven't used it, but here are my initial thoughts - I was on Preview 2.1.

  • The real significance of COUNT(*) vs COUNT(1)

    I wrote a blog post a couple of weeks ago on this topic but based on the comments I received there was a bit of…

    4 Comments
  • SQL SERVER: Don't name primary keys in temp tables

    Temp tables are great for storing temporary data without affecting other sessions. But there's one big GOTCHA that you…

  • SQL Server 2022 Feature: IS [NOT] DISTINCT FROM

    I'm not gonna lie, reading "IS [NOT] DISTINCT FROM" does make my brain hurt just a little bit. My first question when…

  • ChatGPT ≠ SQL DBA

    ChatGPT is like having a really smart friend that’s confident they’re always right. Sometimes it's spot on and…

    3 Comments
  • SSMS 20.1 not working?

    client reached out and said their new version of SQL Server Management Studio (SSMS) didn’t work so they went back to…

  • What's taking so long? Querying msdb.dbo.sysjobhistory on SQL Server

    I recently had a client ask me to provide the average runtimes of individual SQL Agent job steps. This can be trickier…

  • Imposter Syndrome?

    Imposter syndrome (IS) is an overwhelming feeling that you’re in way over your head and it is simply a matter of time…

    10 Comments

Insights from the community

Others also viewed

Explore topics