I was working today on a slow query in SQL Server — it was a simple query on a well-indexed table, and I
could not initially see why it would be so slow. So I did some tests, and was surprised by the results.
I have reproduced the situation I was working through with a test table, with the following structure:
CREATE TABLE HintTest (
HintTestID INT NOT NULL IDENTITY(1,1),
Col1 INT NULL,
Col2 INT NULL,
Col3 INT NULL,
CONSTRAINT PK_HintTest PRIMARY KEY (HintTestID)
);
CREATE NONCLUSTERED INDEX HintTest_Col1 ON HintTest (Col1)
-- column 2 was not indexed
CREATE NONCLUSTERED INDEX HintTest_Col3 ON HintTest (Col3)
Then I copied a set of records from the original data I was working with. The following code will
not reproduce this (and you’ll see the reason later) but does very roughly mimic the distribution of the data:
I used a quick and dirty test harness:
DECLARE @v INT, @StartTime DATETIME
SET @StartTime = GETDATE()
SET @v = 0
DECLARE @prmRecordFound BIT, @prmCol1 INT, @prmCol2 INT, @prmCol3 INT
SET @prmCol1 = 50
SET @prmCol2 = 3
SET @prmCol3 = 1750
WHILE @v < 10000
BEGIN
SET @v = @v + 1
-- Test Case Here
END
PRINT DATEDIFF(ms, @StartTime, GETDATE())
And here are the tests I wrote:
-------------------------------------------
-- TEST CASE 1: SELECT primary key and separate IF
-------------------------------------------
SET @prmRecordFound = 0
DECLARE @HintTestID INT
SELECT TOP 1 @HintTestID = HintTestID
FROM
HintTest
WHERE
Col1 = @prmCol1 AND
Col2 = @prmCol2 AND
Col3 = @prmCol3
IF @HintTestID IS NOT NULL
SET @prmRecordFound = 1
-------------------------------------------
-- TEST CASE 2: SELECT COUNT and separate IF
-------------------------------------------
SET @prmRecordFound = 0
DECLARE @Count INT
SELECT @Count = COUNT(*)
FROM
HintTest
WHERE
Col1 = @prmCol1 AND
Col2 = @prmCol2 AND
Col3 = @prmCol3
IF @Count > 0
SET @prmRecordFound = 1
-------------------------------------------
-- TEST CASE 3: SELECT COUNT nested in IF
-------------------------------------------
SET @prmRecordFound = 0
IF (SELECT COUNT(*)
FROM
HintTest
WHERE
Col1 = @prmCol1 AND
Col2 = @prmCol2 AND
Col3 = @prmCol3) > 0
SET @prmRecordFound = 1
-------------------------------------------
-- TEST CASE 4: SELECT COUNT with hint nest in IF
-------------------------------------------
SET @prmRecordFound = 0
IF (SELECT COUNT(*)
FROM
HintTest WITH(INDEX(HintTest_Col1, HintTest_Col23))
WHERE
Col1 = @prmCol1 AND
Col2 = @prmCol2 AND
Col3 = @prmCol3) > 0
SET @prmRecordFound = 1
-------------------------------------------
-- TEST CASE 5: EXISTS SELECT * in IF
-------------------------------------------
SET @prmRecordFound = 0
DECLARE @Count INT
IF EXISTS(SELECT *
FROM
HintTest
WHERE
Col1 = @prmCol1 AND
Col2 = @prmCol2 AND
Col3 = @prmCol3)
SET @prmRecordFound = 1
-------------------------------------------
-- TEST CASE 6: EXISTS SELECT * with hint in IF
-------------------------------------------
SET @prmRecordFound = 0
DECLARE @Count INT
IF EXISTS(SELECT *
FROM
HintTest WITH(INDEX(HintTest_Col1, HintTest_Col23))
WHERE
Col1 = @prmCol1 AND
Col2 = @prmCol2 AND
Col3 = @prmCol3)
SET @prmRecordFound = 1
The first run results reproduced the situation quite well, and returned the following surprising statistics (10,000 iterations):
1 (SELECT primary key and separate IF) 846 ms
2 (SELECT COUNT and separate IF) 203 ms
3 (SELECT COUNT nested in IF) 3523 ms
4 (SELECT COUNT with hint nested in IF) 226 ms
5 (EXISTS SELECT * in IF) 3460 ms
6 (EXISTS SELECT * with hint in IF) 263 ms
I was puzzled why there would be such a difference between cases 2 and 3, given that they were so similar,
so I looked at the execution plan for the query. It turns out that the query optimizer was selecting a very
non-optimal plan (a clustered index scan) when the SELECT statement was nested in an IF statement, whereas for
case 2 it was using two index seeks followed by a merge join. Adding hints overrode the query optimizer, and
the results can be seen in cases 4 and 6.
I ran the following statement to refresh the data the query optimizer uses:
UPDATE STATISTICS HintTest
UPDATE STATISTICS
will update information about the distribution of keys for indexes on the table, which is then used by the query optimizer. Then I re-ran the test (100,000 iterations shown below). Dramatic difference. Now the hinted queries were among the slowest:
1 (SELECT primary key and separate IF) 2266 ms
2 (SELECT COUNT and separate IF) 2313 ms
3 (SELECT COUNT nested in IF) 2500 ms
4 (SELECT COUNT with hint nested in IF) 2546 ms
5 (EXISTS SELECT * in IF) 2656 ms
6 (EXISTS SELECT * with hint in IF) 2706 ms
For me, the big lesson learned out of this was this:
Always run UPDATE STATISTICS
before trying to optimize a query!
The second thing I learned was that the fastest query is often unexpected. I would have expected the
EXISTS condition (case 5) to be optimal for a simple boolean result but instead the fastest query
was consistently case 1 – the SELECT primary key method.