I was working today on a slow query in SQL Server — it was a simple query on a well-indexed table, and I
could not initially see why it would be so slow. So I did some tests, and was surprised by the results.
I have reproduced the situation I was working through with a test table, with the following structure:
CREATE TABLE HintTest ( HintTestID INT NOT NULL IDENTITY(1,1), Col1 INT NULL, Col2 INT NULL, Col3 INT NULL, CONSTRAINT PK_HintTest PRIMARY KEY (HintTestID) ); CREATE NONCLUSTERED INDEX HintTest_Col1 ON HintTest (Col1) -- column 2 was not indexed CREATE NONCLUSTERED INDEX HintTest_Col3 ON HintTest (Col3)
Then I copied a set of records from the original data I was working with. The following code will
not reproduce this (and you’ll see the reason later) but does very roughly mimic the distribution of the data:
I used a quick and dirty test harness:
DECLARE @v INT, @StartTime DATETIME SET @StartTime = GETDATE() SET @v = 0 DECLARE @prmRecordFound BIT, @prmCol1 INT, @prmCol2 INT, @prmCol3 INT SET @prmCol1 = 50 SET @prmCol2 = 3 SET @prmCol3 = 1750 WHILE @v < 10000 BEGIN SET @v = @v + 1 -- Test Case Here END PRINT DATEDIFF(ms, @StartTime, GETDATE())
And here are the tests I wrote:
------------------------------------------- -- TEST CASE 1: SELECT primary key and separate IF ------------------------------------------- SET @prmRecordFound = 0 DECLARE @HintTestID INT SELECT TOP 1 @HintTestID = HintTestID FROM HintTest WHERE Col1 = @prmCol1 AND Col2 = @prmCol2 AND Col3 = @prmCol3 IF @HintTestID IS NOT NULL SET @prmRecordFound = 1 ------------------------------------------- -- TEST CASE 2: SELECT COUNT and separate IF ------------------------------------------- SET @prmRecordFound = 0 DECLARE @Count INT SELECT @Count = COUNT(*) FROM HintTest WHERE Col1 = @prmCol1 AND Col2 = @prmCol2 AND Col3 = @prmCol3 IF @Count > 0 SET @prmRecordFound = 1 ------------------------------------------- -- TEST CASE 3: SELECT COUNT nested in IF ------------------------------------------- SET @prmRecordFound = 0 IF (SELECT COUNT(*) FROM HintTest WHERE Col1 = @prmCol1 AND Col2 = @prmCol2 AND Col3 = @prmCol3) > 0 SET @prmRecordFound = 1 ------------------------------------------- -- TEST CASE 4: SELECT COUNT with hint nest in IF ------------------------------------------- SET @prmRecordFound = 0 IF (SELECT COUNT(*) FROM HintTest WITH(INDEX(HintTest_Col1, HintTest_Col23)) WHERE Col1 = @prmCol1 AND Col2 = @prmCol2 AND Col3 = @prmCol3) > 0 SET @prmRecordFound = 1 ------------------------------------------- -- TEST CASE 5: EXISTS SELECT * in IF ------------------------------------------- SET @prmRecordFound = 0 DECLARE @Count INT IF EXISTS(SELECT * FROM HintTest WHERE Col1 = @prmCol1 AND Col2 = @prmCol2 AND Col3 = @prmCol3) SET @prmRecordFound = 1 ------------------------------------------- -- TEST CASE 6: EXISTS SELECT * with hint in IF ------------------------------------------- SET @prmRecordFound = 0 DECLARE @Count INT IF EXISTS(SELECT * FROM HintTest WITH(INDEX(HintTest_Col1, HintTest_Col23)) WHERE Col1 = @prmCol1 AND Col2 = @prmCol2 AND Col3 = @prmCol3) SET @prmRecordFound = 1
The first run results reproduced the situation quite well, and returned the following surprising statistics (10,000 iterations):
1 (SELECT primary key and separate IF) 846 ms 2 (SELECT COUNT and separate IF) 203 ms 3 (SELECT COUNT nested in IF) 3523 ms 4 (SELECT COUNT with hint nested in IF) 226 ms 5 (EXISTS SELECT * in IF) 3460 ms 6 (EXISTS SELECT * with hint in IF) 263 ms
I was puzzled why there would be such a difference between cases 2 and 3, given that they were so similar,
so I looked at the execution plan for the query. It turns out that the query optimizer was selecting a very
non-optimal plan (a clustered index scan) when the SELECT statement was nested in an IF statement, whereas for
case 2 it was using two index seeks followed by a merge join. Adding hints overrode the query optimizer, and
the results can be seen in cases 4 and 6.
I ran the following statement to refresh the data the query optimizer uses:
UPDATE STATISTICS HintTest
UPDATE STATISTICS
will update information about the distribution of keys for indexes on the table, which is then used by the query optimizer. Then I re-ran the test (100,000 iterations shown below). Dramatic difference. Now the hinted queries were among the slowest:
1 (SELECT primary key and separate IF) 2266 ms 2 (SELECT COUNT and separate IF) 2313 ms 3 (SELECT COUNT nested in IF) 2500 ms 4 (SELECT COUNT with hint nested in IF) 2546 ms 5 (EXISTS SELECT * in IF) 2656 ms 6 (EXISTS SELECT * with hint in IF) 2706 ms
For me, the big lesson learned out of this was this:
Always run
UPDATE STATISTICS
before trying to optimize a query!
The second thing I learned was that the fastest query is often unexpected. I would have expected the
EXISTS condition (case 5) to be optimal for a simple boolean result but instead the fastest query
was consistently case 1 – the SELECT primary key method.