UPDATE STATISTICS and hints in SQL Server

I was working today on a slow query in SQL Server — it was a simple query on a well-indexed table, and I
could not initially see why it would be so slow. So I did some tests, and was surprised by the results.

I have reproduced the situation I was working through with a test table, with the following structure:

CREATE TABLE HintTest (
 HintTestID INT NOT NULL IDENTITY(1,1),
 Col1 INT NULL,
 Col2 INT NULL,
 Col3 INT NULL,
 CONSTRAINT PK_HintTest PRIMARY KEY (HintTestID)
);

CREATE NONCLUSTERED INDEX HintTest_Col1 ON HintTest (Col1)
-- column 2 was not indexed
CREATE NONCLUSTERED INDEX HintTest_Col3 ON HintTest (Col3)

Then I copied a set of records from the original data I was working with. The following code will
not reproduce this (and you’ll see the reason later) but does very roughly mimic the distribution of the data:

I used a quick and dirty test harness:

DECLARE @v INT, @StartTime DATETIME
SET @StartTime = GETDATE()
SET @v = 0

DECLARE @prmRecordFound BIT, @prmCol1 INT, @prmCol2 INT, @prmCol3 INT

SET @prmCol1 = 50
SET @prmCol2 = 3
SET @prmCol3 = 1750

WHILE @v < 10000
BEGIN
 SET @v = @v + 1

 -- Test Case Here
END

PRINT DATEDIFF(ms, @StartTime, GETDATE())

And here are the tests I wrote:

  -------------------------------------------
  -- TEST CASE 1: SELECT primary key and separate IF
  -------------------------------------------

  SET @prmRecordFound = 0

  DECLARE @HintTestID INT

  SELECT TOP 1 @HintTestID = HintTestID
  FROM
    HintTest
  WHERE
    Col1 = @prmCol1 AND
    Col2 = @prmCol2 AND
    Col3 = @prmCol3

  IF @HintTestID IS NOT NULL
    SET @prmRecordFound = 1

  -------------------------------------------
  -- TEST CASE 2: SELECT COUNT and separate IF
  -------------------------------------------

  SET @prmRecordFound = 0

  DECLARE @Count INT

  SELECT @Count = COUNT(*)
  FROM
    HintTest
  WHERE
    Col1 = @prmCol1 AND
    Col2 = @prmCol2 AND
    Col3 = @prmCol3

  IF @Count > 0
    SET @prmRecordFound = 1

  -------------------------------------------
  -- TEST CASE 3: SELECT COUNT nested in IF
  -------------------------------------------

  SET @prmRecordFound = 0

  IF (SELECT COUNT(*)
  FROM
    HintTest
  WHERE
    Col1 = @prmCol1 AND
    Col2 = @prmCol2 AND
    Col3 = @prmCol3) > 0
    SET @prmRecordFound = 1

  -------------------------------------------
  -- TEST CASE 4: SELECT COUNT with hint nest in IF
  -------------------------------------------

  SET @prmRecordFound = 0

  IF (SELECT COUNT(*)
  FROM
    HintTest WITH(INDEX(HintTest_Col1, HintTest_Col23))
  WHERE
    Col1 = @prmCol1 AND
    Col2 = @prmCol2 AND
    Col3 = @prmCol3) > 0
    SET @prmRecordFound = 1

  -------------------------------------------
  -- TEST CASE 5: EXISTS SELECT * in IF
  -------------------------------------------

  SET @prmRecordFound = 0

  DECLARE @Count INT

  IF EXISTS(SELECT *
  FROM
    HintTest
  WHERE
    Col1 = @prmCol1 AND
    Col2 = @prmCol2 AND
    Col3 = @prmCol3)
    SET @prmRecordFound = 1

  -------------------------------------------
  -- TEST CASE 6: EXISTS SELECT * with hint in IF
  -------------------------------------------

  SET @prmRecordFound = 0

  DECLARE @Count INT

  IF EXISTS(SELECT *
  FROM
    HintTest WITH(INDEX(HintTest_Col1, HintTest_Col23))
  WHERE
    Col1 = @prmCol1 AND
    Col2 = @prmCol2 AND
    Col3 = @prmCol3)
    SET @prmRecordFound = 1

The first run results reproduced the situation quite well, and returned the following surprising statistics (10,000 iterations):

1 (SELECT primary key and separate IF)  846 ms
2 (SELECT COUNT and separate IF)        203 ms
3 (SELECT COUNT nested in IF)           3523 ms
4 (SELECT COUNT with hint nested in IF) 226 ms
5 (EXISTS SELECT * in IF)               3460 ms
6 (EXISTS SELECT * with hint in IF)     263 ms

I was puzzled why there would be such a difference between cases 2 and 3, given that they were so similar,
so I looked at the execution plan for the query. It turns out that the query optimizer was selecting a very
non-optimal plan (a clustered index scan) when the SELECT statement was nested in an IF statement, whereas for
case 2 it was using two index seeks followed by a merge join. Adding hints overrode the query optimizer, and
the results can be seen in cases 4 and 6.

I ran the following statement to refresh the data the query optimizer uses:

UPDATE STATISTICS HintTest

UPDATE STATISTICS will update information about the distribution of keys for indexes on the table, which is then used by the query optimizer. Then I re-ran the test (100,000 iterations shown below). Dramatic difference. Now the hinted queries were among the slowest:

1 (SELECT primary key and separate IF)  2266 ms
2 (SELECT COUNT and separate IF)        2313 ms
3 (SELECT COUNT nested in IF)           2500 ms
4 (SELECT COUNT with hint nested in IF) 2546 ms
5 (EXISTS SELECT * in IF)               2656 ms
6 (EXISTS SELECT * with hint in IF)     2706 ms

For me, the big lesson learned out of this was this:

Always run UPDATE STATISTICS before trying to optimize a query!

The second thing I learned was that the fastest query is often unexpected. I would have expected the
EXISTS condition (case 5) to be optimal for a simple boolean result but instead the fastest query
was consistently case 1 – the SELECT primary key method.

Leave a Reply

Your email address will not be published. Required fields are marked *