Concatenating strings in SQL Server, or undefined behaviour by design

We just ran into a funny problem here, using a “tried and true” technique in SQL Server to concatenate strings. I use the quotes advisedly. This technique is often suggested on blogs and sites such as Stack Overflow, but we found out (by painful experience) that it is not to be relied on.

Update, 9 Mar 2016: Bruce Gordon from Webucator has turned this into a great little 5 minute video. Thanks Bruce! I don’t know anything much about Webucator, but they are doing some good stuff with creating well-attributed videos about blog posts such as this one and apparently they do SQL Server training.

The problem

So, given the following setup:

CREATE TABLE BadConcat (
  BadConcatID INT NOT NULL,
  Description NVARCHAR(100) NOT NULL,
  SortIndex INT NOT NULL
  CONSTRAINT PK_BadConcat PRIMARY KEY CLUSTERED (BadConcatID)
)
GO

INSERT BadConcat 
  SELECT 1, 'First Item', 1 union all
  SELECT 2, 'Second Item', 2 union all
  SELECT 3, 'Third Item', 3
GO

We need to concatenate those Descriptions. I have avoided fine tuning such as dropping the final comma or handling NULLs for the purpose of this example. This example shows one of the most commonly given answers to the problem:

DECLARE @Summary NVARCHAR(100) = ''

SELECT @Summary = @Summary + ec.Description + ', '
FROM BadConcat ec
ORDER BY ec.SortIndex 

PRINT @Summary

And we get the following:

First Item, Second Item, Third Item, 

And that works fine. However, if we want to include a WHERE clause, even if that clause still selects everything, then we suddenly get something weird:

SET @Summary = ''

SELECT @Summary = @Summary + ec.Description + ', '
FROM BadConcat ec
WHERE ec.BadConcatID in (1,2,3)
ORDER BY ec.SortIndex 

PRINT @Summary

Now we get the following:

Third Item, 

What? What has SQL Server done? What’s happened to the first two items?

You’ll probably do what we did, which is to go through and make sure that you are selecting everything properly, which we are, and eventually come to the conclusion that “there must be a bug in SQL Server”.

The answer

It turns out that this iterative concatenation is unsupported functionality. Microsoft Knowledge Base article 287515 states:

You may encounter unexpected results when you apply any operators or expressions to the ORDER BY clause of aggregate concatenation queries.

Now, at first glance that does not directly apply. But we can extrapolate from that, as Microsoft developer support have done, in response to a bug report on SQL Server, to learn that:

The variable assignment with SELECT statement is a proprietary syntax (T-SQL only) where the behavior is undefined or plan dependent if multiple rows are produced

And again, in response to another bug report:

Using assignment operations (concatenation in this example) in queries with ORDER BY clause has undefined behavior. This can change from release to release or even within a particular server version due to changes in the query plan. You cannot rely on this behavior even if there are workarounds.

Some alternative solutions are given, also, in that second report:

The ONLY guaranteed mechanism are the following:

1. Use cursor to loop through the rows in specific order and concatenate the values
2. Use for xml query with ORDER BY to generate the concatenated values
3. Use CLR aggregate (this will not work with ORDER BY clause)

And the article “Concatenating Row Values in Transact-SQL” by Anith Sen goes through some of those solutions in detail. Sadly, none of them are as clean or as easy to understand as that original example.

Another example is given on Stack Overflow, which details how to safely use XML PATH to concatenate, without breaking on the XML special characters &, < and >. Applying that example into my problem code given above, we should use the following:

SELECT @Summary = (
  SELECT ec.Description + ', ' 
  FROM BadConcat ec 
  WHERE ec.BadConcatID in (1,2,3)
  ORDER BY ec.SortIndex 
  FOR XML PATH(''), TYPE
).value('.','varchar(max)')

PRINT @Summary

Voilà.

First Item, Second Item, Third Item, 

7 thoughts on “Concatenating strings in SQL Server, or undefined behaviour by design

  1. Yes, this is a much better way to do string concatenation in SQL. And then, to remove that pesky comma at the end of the string, we can use the STUFF command 🙂

    DECLARE @summary VARCHAR(MAX)
    DECLARE @delim VARCHAR(10) = ‘, ‘

    SELECT @summary = STUFF(
    (SELECT @delim + ec.Description
    FROM BadConcat ec
    WHERE ec.BadConcatID IN (1, 2, 3)
    ORDER BY ec.SortIndex
    FOR XML PATH (”))
    , 1, DATALENGTH(@delim), ”)

    PRINT @summary

  2. Just spent a few hours trying to figure out why my code was broken. Worked in one query but not the other. Thanks for writing this up. I wish M$ would support this feature. Everyone uses it. I’m surprised that it isn’t actually supported!

  3. Thanks for this Marc. Was brought to this article from your StackOverflow comment :
    https://stackoverflow.com/questions/194852/how-to-concatenate-text-from-multiple-rows-into-a-single-text-string-in-sql-serv

    Another commentator on this same thread mentioned that the += concatenation is considerably slower than XML Path … when the number of items to concatenate gets large (according to the commentator, James L., for 20,000 items: using the concatenate method – 2 minutes. Using the XML path method, it took less than 1 second

    However, if you have,
    a) a much smaller number of items to join than 20,000 and
    b) No where or order by clause

    Are there still cases where we can expect inexplicable behavior?

    1. Thanks for reading, Michele! Yes, by design the behavior is not specified, so updated query plans may cause changes to the concatenation behavior. This can also happen with updated versions of SQL Server.

      1. Thanks Marc.

        As you mentioned, apparently many people use this method, and appear to be able to use it successfully, and the fact that it is unreliable is just not understood.

        I think one problem is that we can follow an example of the concatenation method, such as your initial working example, and get the results you get.

        I suppose some people don’t spend a ton of time thinking of execution plans and/or are not well-versed in them, – and I admit I’m one of them. Aside from SQL Server upgrades, how can the “working” example of concatenation (with no where clause) have its plan changed, so that it will fail? What are common ways this kind of change might happen?

        – > Not just questions for Marc… I’d like to know if anyone has any ideas, or if anyone gets different results than Marc on the concatenation examples.

  4. The best information I had was in those MS articles linked in the blog post — and those have now disappeared into the Recycle Bin of connect.microsoft.com so they are no longer accessible. Very sad. They do not even appear to be on the Wayback Machine.

    Microsoft also relocated all their blogs not that long ago and broke all those links, and some helpful content on this that was there is no longer indexed as far as I can tell.

    No good answers from me…

Leave a Reply

Your email address will not be published. Required fields are marked *