T-SQL String Aggregate in SQL Server

T-SQL dialect of SQL doesn’t have aggregate functions for strings, but there is an easy workaround using magic of XML.

Consider Employees table of the Northwind database. When I run following query:

SELECT Country, FirstName FROM Employees ORDER BY Country, FirstName

I get following result:

Country FirstName
UK      Anne
UK      Michael
UK      Robert
UK      Steven
USA     Andrew
USA     Janet
USA     Laura
USA     Margaret
USA     Nancy

Now I want to combine first names into comma separated strings grouped by country.Take a look at following SQL code:

SELECT DISTINCT Country,

(STUFF((SELECT ',' + FirstName FROM Employees E2 WHERE  E1.Country = E2.Country	ORDER BY FirstName
FOR XML PATH(''), TYPE, ROOT).value('root[1]','nvarchar(max)'),1,1,'')) as FirstNames

FROM Employees E1 ORDER BY Country

The internal subquery is the important part. The FOR XML PATH clause combines data into a single output, the ,TYPE, ROOT statements make sure that the output is of XML type with a root node. And the .value method converts output into varchar to display characters like “<” or “>” correctly. The result:

Country  FirstNames
UK       Anne,Michael,Robert,Steven
USA      Andrew,Janet,Laura,Margaret,Nancy

UPDATE 4/25/2012: If your task is opposite – you need to ungroup “deaggregate” string values into separate rows, there is an app for that a solution that does just that.

33 replies on “T-SQL String Aggregate in SQL Server”

  1. Great, thanks. I ‘m preparing to use .Net CLR when suddenly find your post. It ‘s much simpler.

  2. @Joseph
    Glad it helped. CLR offer a lot of flexibility, but it’s still a pain to implement it in SQL Server, so I try whenever possible to achieve a goal with pure TSQL

  3. Its, realy great full solution, i have changed accordingly.

    SELECT DISTINCT ReportName,
    (STUFF((SELECT ‘,’ + Cast(lngCategory_ID as varchar(20)) FROM tblMCategory WHERE lngCategory_TypeID in (Select Item From SplitCommaString(TypeIDs))

    FOR XML PATH(”), TYPE, ROOT).value(‘root[1]’,’nvarchar(max)’),1,1,”))as CategoryIDs
    FROM MenuNodes Where TypeIDs is not null

  4. Thanks, I was wondering how the hell I would do that but you just simplified my task.

  5. @behrooz, it’s a method that extracts value from XML datatype. 1st parameter is XPath to desired element/attribute, second is TSQL datatype to bring the data as.

  6. This Rocks! The old query was using a cursor, that took around 20 minutes. This one runs for 10 seconds!

    Thanks alot!

  7. could you please let me know how do I execute a Transact SQL? I have SQL server 2005 installed on my machine. If I cannot execute a T-SQL using SQL Server 2005, please let me know how do I accomplish using SQL. Thanks and I appreciate your help

  8. @Pradi
    Transact SQL is just a Microsoft dialect of SQL. In SQL Server 2005 you can execute T-SQL statements in Management Studio in new query window.

  9. When you’re working over a larger set of data, you’ll probably want to add in a GROUP BY clause and get rid of the DISTINCT, otherwise the XML query will be executed for every row, only for the duplicates to be thrown away:

    SELECT Country,
    (STUFF((SELECT ‘,’ + FirstName FROM Employees E2 WHERE E1.Country = E2.Country ORDER BY FirstName
    FOR XML PATH(”), TYPE, ROOT).value(‘root[1]’,’nvarchar(max)’),1,1,”)) as FirstNames
    FROM Employees E1
    GROUP BY E1.Country
    ORDER BY Country

  10. Great script…saved me a whole lot of time.

    The Group By clause also made a big difference on the performance of the query..thanks Mike!

  11. @Mike Goatly, strangely enough in my case (SQL Server 2008 SP2, query returns ~ 10,000 unique records) DISTINCT worked faster than GROUP BY. The only addition – I insert results of the query into a temporary table. Any idea why difference in speed? Thanks!

  12. I modified and execute your query at SQL Server 2005 but always notice some error like “Incorrect syntax near ‘XML”.

    My query:
    select taved_trans_number,
    (STUFF((select taved_approved_id+’,’ from t_approval_approved APV2
    where APV2.taved_trans_number = APV1.taved_trans_number
    FOR XML PATH(”), TYPE, ROOT).value(‘root[1]’,’nvarchar(max)’),1,1,”)) as Tgl
    from t_approval_approved APV1
    group by taved_trans_number

    Could you help me? Thank you

  13. Are you sure it’s not SQL Server 2000? Support for this syntax began since 2005 version.

  14. Oh..
    My database is on SQL Server 2000
    But I executed this query via SQL Server 2005 (SQL Server Management Studio).
    Is it possible ?

  15. Management Studio is just a front end, but the query is executed by back-end database server. So your Management Studio can even be 2008, but if you connect to SQL Server 2000 – unfortunately this XML query will not work.
    One solution – if you have access to actual 2005 server – you can attach your 2000 database to 2005 server – in this case the query will work, but the database will not be compatible with 2000 server anymore.

  16. Hi Yuriy – I’ve just tried it on another set of data with about 400000 rows (naturally it’s different to yours, but hey, it’s some data I had to hand) and I found that the GROUP BY version tool 3 seconds and the DISTINCT version took over 10 minutes (I had to kill it at that point because it was taking so long!).

    What are the indexes you have on the table? If the country column isn’t indexed, that might account for the differences.

  17. Does anybody know how to do this with Linq? C#? This is exactly what I need, in my c# app, but I don’t want to use ADO as I am using EF and uptil now have done everything on client, however I am having difficulty implementing this on client, should be other way around.

  18. This is gr8 query I could deliver the reuslt to client after in few minutes after coming across this query.

  19. Thanks a million! This page must be kept up to assist our children who may become developers in future

  20. Excellent idea. Of course the overhead of creating a DOM for every row is crazy, but since t-sql doesn’t have string aggregation, this is a life saver.

    Great thinking out of the box!

  21. Thanks so much! I was trying to think of a slick way to do this that didn’t involve loops and this works perfectly. 🙂

  22. >>@ Mike Goatly – Me too, I already had a count field going on in the query and you can’t group by one of these STUFF constructs… trying the Distinct method now.

Leave a Reply

Your email address will not be published. Required fields are marked *