T-SQL dialect of SQL doesn’t have aggregate functions for strings, but there is an easy workaround using magic of XML.
Consider Employees table of the Northwind database. When I run following query:
SELECT Country, FirstName FROM Employees ORDER BY Country, FirstName
I get following result:
Country FirstName UK Anne UK Michael UK Robert UK Steven USA Andrew USA Janet USA Laura USA Margaret USA Nancy
Now I want to combine first names into comma separated strings grouped by country.Take a look at following SQL code:
SELECT DISTINCT Country, (STUFF((SELECT ',' + FirstName FROM Employees E2 WHERE E1.Country = E2.Country ORDER BY FirstName FOR XML PATH(''), TYPE, ROOT).value('root','nvarchar(max)'),1,1,'')) as FirstNames FROM Employees E1 ORDER BY Country
The internal subquery is the important part. The FOR XML PATH clause combines data into a single output, the ,TYPE, ROOT statements make sure that the output is of XML type with a root node. And the .value method converts output into varchar to display characters like “<” or “>” correctly. The result:
Country FirstNames UK Anne,Michael,Robert,Steven USA Andrew,Janet,Laura,Margaret,Nancy
UPDATE 4/25/2012: If your task is opposite – you need to ungroup “deaggregate” string values into separate rows, there is
an app for that a solution that does just that.
Great, thanks. I ‘m preparing to use .Net CLR when suddenly find your post. It ‘s much simpler.
Glad it helped. CLR offer a lot of flexibility, but it’s still a pain to implement it in SQL Server, so I try whenever possible to achieve a goal with pure TSQL
Thanks! Thats magic. Saved me a lot of trouble.
Its, realy great full solution, i have changed accordingly.
SELECT DISTINCT ReportName,
(STUFF((SELECT ‘,’ + Cast(lngCategory_ID as varchar(20)) FROM tblMCategory WHERE lngCategory_TypeID in (Select Item From SplitCommaString(TypeIDs))
FOR XML PATH(”), TYPE, ROOT).value(‘root’,’nvarchar(max)’),1,1,”))as CategoryIDs
FROM MenuNodes Where TypeIDs is not null
Thanks, I was wondering how the hell I would do that but you just simplified my task.
Very clever. Works like a charm.
can you explain to me “.value” function and its arguments.
@behrooz, it’s a method that extracts value from XML datatype. 1st parameter is XPath to desired element/attribute, second is TSQL datatype to bring the data as.
This Rocks! The old query was using a cursor, that took around 20 minutes. This one runs for 10 seconds!
could you please let me know how do I execute a Transact SQL? I have SQL server 2005 installed on my machine. If I cannot execute a T-SQL using SQL Server 2005, please let me know how do I accomplish using SQL. Thanks and I appreciate your help
Transact SQL is just a Microsoft dialect of SQL. In SQL Server 2005 you can execute T-SQL statements in Management Studio in new query window.
Excellent work!! Thank you!
When you’re working over a larger set of data, you’ll probably want to add in a GROUP BY clause and get rid of the DISTINCT, otherwise the XML query will be executed for every row, only for the duplicates to be thrown away:
(STUFF((SELECT ‘,’ + FirstName FROM Employees E2 WHERE E1.Country = E2.Country ORDER BY FirstName
FOR XML PATH(”), TYPE, ROOT).value(‘root’,’nvarchar(max)’),1,1,”)) as FirstNames
FROM Employees E1
GROUP BY E1.Country
ORDER BY Country
Great script…saved me a whole lot of time.
The Group By clause also made a big difference on the performance of the query..thanks Mike!
@Mike Goatly, strangely enough in my case (SQL Server 2008 SP2, query returns ~ 10,000 unique records) DISTINCT worked faster than GROUP BY. The only addition – I insert results of the query into a temporary table. Any idea why difference in speed? Thanks!
I modified and execute your query at SQL Server 2005 but always notice some error like “Incorrect syntax near ‘XML”.
(STUFF((select taved_approved_id+’,’ from t_approval_approved APV2
where APV2.taved_trans_number = APV1.taved_trans_number
FOR XML PATH(”), TYPE, ROOT).value(‘root’,’nvarchar(max)’),1,1,”)) as Tgl
from t_approval_approved APV1
group by taved_trans_number
Could you help me? Thank you
Are you sure it’s not SQL Server 2000? Support for this syntax began since 2005 version.
My database is on SQL Server 2000
But I executed this query via SQL Server 2005 (SQL Server Management Studio).
Is it possible ?
Management Studio is just a front end, but the query is executed by back-end database server. So your Management Studio can even be 2008, but if you connect to SQL Server 2000 – unfortunately this XML query will not work.
One solution – if you have access to actual 2005 server – you can attach your 2000 database to 2005 server – in this case the query will work, but the database will not be compatible with 2000 server anymore.
I see. Thanks
Hi Yuriy – I’ve just tried it on another set of data with about 400000 rows (naturally it’s different to yours, but hey, it’s some data I had to hand) and I found that the GROUP BY version tool 3 seconds and the DISTINCT version took over 10 minutes (I had to kill it at that point because it was taking so long!).
What are the indexes you have on the table? If the country column isn’t indexed, that might account for the differences.
Does anybody know how to do this with Linq? C#? This is exactly what I need, in my c# app, but I don’t want to use ADO as I am using EF and uptil now have done everything on client, however I am having difficulty implementing this on client, should be other way around.
@RickN: Here is an example in VB.NET, but it should be easy to convert it to C#:
@Mike Goatly: Turned out my production query is a bit more complex, I can’t use GROUP BY since some elements of internal query cannot be in GROUP BY clause so only DISTINCT would work for me
This is gr8 query I could deliver the reuslt to client after in few minutes after coming across this query.
Great Query! I Injoyed really . Thanks a alot Yuriy!
This is one of the best answer so far, I have read online. Just useful information. Very well presented. I had found another good post with wonderful explanation on how to use aggregate function in sql server over the internet.
please check out this link…
Thanks a million! This page must be kept up to assist our children who may become developers in future
Excellent idea. Of course the overhead of creating a DOM for every row is crazy, but since t-sql doesn’t have string aggregation, this is a life saver.
Great thinking out of the box!
How will this work on sybase ?
Thanks so much! I was trying to think of a slick way to do this that didn’t involve loops and this works perfectly. 🙂
>>@ Mike Goatly – Me too, I already had a count field going on in the query and you can’t group by one of these STUFF constructs… trying the Distinct method now.