MSSQL: XML and STUFF for strings

Say what now?

Have you ever been deep in database land and been wondering how to combine a few rows of like items into one row? If you answered “YES!” you’re in the right place (lucky you!)

If you’re confused, fear not … this might still be helpful for you, read on my friend.

The problem

Let’s say you have a table in your database full of addresses and account numbers. You’d like to de-duplicate these addresses, but need to know which account numbers are associated with each address.

That’s where XML and STUFF comes to the rescue … especially if you need the list of account numbers formatted for ingestion into a different program (comma separated, semicolon separated, etc.)

Our test data looks like this:

Which you can create in your own database using a temp table:

DECLARE @FakeAddressData TABLE(AccountNumber BIGINT, AddressLine VARCHAR(MAX))
INSERT INTO @FakeAddressData VALUES
(1235,'123 Floral Lane, Timberwood, NY 02657'),
(76,'123 Floral Lane, Timberwood, NY 02657'),
(96584,'123 Floral Lane, Timberwood, NY 02657'),
(6985,'5 Nothing St, Microsoft Way, WA 95642'),
(36587,'5 Nothing St, Microsoft Way, WA 95642'),
(23333,'5 Nothing St, Microsoft Way, WA 95642'),
(4444,'5 Nothing St, Microsoft Way, WA 95642'),
(597886,'900 Apply Rd, Cupertino, CA 45845'),
(555,'900 Apply Rd, Cupertino, CA 45845'),
(36,'7 Linux Fwy, Nowhere, MT 55555'),
(81,'7 Linux Fwy, Nowhere, MT 55555'),
(6123,'7 Linux Fwy, Nowhere, MT 55555'),
(6,'7 Linux Fwy, Nowhere, MT 55555'),
(98467,'7 Linux Fwy, Nowhere, MT 55555'),
(36587111,'7 Linux Fwy, Nowhere, MT 55555')

SELECT * FROM @FakeAddressData ORDER BY AccountNumber

Lovely … now let’s solve our problem.

The Solution

For those of you who don’t want to read any further (or are just impatient) … here is the solution (including the temp table from above), explained below:

DECLARE @FakeAddressData TABLE(AccountNumber BIGINT, AddressLine VARCHAR(MAX))
INSERT INTO @FakeAddressData VALUES
(1235,'123 Floral Lane, Timberwood, NY 02657'),
(76,'123 Floral Lane, Timberwood, NY 02657'),
(96584,'123 Floral Lane, Timberwood, NY 02657'),
(6985,'5 Nothing St, Microsoft Way, WA 95642'),
(36587,'5 Nothing St, Microsoft Way, WA 95642'),
(23333,'5 Nothing St, Microsoft Way, WA 95642'),
(4444,'5 Nothing St, Microsoft Way, WA 95642'),
(597886,'900 Apply Rd, Cupertino, CA 45845'),
(555,'900 Apply Rd, Cupertino, CA 45845'),
(36,'7 Linux Fwy, Nowhere, MT 55555'),
(81,'7 Linux Fwy, Nowhere, MT 55555'),
(6123,'7 Linux Fwy, Nowhere, MT 55555'),
(6,'7 Linux Fwy, Nowhere, MT 55555'),
(98467,'7 Linux Fwy, Nowhere, MT 55555'),
(36587111,'7 Linux Fwy, Nowhere, MT 55555')

SELECT
    FAD.AddressLine,
    STUFF((SELECT ',' + CAST(AccountNumber AS VARCHAR) FROM @FakeAddressData FAD_Sub WHERE FAD_Sub.AddressLine = FAD.AddressLine FOR XML PATH ('')), 1, 1, '') AS AccountNumbers,
    COUNT(FAD.AddressLine) AS NumberOfAccounts
FROM @FakeAddressData FAD
GROUP BY FAD.AddressLine
ORDER BY NumberOfAccounts DESC

The Explanation

Our base select is simple. It returns AddressLine and the count of how many AccountNumbers have that same AddressLine:

SELECT
    FAD.AddressLine,
    COUNT(FAD.AddressLine) AS NumberOfAccounts
FROM @FakeAddressData FAD
GROUP BY FAD.AddressLine
ORDER BY NumberOfAccounts DESC

The sneaky bit is this line:

STUFF((SELECT ',' + CAST(AccountNumber AS VARCHAR) FROM @FakeAddressData FAD_Sub WHERE FAD_Sub.AddressLine = FAD.AddressLine FOR XML PATH ('')), 1, 1, '') AS AccountNumbers,

Let’s take it apart, since it contains two different functions, XML and STUFF:

The XML portion is inside the STUFF function and does 99% of the work:

SELECT ',' + CAST(AccountNumber AS VARCHAR) FROM @FakeAddressData FAD_Sub WHERE FAD_Sub.AddressLine = FAD.AddressLine FOR XML PATH ('')

But you’ll get leading commas in your output, which doesn’t look so nice:

Note: If you want to change the comma to something else, simply change it right after the SELECT … for a semicolon you’d do:

SELECT ';' + CAST(AccountNumber AS VARCHAR) FROM @FakeAddressData FAD_Sub WHERE FAD_Sub.AddressLine = FAD.AddressLine FOR XML PATH ('')

Another note: This select can’t run independently of the main statement, since we’re referencing “FAD” and ensuring that each address line is matched so that we don’t get duplicate AccountNumber fields (ie. FAD_Sub.AddressLine = FAD.AddressLine).

Removing the leading comma is where STUFF comes in:

STUFF((SELECT ',' + CAST(AccountNumber AS VARCHAR) FROM @FakeAddressData FAD_Sub WHERE FAD_Sub.AddressLine = FAD.AddressLine FOR XML PATH ('')), 1, 1, '') AS AccountNumbers,

It takes one string and STUFFs it into another one (these devs are very clever in their wording, aren’t they?). But all we want the function to do is remove the leading comma.

The key is :

1, 1, '')

The first “1” is the location we want to start at in the string (our lovely FOR XML PATH).

The second “1” is the number of characters to replace … since a comma is one character, that’s all we need. (If you’re one of those people who likes to use 7 commas to delimit your strings … you’d change this number to … 7! But why would anyone, ever, do this?)

The last argument is what you’d like the replacement to be. In this case it’s … nothing! You could change it to something like ‘The account numbers that all have the same address: ‘ and you’d get:

Now you know how to use STUFF and XML to do literally anything you want (or maybe not …)

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.