Thursday, March 29, 2012

Duplicate strings

I have a column called professional_degrees in my table
and it has the following data

professional_degrees

bs,ms,mba
bs,mba,
bs,ms,bs,ms
mba,mba
bs,mba

In the above data u can see some of the degrees are repeated.So how can i find them and delete them from the table.

Thanks.You have to normalize you data if you can - otherwise use charindex and replace functions for updating. I do not envy you ;) in last case.|||I was talking abt the duplicates in the same string,not the duplicate recs in database.|||Do you know if they all are comma delimited?

In any event, you'll have to do something like

Parse out every row, 1 by 1, into n rows into a temp table
Then SELECT Distinct
Then INSERT into a new table
The UPDATE the original

probably need a function for the parsing...

just a guess...|||Ok, borrowing from a previous post (http://www.dbforums.com/t997070.html), you can get the parser. You could then create a "glue distinct" function something like:CREATE FUNCTION dbo.fGlue(@.original VARCHAR(8000), @.splitter VARCHAR(8000))
RETURNS VARCHAR(8000) AS BEGIN

DECLARE @.c VARCHAR(8000)

SELECT @.c = Coalesce(@.c + @.splitter + item, item)
FROM dbo.fSplit(@.original, @.splitter)
GROUP BY item

RETURN @.c
END
GO

SELECT dbo.fGlue('bs,ms,phd,ms,ms,ms,MCSE', ',')-PatP|||I was talking abt the duplicates in the same string,not the duplicate recs in database.
A database is (technically) normalized if it has no repeating groups - all data in fields are atomic.|||SELECT @.c = Coalesce(@.c + @.splitter + item, item)
FROM dbo.fSplit(@.original, @.splitter)
GROUP BY item

I couldn't understand what this query does,partularly with the following line

FROM dbo.fSplit(@.original, @.splitter)

WHy did u put @.original,@.splitter in the braces after the table name.What does ti do.
Thanks.|||dbo.fSplit is the table-valued function that was from the previous posting the I referenced. You need them both (dbo.fSplit and dbo.fGlue) to make this work.

-PatP|||I answered something similar over here

http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=35121|||SELECT dbo.fGlue('bs,ms,phd,ms,ms,ms,MCSE', ',')

WHEn i run the above query it returned the value 'bs,MCSE,ms,phd'
But I want that to be as 'bs,ms,phd,mcse'

Thanks.|||Can you give me every possible notation you might use, in the order that you want to see them presented?

-PatP|||Better yet, here's a solution that will alllow you to specify which degrees you want presented, and the order that you want them presented! Just put them into the tGlue table, and set the seq column values based on the order that you want the degrees to present. Any degrees not listed in tGlue will be listed in alphabetical order after the degrees that are listed in tGlue.DROP FUNCTION dbo.fSplit
GO
-- ptp 20040507 Split a string on a separator, like VB

CREATE FUNCTION dbo.fSplit(
@.pcString VARCHAR(8000)
, @.pcSep VARCHAR(8000) = ','
) RETURNS @.r TABLE (
item VARCHAR(8000)
) AS
BEGIN

DECLARE @.i INT

SET @.i = CharIndex(@.pcSep, @.pcString)

WHILE 0 <> @.i
BEGIN
INSERT INTO @.r (item) SELECT Left(@.pcString, @.i - 1)
SET @.pcString = SubString(@.pcString, @.i + Len(@.pcSep), 8000)
SET @.i = CharIndex(@.pcSep, @.pcString)
END

INSERT INTO @.r (item) SELECT @.pcString

RETURN
END
GO
DROP TABLE tGlue
GO

CREATE TABLE tGlue(
seq INT
CONSTRAINT XPKtGlue PRIMARY KEY (seq)
, abbrev VARCHAR(900)
CONSTRAINT XAK01tGlue UNIQUE (abbrev)
)

INSERT INTO tGlue (seq, abbrev) -- you need to decide what
SELECT 1000, 'aa' -- abbreviations you want
UNION ALL SELECT 1100, 'bs' -- and what sequence you
UNION ALL SELECT 1200, 'ba' -- want them presented
UNION ALL SELECT 1300, 'ms'
UNION ALL SELECT 1400, 'ma'
UNION ALL SELECT 1500, 'mba'
UNION ALL SELECT 1600, 'phd'
GO
DROP FUNCTION dbo.fGlue
GO
CREATE FUNCTION dbo.fGlue(@.original VARCHAR(8000), @.splitter VARCHAR(8000))
RETURNS VARCHAR(8000) AS BEGIN

DECLARE @.c VARCHAR(8000)

SELECT @.c = Coalesce(@.c + @.splitter + item, item)
FROM (SELECT TOP 100 PERCENT item, Coalesce(seq, 2147483647) AS s2
FROM dbo.fSplit(@.original, @.splitter)
LEFT OUTER JOIN tGlue
ON (abbrev = item)
GROUP BY item, seq
ORDER BY 2, 1) AS a

RETURN @.c
END
GO

SELECT dbo.fGlue('bs,ms,phd,ms,ms,ms,MCSE', ',')-PatP

No comments:

Post a Comment