Monday, March 26, 2012

Duplicate record removal

I can identify the duplicate rows and show number of duplication by using
this
SELECT [USER_NUMBER], [EMAIL], Count([USER_NUMBER]) AS NUM
FROM USERS
GROUP BY [USER_NUMBER], [EMAIL]
HAVING Count([USER_NUMBER]) > 1
I can't figure out how to delete duplicate records and keep one distinct
record with smallest ID
Example:
TABLE: USERS
ID USER_NUMBER EMAIL
1 12 a@.a.com
2 12 a@.a.com
3 13 b@.b.com
4 14 c@.c.com
5 14 c@.c.com
Output:
ID USER_NUMBER EMAIL
1 12 a@.a.com
3 13 b@.b.com
4 14 c@.c.com
Thanks,
HowardDELETE FROM users
WHERE EXISTS
(SELECT *
FROM users AS U
WHERE U.user_number = users.user_number
AND U.email = users.email
AND U.id < users.id) ;
Nulls, if any, will be ignored.
David Portas
SQL Server MVP
--|||Hi Howard,
Try this statement
Delete From USERS
Where [USER_NUMBER] NOT In
(Select [USER_NUMBER] From USERS Users_Out Where [USER_NUMBER]
IN (Select Min(Users_In.[USER_NUMBER]) From USERS Users_In Where
Users_In.[EMAIL] = Users_Out.[EMAIL]
)
)
PLEASE NOTE : I have not tested this statement so please run the following
query and ensure that this is what you want to be deleted.
Select * From USERS
Where [USER_NUMBER] NOT In
(Select [USER_NUMBER] From USERS Users_Out Where [USER_NUMBER]
IN (Select Min(Users_In.[USER_NUMBER]) From USERS Users_In Where
Users_In.[EMAIL] = Users_Out.[EMAIL]
)
)
Do mail if this post help s you.
Vishal Khajuria
9886170165
IBM Bangalore
"Howard" wrote:

> I can identify the duplicate rows and show number of duplication by using
> this
> SELECT [USER_NUMBER], [EMAIL], Count([USER_NUMBER]) AS NUM
> FROM USERS
> GROUP BY [USER_NUMBER], [EMAIL]
> HAVING Count([USER_NUMBER]) > 1
>
> I can't figure out how to delete duplicate records and keep one distinct
> record with smallest ID
> Example:
> TABLE: USERS
> ID USER_NUMBER EMAIL
> 1 12 a@.a.com
> 2 12 a@.a.com
> 3 13 b@.b.com
> 4 14 c@.c.com
> 5 14 c@.c.com
>
> Output:
> ID USER_NUMBER EMAIL
> 1 12 a@.a.com
> 3 13 b@.b.com
> 4 14 c@.c.com
> Thanks,
> Howard
>
>
>|||thanks for your replies
I solved the problem.
"Howard" <howdy0909@.yahoo.com> wrote in message
news:OlFbAlQ6FHA.3588@.TK2MSFTNGP15.phx.gbl...
>I can identify the duplicate rows and show number of duplication by using
>this
> SELECT [USER_NUMBER], [EMAIL], Count([USER_NUMBER]) AS NUM
> FROM USERS
> GROUP BY [USER_NUMBER], [EMAIL]
> HAVING Count([USER_NUMBER]) > 1
>
> I can't figure out how to delete duplicate records and keep one distinct
> record with smallest ID
> Example:
> TABLE: USERS
> ID USER_NUMBER EMAIL
> 1 12 a@.a.com
> 2 12 a@.a.com
> 3 13 b@.b.com
> 4 14 c@.c.com
> 5 14 c@.c.com
>
> Output:
> ID USER_NUMBER EMAIL
> 1 12 a@.a.com
> 3 13 b@.b.com
> 4 14 c@.c.com
> Thanks,
> Howard
>
>|||Hi,
This will solve your query ...
delete from test where test.id not in (select top 1 b.id from test b
where (test.user_number=b.user_number) and
(test.email=b.email))
Regards,
Predrag Stojanovic
Analist/Programer
"Howard" <howdy0909@.yahoo.com> wrote in message
news:OlFbAlQ6FHA.3588@.TK2MSFTNGP15.phx.gbl...
> I can identify the duplicate rows and show number of duplication by using
> this
> SELECT [USER_NUMBER], [EMAIL], Count([USER_NUMBER]) AS NUM
> FROM USERS
> GROUP BY [USER_NUMBER], [EMAIL]
> HAVING Count([USER_NUMBER]) > 1
>
> I can't figure out how to delete duplicate records and keep one distinct
> record with smallest ID
> Example:
> TABLE: USERS
> ID USER_NUMBER EMAIL
> 1 12 a@.a.com
> 2 12 a@.a.com
> 3 13 b@.b.com
> 4 14 c@.c.com
> 5 14 c@.c.com
>
> Output:
> ID USER_NUMBER EMAIL
> 1 12 a@.a.com
> 3 13 b@.b.com
> 4 14 c@.c.com
> Thanks,
> Howard
>
>|||You missed out the ORDER BY. TOP 1 may not retrieve the minimum value
of ID unless you use ORDER BY.
Also, notice that this version will not delete any rows for a
particular user_name if a NULL ID exists for that user_name. Probably
that's not likely - I'd guess that ID is the PRIMARY KEY - but that's
one potential catch with TOP and with NOT IN.
David Portas
SQL Server MVP
--|||Here is the process:
1. SELECT * INTO #Temp1 FROM [Table1]
2. TRUNCATE TABLE [Table1]
3. CREATE UNIQUE INDEX [Index1] ON [Table1] (Unique Column Names) WITH
IGNORE_DUP_KEY
4. INSERT INTO [Table1] (Column Names) SELECT (Column Names) FROM [Table1]
5. DROP INDEX [Table1].[Index1]
Substitute your own names for those within the [].
Unique Column Names are those columns that uniquely identify each row.
Column Names is the full list of columns in your table.
HTH,
Mike
"Howard" wrote:

> I can identify the duplicate rows and show number of duplication by using
> this
> SELECT [USER_NUMBER], [EMAIL], Count([USER_NUMBER]) AS NUM
> FROM USERS
> GROUP BY [USER_NUMBER], [EMAIL]
> HAVING Count([USER_NUMBER]) > 1
>
> I can't figure out how to delete duplicate records and keep one distinct
> record with smallest ID
> Example:
> TABLE: USERS
> ID USER_NUMBER EMAIL
> 1 12 a@.a.com
> 2 12 a@.a.com
> 3 13 b@.b.com
> 4 14 c@.c.com
> 5 14 c@.c.com
>
> Output:
> ID USER_NUMBER EMAIL
> 1 12 a@.a.com
> 3 13 b@.b.com
> 4 14 c@.c.com
> Thanks,
> Howard
>
>
>sql

No comments:

Post a Comment