Tuesday, March 27, 2012

duplicate records

I am having a headache working on a huge database which contains 120000 records. the big problem is that the database contains duplicates of the "serial no" of products. Another important field is the "date_purchased". How do I remove the duplicates and retain the serial no with the latest date purchased? I have managed to remove duplicates but the program does not remove on the basis of the date_purchased.Hi,

i found some info's about duplicates on planet-source-code.com as i had same problems:

see attached file (too big to quote here)|||How about doing it in 2 stages ie:-

SELECT DISTINCT SerialNo,Max(DatePurchased) From tbl GROUP BY SerialNo

Then inner Join it to

SELECT * FROM tbl

U could create 1 Statement as a Permanent View or Do it all in a stored procedure & #tmp table

Should work -

U'l prob find it can all be done in one Statement - Some1 Post & I'll C which way is faster

GW|||Or a refinement on GWilley's code

SELECT DISTINCT SerialNo, Max (DISTINCT DatePurchased)
From tbl??
Group By SerialNo

I ran this on something similar on my db (table with over 700,000 records) and it seemed to work correctly.|||you do not need DISTINCT when you use GROUP BY -- groups are distinct by definition

save the serial numbers and max dates in a working table, then delete all the rows that don't have a match in the working table

select serial, max(date_purchased) as maxdate
into keepthese
from yourtable
group by serial

delete from yourtable
where not exists
( select 1
from keepthese
where serialno = yourtable.serial
and maxdate = yourtable.date_purchased )

rudy
http://r937.com/

No comments:

Post a Comment