Tuesday, March 27, 2012

Duplicate Records

Hi Folks,

I am new to SQl so need help in developing Stored procedure. What I am trying to do is, I have a big table of demographic information(more than 500000 records) for our customers. the table has lots of duplicates. It is because of typos. The unique key in the table is contactid. I have to write a code which reads through the records in the table and look for duplicates on different matchng criteria like firtsname, last name or phone no match etc. and assign a new id in a diffrent table with that contactid.
Like I have to open a recordset which reads through the first record and put that contactid and assign a new id(Let's call it personid) in another table(This table will have only two fields) then it will read the next record and see if we already have found this person if we have it will pick up that person id and will enter a record with that personid and its contactid and then it will go to the next record until done with all. So, by the end we will have one more table where we will have all the conatctids with their corresponding personid.

Plaese help me write some kind of store procedure in which I can do this.

Thank for the help.There are much faster and better ways to do this than by stepping through your records one at a time, but which is best depends on some specifics of your task.

One solution would be:

Select Max(PrimaryKey), Column1, Column2, Column3...
From YourOldTable
Into YourNewTable
Group By Colum1, Column2, Column3...

Other solutions involve select statements that join two instances of your table and match them on selected columns.

Is your goal to eliminate duplicates after updating related tables that have obsolete key values, or are you just creating a lookup table?

blindman|||I am not trying to eliminate duplicates. I want to create a lookup table. I want to run the several kinds of matches like first I want it on firstname, last name and zipcode, the next query will be on just the phone nos and next query should be on email. My loookup table will have only contactid and personid(one to many relationship) which means for several conatctids we have one personid. My ultimte gaol is to get the unique counts of contacts in the table.

Azra

No comments:

Post a Comment