Monday, March 26, 2012

Duplicate Record Problem

I am working on a web application that utilizes a sql server database. One of the tables is a large text file that is imported through a DTS package from a Unix server. For whatever reason, the Unix box dumps quite a few duplicate records in the nightly run and these are in turn pulled into the table. I need to get rid of these duplicates, but can't seem to get a workable solution. the query that is needed to get the records is:
SELECT tblAppointments.PatientID, tblPTDEMO2.MRNumber, tblAppointments.PatientFirstName, tblAppointments.PatientLastName,
tblAppointments.PatientDOB, tblAppointments.PatientSex, tblAppointments.NewPatient, tblAppointments.HomePhone,
tblAppointments.WorkPhone, tblAppointments.Insurance1, tblPTDEMO2.Ins1CertNmbr, tblAppointments.Insurance2,
tblPTDEMO2.Ins2CertNmbr, tblAppointments.Insurance3, tblPTDEMO2.Ins3CertNmbr, tblAppointments.ApptDate, tblAppointments.ApptTime
FROM tblAppointments CROSS JOIN
tblPTDEMO2
WHERE (tblAppointments.PatientID = tblPTDEMO2.MRNumber)
AND tblAppointments.Insurance1 = 'MED'
AND tblAppointments.ApptTypeID <> 'MTG'
AND tblAppointments.ApptTypeID <> 'PNV'
AND DateDiff("dd", ApptDate, GetDate()) = 0
Order By tblAppointments.ApptDate
My first thought was to try to get a Select DISTINCT to work, but couldn't figure out how to do this with the query. My next thought was to try to set up constraints on the table, but, since there are duplicates, the DTS package fails. I assume there is a way to set up the transformations in a way to get this to work, but I'm not enough of an expert with SQL Server to figure this out on my own. I guess the other way to do this is to write some small script or application to do this, but I suspect there must be an easier way for those who know what they are doing. Any help on this topic would be greatly appreciated. Thanks.In SQL Server duplicates are eliminated by using Unique constraint or Index on the index you can add IGNORE_DUP_KEY option and make it all inserts because that option will not affect update statements. And they are very slow but the UNION operator also eliminates duplicates by applying implict Distinct. Run a search for all of the above in SQl Server BOL(books online). Hope this helps.|||For an import approach, I'd suggest inserting the records into atemporary table first. Then select out the unique records and insertthem into your destination table.
For the above query, would something like this work?
SELECT DISTINCT
PatientID,
MRNumber,
PatientFirstName,
PatientLastName,
PatientDOB,
PatientSex,
NewPatient,
...etc, etc, etc...
FROM
(
SELECT tblAppointments.PatientID, tblPTDEMO2.MRNumber, tblAppointments.PatientFirstName, tblAppointments.PatientLastName,
tblAppointments.PatientDOB, tblAppointments.PatientSex, tblAppointments.NewPatient, tblAppointments.HomePhone,
tblAppointments.WorkPhone, tblAppointments.Insurance1, tblPTDEMO2.Ins1CertNmbr, tblAppointments.Insurance2,
tblPTDEMO2.Ins2CertNmbr,tblAppointments.Insurance3, tblPTDEMO2.Ins3CertNmbr,tblAppointments.ApptDate, tblAppointments.ApptTime
FROM tblAppointments CROSS JOIN
tblPTDEMO2
WHERE (tblAppointments.PatientID = tblPTDEMO2.MRNumber)
AND tblAppointments.Insurance1 = 'MED'
AND tblAppointments.ApptTypeID <> 'MTG'
AND tblAppointments.ApptTypeID <> 'PNV'
AND DateDiff("dd", ApptDate, GetDate()) = 0
Order By tblAppointments.ApptDate
) AS SQ|||

I have re-written this a bit to remove the CROSS JOIN syntax which I think is less efficient than the INNER JOIN that is implied in the WHERE clause:
SELECT A.PatientID, D.MRNumber, A.PatientFirstName, A.PatientLastName, A.PatientDOB, A.PatientSex,
A.NewPatient, A.HomePhone, A.WorkPhone, A.Insurance1, D.Ins1CertNmbr, A.Insurance2,
D.Ins2CertNmbr, A.Insurance3, D.Ins3CertNmbr, A.ApptDate, A.ApptTime
FROM tblAppointments A INNER JOIN tblPTDEMO2 D on A.PatientID = D.MRNumber
WHERE A.Insurance1 = 'MED'
AND A.ApptTypeID <> 'MTG'
AND A.ApptTypeID <> 'PNV'
AND DateDiff("dd", ApptDate, GetDate()) = 0
Order By A.ApptDate

Now, I assume that the duplicates are in thetblPTDEMO2table. Tell me more about these: are the IDs duped?

|||Thanks for your response. The query you wrote is conceptually what I am trying to get, but it throws a syntax error. I've played with a number of variations of this and I can't seem to get it to work.
The other option of using a temporary table during the DTS package is viable, but once again, I can't seem to get the query right. I think what I want is something like:
Insert Into tblAppointments AppointmentKey, PatientFirstName, PatientLastName, ...VALUES (Select DISTINCT AppointmentKey, PatientFirstName, PatientLastName, ...)
but once again, I can't seem to get the syntax correct. Any help would be greatly appreciated. Thanks.|||Thanks for your response. Actually, the tblAppointments table is the one with the duplicates, and yes, the IDs are duplicated as well.

No comments:

Post a Comment