Thursday, March 29, 2012

Duplicate Values

I have a table which is a license holder table (i.e., plumbers, electricians etc...) There are some people who appear in the table more than once as they have more than 1 type of license. I am tasked with querying out 200 of these people a week for mailing a recruitment letter which I am doing using the following select statement:

SELECT TOP 200 Technicians.Name, Technicians.Address, Technicians.City, Technicians.State, Technicians.ZipCode, Technicians.LicenseType
FROM Technicians

My problem is that this doesn't deal with the duplicates and distinct won't work because I need to pass the license type and that's the one field that's always distinct while the name and adress fields duplicate.You'll need to determine which LicenseType to return in cases where there is more than one. This example returns the lowest LicenseType alphabetically:

SELECT TOP 200
Technicians.Name,
Technicians.Address,
Technicians.City,
Technicians.State,
Technicians.ZipCode,
min(Technicians.LicenseType) LicenseType
FROM Technicians
group by Technicians.Name,
Technicians.Address,
Technicians.City,
Technicians.State,
Technicians.ZipCode|||Why don't you break the structure into PersonMaster, LicenseMaster, and PersonLicense? Then you can do an insert ... select distinct into those tables accordingly. The rest is common sense.|||Yes, if you can solve your problem by improving your database schema that is always preferable.|||Assume name is enough (You may have to do the whole row).

Notice the duplicity of Data...You really have 2 tables. That would make the SQL even easier...

There's no substitute for a good design..

USE Northwind
GO

CREATE TABLE myTable99(
[Name] varchar(50)
, Address varchar(255)
, City varchar(50)
, State char(2)
, ZipCode varchar(10)
, LicenseType varchar(10)
)
GO

INSERT INTO myTable99(
[Name]
, Address
, City
, State
, ZipCode
, LicenseType)
SELECT 'Brett','123 Main St','Newark','NJ','00000','ABC123' UNION ALL
SELECT 'Brett','123 Main St','Newark','NJ','00000','ABC456' UNION ALL
SELECT 'Brett','123 Main St','Newark','NJ','00000','ABC789' UNION ALL
SELECT 'Blinddude','123 Main St','OhiYo','OH','00000','XXX123' UNION ALL
SELECT 'Blinddude','123 Main St','OhiYo','OH','00000','XXX456' UNION ALL
SELECT 'rdjabarov','123 Main St','San Antonio','TX','00000','EFG123'
GO

SELECT TOP 200
[Name]
, Address
, City
, State
, ZipCode
, LicenseType
FROM myTable99 o
WHERE EXISTS (SELECT
[Name]
, Address
, City
, State
, ZipCode
FROM myTable99 i
WHERE i.[Name] = o.[Name]
GROUP BY
[Name]
, Address
, City
, State
, ZipCode
HAVING COUNT(*) > 1)
GO

DROP TABLE myTable99
GO|||Brett, i am disappointed i'm not in there as a plumber or something

by the way, your query only pulls out people who are multi-licensed

i guess that is one way to interpret "these people" in the original question

i personally would not have interpreted it as 200 of people with more than one license, but rather, 200 people overall, but no individual more than once

once again, good specs are shown to be crucial before we go merrily traipsing down the WHERE EXISTS path...

;) ;) ;)

in any case, would your WHERE clause not work better like this, assuming you were actually interested in pick only multi-license people...
WHERE 1 < ( SELECT count(*)
FROM myTable99 i
WHERE i.[Name] = o.[Name] )it's a correlated subquery after all, so it shouldn't need grouping

p.s. where's that thread where we were talking about the DBA getting the shaft for poor design? i have a link i want to add to it|||Rudy,

There are some people who appear in the table more than once as they have more than 1 type of license.

I just felt that THESE meant THOSE:D

And yes I was debating you're syntax...but I figured dup rows less the license meant the same guy...

Either way...it's a poor design, which I'm sure they're stuck with...

Maybe an updateable view would be a good thing here..|||Originally posted by Brett Kaiser
Maybe an updateable view would be a good thing here.. Nahhh, an updatable view would be a work-around. Fixing the underlying problems in the schema would be the good thing in this case!

-PatP|||Unfortunately, the database is provided directly by the State Board of Licensing and constantly updated so fixing the schema is not really an option. Great suggesions by the way. Because I didn't have a lot of time when I first posted I created a stored procedure that found duplicate license holders and marked all but instance as having already been mailed so therefore my original query which looks for licensees that have not already been mailed works without locating those duplicates. As the State updates the database I'll import only the new rows into the database (with the added [Mailed] bit field) and rerun the stored proc to mark duplicates as having already been mailed leaving me with a distinct record set. Thanks again for all your help!sql

No comments:

Post a Comment