Tuesday, March 27, 2012

Duplicate Removal and Intelligent T-SQL

I am looking for some T_SQL code that would help us eliminate
duplicate data entry into our mailing db. Finding and removing the
existing dups is not rocket science, but I am looking for a complete
solution.

We also want to be able to come up with a T-SQL script to be
intelligent enough to know if:

"Bill Jones" is the same as "Billy Jones" or

"Bill Jones" is the same as "William Jones" or

"Bill Jones at 123 My street" is the same as "Bill Jones at 123 My
St."

Purchasing scripts from a company is not out of the question. Its a
matter of script cost versus my hourly rate to create them... Needless
to say, something cheap.

We have only heard of one app that is dynamic enough to handle this,
and the cost was way, way out of our range, 6 figures is almost 2
years salary.

Any ideas would be greatly appreciated.funkdm1@.yahoo.com (Dave) wrote in message news:<f5174e0f.0406071223.73e8fc2d@.posting.google.com>...
> I am looking for some T_SQL code that would help us eliminate
> duplicate data entry into our mailing db. Finding and removing the
> existing dups is not rocket science, but I am looking for a complete
> solution.
> We also want to be able to come up with a T-SQL script to be
> intelligent enough to know if:
> "Bill Jones" is the same as "Billy Jones" or
> "Bill Jones" is the same as "William Jones" or
> "Bill Jones at 123 My street" is the same as "Bill Jones at 123 My
> St."
> Purchasing scripts from a company is not out of the question. Its a
> matter of script cost versus my hourly rate to create them... Needless
> to say, something cheap.
> We have only heard of one app that is dynamic enough to handle this,
> and the cost was way, way out of our range, 6 figures is almost 2
> years salary.
> Any ideas would be greatly appreciated.

This may help - I've used it successfully to do fuzzy matching of
duplicate company names:

http://www.winnetmag.com/SQLServer/...6094/26094.html

Simon|||Simon,

That was a very interesting article. It also came with about 1200
lines of t-SQL. I dont think thats exactly what I need, but I can
definetely use pieces of it.

Thanks

sql@.hayes.ch (Simon Hayes) wrote in message news:<60cd0137.0406072358.1cb5c387@.posting.google.com>...
> funkdm1@.yahoo.com (Dave) wrote in message news:<f5174e0f.0406071223.73e8fc2d@.posting.google.com>...
> > I am looking for some T_SQL code that would help us eliminate
> > duplicate data entry into our mailing db. Finding and removing the
> > existing dups is not rocket science, but I am looking for a complete
> > solution.
> > We also want to be able to come up with a T-SQL script to be
> > intelligent enough to know if:
> > "Bill Jones" is the same as "Billy Jones" or
> > "Bill Jones" is the same as "William Jones" or
> > "Bill Jones at 123 My street" is the same as "Bill Jones at 123 My
> > St."
> > Purchasing scripts from a company is not out of the question. Its a
> > matter of script cost versus my hourly rate to create them... Needless
> > to say, something cheap.
> > We have only heard of one app that is dynamic enough to handle this,
> > and the cost was way, way out of our range, 6 figures is almost 2
> > years salary.
> > Any ideas would be greatly appreciated.
> This may help - I've used it successfully to do fuzzy matching of
> duplicate company names:
> http://www.winnetmag.com/SQLServer/...6094/26094.html
> Simon|||>> I am looking for some T_SQL code that would help us eliminate
duplicate data entry into our mailing db .. not rocket science, but I
am looking for a complete solution. <<

Yes, it is :) Get a package and save yourself 5-10 man-years.
Call the NCSC at 800-238-3150 and request the following free
publications:

PUB 25 - Designing Business Letter Mail
PUB 28 - Postal Addressing Standards
PUB 221 - Addressing for Success

Buy a mailing list product to scrub your data. It is much harder than
it looks. Here is an old list of companies that sell ZIP code and
mailing related product:

BCC Software Inc.
39 Saginaw Drive #630
Rochester, NY 14623

Comco, Inc.
2211 Grant Street
Box #349
Bettendorf, IA 52722

CD Light
8861 South Silverstone Way
Sandy, UT 84093
CD-Rom of ZIP+4 for $30

Dassgar Systems
Box #915
Bennington, VT 05201-0915
Global-Z; international postal codes for lots of places

Eric Isaacson Software
416 East Bloomington
Bloomington, IN 47401-4739
demo version of ZipKey

Group 1 Software
4200 Parliament Place #600
Lanham, MD 20706

Melissa Data Corporation
32122 Paseo Adelanto
San Juan Capistrano, CA 92675

Search Software America
1445 East Putnam Ave
Old Greenwich, CT 06870

(ask for their booklet "The Math, Myth & Magic of Name Searching and
Matching". It is quite informative)

For 'UK postcode databases', the Royal Mail holds the copyright and
will licence you an 'outwards' database (i.e. the first segment of the
full postcode) for about 150 a year, or an 'inwards' database (i.e.
the full 9- or 10-character postcode) for about 500 UK pounds per
year.

KJL Software (www.kjlsoftware.com) gives you 5-digit ZIP Code, City,
State,USPS Status Code, LATEST Area Code(s) from NANPA, Time
Zone,Latitude, and Longitude for all valid US Postal Service 5 digit
ZIP Codes/City/State combinations and a Distance Calculator.|||I'd also recommend http://www.qas.com/

We use them for address checking/changing and there is additional
functionality (in the UK at least) to check against the electoral role
to verify if someone is at a specific address.

Failing that, UK wise, Experian or Equifax may be of some help. I'm
not sure what they do outside the UK though.

Hope that helps.

No comments:

Post a Comment