Monday, March 26, 2012

duplicate number

I have a table with customers that is imported from Axapta 3.0. I have
imported them into a table that is a clone of Axapta table.
I see that i have rows where account number is the same. So it appears that
they have inserted same customer with same accountnumber several times.
So lets say the columns are:
accountnr (varchar), customername (varchar), address(varchar)
Typical data is then:
1000, companyA, streetA
1000, companyA, streetB
1001, companyB, streetC
1001, companyB, streetE
So how can i select only one of them so that i have
1000, companyA, streetA
1001, companyB, streetC
DISTINCT wont do it since street is all different. In real table there are
some 93 columns so i simplifyed a lil. It appears to me that in duplicate
company registrations there are only small changes.
It does not matter wich one that gets selected in the SQl expression i am
looking for.
I have an SQL 2000 std server.
best regards
TrondTry:
select
accountnr
, customername
, min (address)
from
Customers
group by
accountnr
, customername
Tom
----
Thomas A. Moreau, BSc, PhD, MCSE, MCDBA
SQL Server MVP
Columnist, SQL Server Professional
Toronto, ON Canada
www.pinpub.com
"Trond Hoiberg" <trond@.montanis.com> wrote in message
news:e0ds7O4EGHA.1032@.TK2MSFTNGP11.phx.gbl...
I have a table with customers that is imported from Axapta 3.0. I have
imported them into a table that is a clone of Axapta table.
I see that i have rows where account number is the same. So it appears that
they have inserted same customer with same accountnumber several times.
So lets say the columns are:
accountnr (varchar), customername (varchar), address(varchar)
Typical data is then:
1000, companyA, streetA
1000, companyA, streetB
1001, companyB, streetC
1001, companyB, streetE
So how can i select only one of them so that i have
1000, companyA, streetA
1001, companyB, streetC
DISTINCT wont do it since street is all different. In real table there are
some 93 columns so i simplifyed a lil. It appears to me that in duplicate
company registrations there are only small changes.
It does not matter wich one that gets selected in the SQl expression i am
looking for.
I have an SQL 2000 std server.
best regards
Trond|||Well yes i could but i simplifyed the real case since there are so many
columns. In some cases also address is the same. and the only variation
could appear in any of the remaining 90 columns.
So i thin i have to solve it using the accountnr.
Best regards
Trond
"Tom Moreau" <tom@.dont.spam.me.cips.ca> wrote in message
news:us8cyV4EGHA.3348@.TK2MSFTNGP10.phx.gbl...
> Try:
> select
> accountnr
> , customername
> , min (address)
> from
> Customers
> group by
> accountnr
> , customername
> --
> Tom
> ----
> Thomas A. Moreau, BSc, PhD, MCSE, MCDBA
> SQL Server MVP
> Columnist, SQL Server Professional
> Toronto, ON Canada
> www.pinpub.com
> "Trond Hoiberg" <trond@.montanis.com> wrote in message
> news:e0ds7O4EGHA.1032@.TK2MSFTNGP11.phx.gbl...
> I have a table with customers that is imported from Axapta 3.0. I have
> imported them into a table that is a clone of Axapta table.
> I see that i have rows where account number is the same. So it appears
> that
> they have inserted same customer with same accountnumber several times.
> So lets say the columns are:
> accountnr (varchar), customername (varchar), address(varchar)
> Typical data is then:
> 1000, companyA, streetA
> 1000, companyA, streetB
> 1001, companyB, streetC
> 1001, companyB, streetE
> So how can i select only one of them so that i have
> 1000, companyA, streetA
> 1001, companyB, streetC
> DISTINCT wont do it since street is all different. In real table there are
> some 93 columns so i simplifyed a lil. It appears to me that in duplicate
> company registrations there are only small changes.
> It does not matter wich one that gets selected in the SQl expression i am
> looking for.
> I have an SQL 2000 std server.
> best regards
> Trond
>
>
>|||Hi
If you had a method of uniquely identifying each row then you could do
something like say an id column
SELECT A.accountnr, A.customername, A.address
FROM accounts A
JOIN ( SELECT MIN(B.id), B.accountnr, B.customername
FROM Accounts B
GROUP BY B.accountnr, B.customername ) D ON A.accountnr =
D.accountnr AND A.customername = D.customername
Alternatively you can just use (say) MIN for each column that you are not
grouping by:
SELECT A.accountnr, A.customername, MIN(A.address)
FROM accounts A
GROUP BY A.accountnr, A.customername
John
"Trond Hoiberg" <trond@.montanis.com> wrote in message
news:e0ds7O4EGHA.1032@.TK2MSFTNGP11.phx.gbl...
>I have a table with customers that is imported from Axapta 3.0. I have
>imported them into a table that is a clone of Axapta table.
> I see that i have rows where account number is the same. So it appears
> that they have inserted same customer with same accountnumber several
> times.
> So lets say the columns are:
> accountnr (varchar), customername (varchar), address(varchar)
> Typical data is then:
> 1000, companyA, streetA
> 1000, companyA, streetB
> 1001, companyB, streetC
> 1001, companyB, streetE
> So how can i select only one of them so that i have
> 1000, companyA, streetA
> 1001, companyB, streetC
> DISTINCT wont do it since street is all different. In real table there are
> some 93 columns so i simplifyed a lil. It appears to me that in duplicate
> company registrations there are only small changes.
> It does not matter wich one that gets selected in the SQl expression i am
> looking for.
> I have an SQL 2000 std server.
> best regards
> Trond
>
>
>|||Well, you'll have to come up with some other business rules to break the
ties.
Tom
----
Thomas A. Moreau, BSc, PhD, MCSE, MCDBA
SQL Server MVP
Columnist, SQL Server Professional
Toronto, ON Canada
www.pinpub.com
"Trond Hoiberg" <trond@.montanis.com> wrote in message
news:OzoFUZ4EGHA.3100@.tk2msftngp13.phx.gbl...
Well yes i could but i simplifyed the real case since there are so many
columns. In some cases also address is the same. and the only variation
could appear in any of the remaining 90 columns.
So i thin i have to solve it using the accountnr.
Best regards
Trond
"Tom Moreau" <tom@.dont.spam.me.cips.ca> wrote in message
news:us8cyV4EGHA.3348@.TK2MSFTNGP10.phx.gbl...
> Try:
> select
> accountnr
> , customername
> , min (address)
> from
> Customers
> group by
> accountnr
> , customername
> --
> Tom
> ----
> Thomas A. Moreau, BSc, PhD, MCSE, MCDBA
> SQL Server MVP
> Columnist, SQL Server Professional
> Toronto, ON Canada
> www.pinpub.com
> "Trond Hoiberg" <trond@.montanis.com> wrote in message
> news:e0ds7O4EGHA.1032@.TK2MSFTNGP11.phx.gbl...
> I have a table with customers that is imported from Axapta 3.0. I have
> imported them into a table that is a clone of Axapta table.
> I see that i have rows where account number is the same. So it appears
> that
> they have inserted same customer with same accountnumber several times.
> So lets say the columns are:
> accountnr (varchar), customername (varchar), address(varchar)
> Typical data is then:
> 1000, companyA, streetA
> 1000, companyA, streetB
> 1001, companyB, streetC
> 1001, companyB, streetE
> So how can i select only one of them so that i have
> 1000, companyA, streetA
> 1001, companyB, streetC
> DISTINCT wont do it since street is all different. In real table there are
> some 93 columns so i simplifyed a lil. It appears to me that in duplicate
> company registrations there are only small changes.
> It does not matter wich one that gets selected in the SQl expression i am
> looking for.
> I have an SQL 2000 std server.
> best regards
> Trond
>
>
>|||Hi
It does not matter if the values in the columns are the same as the minimum
of two equal values will be that value. As account number is always constant
(between rows you wish to differentiate) you will either have to use a
different column e.g. something like the id column in my previous post or
generate the means to differentiate the columns such as inserting all the
values into a temporary table that has an id column.
John
"Trond Hoiberg" <trond@.montanis.com> wrote in message
news:OzoFUZ4EGHA.3100@.tk2msftngp13.phx.gbl...
> Well yes i could but i simplifyed the real case since there are so many
> columns. In some cases also address is the same. and the only variation
> could appear in any of the remaining 90 columns.
> So i thin i have to solve it using the accountnr.
> Best regards
> Trond
> "Tom Moreau" <tom@.dont.spam.me.cips.ca> wrote in message
> news:us8cyV4EGHA.3348@.TK2MSFTNGP10.phx.gbl...
>|||Hi
You may want to look at Itzik's article on assigning Row numbers to
non-unique rows
http://www.windowsitpro.com/Article...5828/45828.html
John
"John Bell" <jbellnewsposts@.hotmail.com> wrote in message
news:OhAtGF5EGHA.3384@.TK2MSFTNGP12.phx.gbl...
> Hi
> It does not matter if the values in the columns are the same as the
> minimum of two equal values will be that value. As account number is
> always constant (between rows you wish to differentiate) you will either
> have to use a different column e.g. something like the id column in my
> previous post or generate the means to differentiate the columns such as
> inserting all the values into a temporary table that has an id column.
> John
> "Trond Hoiberg" <trond@.montanis.com> wrote in message
> news:OzoFUZ4EGHA.3100@.tk2msftngp13.phx.gbl...
>|||Trond Hoiberg (trond@.montanis.com) writes:
> I have a table with customers that is imported from Axapta 3.0. I have
> imported them into a table that is a clone of Axapta table.
> I see that i have rows where account number is the same. So it appears
> that they have inserted same customer with same accountnumber several
> times.
> So lets say the columns are:
> accountnr (varchar), customername (varchar), address(varchar)
> Typical data is then:
> 1000, companyA, streetA
> 1000, companyA, streetB
> 1001, companyB, streetC
> 1001, companyB, streetE
> So how can i select only one of them so that i have
> 1000, companyA, streetA
> 1001, companyB, streetC
> DISTINCT wont do it since street is all different. In real table there are
> some 93 columns so i simplifyed a lil. It appears to me that in duplicate
> company registrations there are only small changes.
> It does not matter wich one that gets selected in the SQl expression i am
> looking for.
Are you sure of that? What if some information is old and no longer
current and some is new?
From the technical presumptions you have given you can do this:
ALTER TABLE tbl ADD ident int IDENTITY
Then:
SELECT a.accountnr, a.customername, ...
FROM tbl a
JOIN (SELECT accountnr, ident = MAX(ident)
FROM tbl
GROUP BY accountnr) AS b ON a.accountnr = b.accountnr
AND a.ident = b.ident
But whether this really is right from a business perspective, I am not
so sure.
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se
Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pr...oads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodin...ions/books.mspx

No comments:

Post a Comment