DTS to .net: records

Showing posts with label records. Show all posts

Thursday, March 29, 2012

Duplicated Records

invno frecref pono
---- ------ ----
INV0000006 INV000000601 X0000009
INV0000006 INV000000601 X0000010
INV0000006 INV000000601 X0000043
INV0000006 INV000000602 X0000009
INV0000006 INV000000602 X0000010
INV0000006 INV000000602 X0000043
INV0000006 INV000000603 X0000009
INV0000006 INV000000603 X0000010
INV0000006 INV000000603 X0000043
INV0000006 INV000000604 X0000009
INV0000006 INV000000604 X0000010
INV0000006 INV000000604 X0000043

Hi,

how do i elimate duplicated records? i tried suppress if duplicated but cant.I dont see any duplicated records in the data you posted... :confused:|||Hi wizards,

whats i mean were repeated data in column.. how do i suppress it?
just wanted to show only

INV0000006 INV000000601 X0000009
INV000000602 X0000010
INV000000603 X0000043
INV000000604|||Do a group on Invno and place the fields in group footer then it will show only the last record of each groupsql

Duplicate UID in PK column

Please excuse my noviceness, but how is it possible that I
can have duplicate records that are exactly the same in
every way when there is a PK field that is meant to be
unique?
Because this has happened, I've had to remove and re-index.
Can anyone please help this novice?That shouldn't happen, and I've never heard that it really happened. But it
is difficult to do anything or troubleshoot if you don't have a repro or
don't have the data anymore.
--
Tibor Karaszi, SQL Server MVP
Archive at:
http://groups.google.com/groups?oi=djq&as_ugroup=microsoft.public.sqlserver
"Benjamin" <ben.jones@.trendwest.com.au> wrote in message
news:2c7301c3a8ad$61c25480$3101280a@.phx.gbl...
> Please excuse my noviceness, but how is it possible that I
> can have duplicate records that are exactly the same in
> every way when there is a PK field that is meant to be
> unique?
> Because this has happened, I've had to remove and re-index.
> Can anyone please help this novice?

Tuesday, March 27, 2012

duplicate rows but no key on the tables

Dear All,

I have a table with 10 billion records but there are no key on it. I cannot
build a key on it as it is the data source.

However, the data source exits the duplicated rows.

I have used the DTS to transform the data into a new table and delete the
duplicated rows. As there are 10 billion records, i need to divide it into 3
parts and also the process lasts for 6 hours each part.

I want to ask is there any other good methods to slove my problem??

Thx

Estheresther s via SQLMonster.com (forum@.SQLMonster.com) writes:
> I have a table with 10 billion records but there are no key on it. I
> cannot build a key on it as it is the data source.
> However, the data source exits the duplicated rows.
> I have used the DTS to transform the data into a new table and delete
> the duplicated rows. As there are 10 billion records, i need to divide
> it into 3 parts and also the process lasts for 6 hours each part.
> I want to ask is there any other good methods to slove my problem??

Eliminating duplicates from 10 milliard(*) rows is nothing for the
impatient. I'm happy that I don't have to play that game.

(*) I assume. 10 billion rows as in what I mean with billion would be
really dauting...

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp

Duplicate Rows !

In my ad-hoc reports created by Report Builder, my data is grouped if there are more than one records corresponding to some value.Is there any way to override this behaviour?
In the report designer I can do this by using the 'Hide Duplicates' property of the textbox. But doing so will make the report unusable through Report Builder.
I would like to see all values even if they are repeating when they are exported to excel.

Any help is appreciated !

I'm assuming you have created a table with multiple groups.

Try creating a table that only has a single group, or make sure that the group with the duplicate date is the right-most group.

|||

Hi,

My report is an ad-hoc report and I have not defined any groups.
I need to avoid changing the column order in the report too. So is there any property in Report Builder tool corresponding to the 'Hide Duplicates' property that is available in the Report Designer ?

duplicate rows

Hi,

I have the following records in my sql table and want to place all the duplicates in one row. I tried to do this with an update query but had no success. I can do this in access but can't figure out how to do it in sql. Any help would be great. Thanks.

qrycurrentrecords LNAME MAJOR PAPATHANASIOU 135 DEPASSQUALLO 147 BILGER 215 KLER 267 MAKO 305 PERRY 379 MILLER 379 BILLS 379 WANDER 424 FLANAGAN 440 KAUFFMAN 440 KALLIS 492 SHARKY 670

mr4100 wrote:

... want to place all the duplicates in one row.

Can you explain a litlle more about what you mean?

|||

if you take a look at the picture, you can see there are 3 codes of 379. there are other columns attached to this table i just didn't show all of them. I want to loop through the table and place the 3 rows with the same code in a row by themselves. I'm sending an email from the table and don't want to send 3 seperate emails, just one email containing data about all 3 rows. My task sends an email for each row in the table. I hope I didn't confuse you.

thanks,

|||

It is still not clear what your desired output may look like. I suggest that if would be helpful if you posted the table DDL, some sample data in the form of INSERT statements, and what the desired output looks like. Also the version of SQL Server would be helpful in finding a solution.

I'm assuming you have explored the various ways to use GROUP BY...

|||

The confusing part is when you say: duplicates. There are not duplicates here, but rather it is just a typical situation with (hopefully) a parent table and key (for major) and then the child rows with different names. The best resource for how to do this is here:

http://databases.aspfaq.com/general/how-do-i-concatenate-strings-from-a-column-into-a-single-row.html

The site is sadly an eyesore these days, but the information is still good. Use the 2005 version for sure if you are using 2005, it is really excellent and works nice.

|||

what i want to do is take these 3 seperate rows that are in the table below with the same major and put them into 1 row by themselves whether it would be updating this table or a new table by themselves,

start with this:

name major desc

john 45 eng.

mary 45 eng.

corey 25 math

rose 45 eng.

sue 15 mus.

end with this:

name major desc

john,mary,rose 45 eng.

corey 25 math

sue 15 mus.

|||

Here it is.. if you use sql server 2005

Code Snippet

Create Table #data (

[name] Varchar(100) ,

[major] Varchar(100) ,

[desc] Varchar(100)

);

Insert Into #data Values('john','45','eng.');

Insert Into #data Values('mary','45','eng.');

Insert Into #data Values('corey','25','math');

Insert Into #data Values('rose','45','eng.');

Insert Into #data Values('sue','15','mus.');

Select Distinct

Substring((Select ',' + name [text()] from #data sub where sub.major=main.major and sub.[desc]=main.[desc] For Xml Path('')),2,8000)

,major

,[desc]

from

#data main

|||

this works perfectly! Could you explain why you use the #data sub and #data main and for xml Path(")),2,8000)?

I would just like to know what it is does to make this work, I would have never figured this out.

Thanks,

Duplicate row

hi all,
IF i had one table. in that table three records are there.all there have
same datas.there is no primary key.how i can delete the second row in that
table
regards
balaHi,
How can u say which is the second row?
U want to delete duplicate and want only one row?
Reply
--
Herbert
"balakarthik" wrote:

> hi all,
> IF i had one table. in that table three records are there.all there have
> same datas.there is no primary key.how i can delete the second row in that
> table
> regards
> bala|||hello there
try this
use northwind
select p.supplierid, p.unitprice,p.productid from products p
join
(
select supplierid, min(unitprice)as unitprice from products
group by supplierid
) as p2
on p.supplierid=p2.supplierid
and p.unitprice=p2.unitprice
hope it helps
thanks,
Jose de Jesus Jr. Mcp,Mcdba
Data Architect
Sykes Asia (Manila philippines)
MCP #2324787
"balakarthik" wrote:

> hi all,
> IF i had one table. in that table three records are there.all there have
> same datas.there is no primary key.how i can delete the second row in that
> table
> regards
> bala|||sorry wrong post
--
thanks,
Jose de Jesus Jr. Mcp,Mcdba
Data Architect
Sykes Asia (Manila philippines)
MCP #2324787
"balakarthik" wrote:

> hi all,
> IF i had one table. in that table three records are there.all there have
> same datas.there is no primary key.how i can delete the second row in that
> table
> regards
> bala|||bala
what do you mean by second row? rows not stored in an order in RDBMS.
CELKO is not going to leave you !!!!!
post ddl/sample data.
Regards
R.D
"Jose G. de Jesus Jr MCP, MCDBA" wrote:
> sorry wrong post
> --
> thanks,
> --
> Jose de Jesus Jr. Mcp,Mcdba
> Data Architect
> Sykes Asia (Manila philippines)
> MCP #2324787
>
> "balakarthik" wrote:
>|||You've haven't seen the rath of Hurricane Celko!
"balakarthik" <balakarthik@.discussions.microsoft.com> wrote in message
news:F742D9A8-CAC8-4501-B4C3-1374E68FCD83@.microsoft.com...
> hi all,
> IF i had one table. in that table three records are there.all there have
> same datas.there is no primary key.how i can delete the second row in that
> table
> regards
> bala|||the best way is to create the table again - correctly [with primary key]
- and insert the distinct values into it from the old table. The new
table will of course have to have a different name.
balakarthik wrote:

>hi all,
>IF i had one table. in that table three records are there.all there have
>same datas.there is no primary key.how i can delete the second row in that
>table
>regards
>bala
>|||>> You've haven't seen the wrath of Hurricane Celko! <<
LOL!!

Duplicate records on database

hi all,

How do i avoid duplicate records on my database? i have 4 textboxes that collect user information and this information is saved in the database. when a user fills the textboxes and clicks the submit button, i want to check through the database if the exact records exist in the database before the data is saved. if the user is registered on the database, he wont be allowed to login. how can i acheive this?

i thought of using the comparevalidator but i'm not sure how to proceed.

thanks

You can check for the existence of the record before you insert to avoid duplicates.

IF NOT EXISTS ( SELECT * FROM YourTable WHER <Condition>)

BEGIN

INSERT INTO ...

END

|||

1declare @.login_namevarchar(50)2declare @.messagevarchar(100)34set @.login_name ='CS4Ever'56IFEXISTS (SELECT 1FROM RegisterationTableWHERE LoginName = @.login_name)7BEING-- user already registered89set @.message ='Sorry ' + @.login_name +', you are already registered!'1011END-- user already registered12ELSE-- user not registered yet13BEGIN-- user not registered yet1415insert into RegisterationTable (LoginName)values (@.login_name)1617if(@.@.rowcount = 1)18begin-- user added sucessfully19set @.message = @.login_name +', you have been registered sucessfully.'20end-- user added sucessfully21else-- faild in adding user22set @.message ='Registereation filed', please contact registeration office!23begin-- faild in adding user24end-- faild in adding user2526END-- user not registered yet

Good luck.

sql

duplicate records linking tables

Want to do a MAS90 PR report linking Tables PR6, PR7, & PR8. The tables have multiple records for each employee. When I link PR6 & PR7 everything's fine. When I include PR8 then the records from PR7 print multiple times. What am I doing wrong? Thanks.Post the sample data and the result you want|||you need to be sure you understand the relationships the tables have to each othe.

In most cases, you'll need to do LEFT OUTER JOINS..and not INNER JOINS.

But again, this is based on the data set that you are trying to achieve.

post some sample data lines of what you are getting and also an example of what you would like to see ...then we can help.

Duplicate Records in a View

Dear Keith
I have had a look at the data and have come to the
conclusion that it is not duplicated in any way.
The reason is that if you check each row in the output
they are all in some way unique. For instance id 14 (which
as in twice) has a different TUT ID.
To give you some sort of solution then we need to know
what do you actually want as un duplicated data.
Thanks
J

>--Original Message--
>Hi There
>I am having a problem with some SQL and hope someone out
there can help. I
>am fairly new so apologise any bad coding etc.
>I have created a View, which works fine except I need to
eliminate some
>duplicate entries and I have tried using distinct but my
problem seems
>outside of it's capabilities.
>The SQL is here: http://www.step-
online.org.uk/sql/Courses_SQL.png
>The Output is here: http://www.step-
online.org.uk/sql/Courses_Output.png
>As you can see from this example, ID 1 and ID 14 are
duplicated because
>there are two tutors (TUT ID) for these courses. The
entire rows are not
>duplicated because of the differing TUT ID so distinct
will not work (unless
>I am using it wrong).
>I need to have the output as is now, but show only a
single instance of each
>ID regardless of how many tutors the courses may have.
>How can I achieve this?
>Thanks
>
>.
>Hi and Thanks for looking
I know that the TUT ID is different.
I want to ignore this though and exclude all the records which are duplicate
(even if they have different TUT IDs).
TUT ID is the tutor assigned to the course, so a record is showing up twice
if there are two tutors on the course. I only want to show the record once,
regardless of how many tutors (TUT IDs) there are for each course.
For my purposes, a duplicate record is one where there is another record
where everything is identical EXCEPT the TUT ID, because there can be many
tutors on the same course.
Thanks
"Julie" <anonymous@.discussions.microsoft.com> wrote in message
news:1160e01c4419c$2926d190$a101280a@.phx
.gbl...[vbcol=seagreen]
> Dear Keith
> I have had a look at the data and have come to the
> conclusion that it is not duplicated in any way.
> The reason is that if you check each row in the output
> they are all in some way unique. For instance id 14 (which
> as in twice) has a different TUT ID.
> To give you some sort of solution then we need to know
> what do you actually want as un duplicated data.
> Thanks
> J
>
> there can help. I
> eliminate some
> problem seems
> online.org.uk/sql/Courses_SQL.png
> online.org.uk/sql/Courses_Output.png
> duplicated because
> entire rows are not
> will not work (unless
> single instance of each|||Any suggestions would be great.
Thanks
"Keith" <@..> wrote in message
news:um6a%23eaQEHA.1620@.TK2MSFTNGP12.phx.gbl...
> Hi and Thanks for looking
> I know that the TUT ID is different.
> I want to ignore this though and exclude all the records which are
duplicate
> (even if they have different TUT IDs).
> TUT ID is the tutor assigned to the course, so a record is showing up
twice
> if there are two tutors on the course. I only want to show the record
once,
> regardless of how many tutors (TUT IDs) there are for each course.
> For my purposes, a duplicate record is one where there is another record
> where everything is identical EXCEPT the TUT ID, because there can be many
> tutors on the same course.
> Thanks
>
> "Julie" <anonymous@.discussions.microsoft.com> wrote in message
> news:1160e01c4419c$2926d190$a101280a@.phx
.gbl...
>

Duplicate Records in a View

Dear Keith
I have had a look at the data and have come to the
conclusion that it is not duplicated in any way.
The reason is that if you check each row in the output
they are all in some way unique. For instance id 14 (which
as in twice) has a different TUT ID.
To give you some sort of solution then we need to know
what do you actually want as un duplicated data.
Thanks
J

>--Original Message--
>Hi There
>I am having a problem with some SQL and hope someone out
there can help. I
>am fairly new so apologise any bad coding etc.
>I have created a View, which works fine except I need to
eliminate some
>duplicate entries and I have tried using distinct but my
problem seems
>outside of it's capabilities.
>The SQL is here: http://www.step-
online.org.uk/sql/Courses_SQL.png
>The Output is here: http://www.step-
online.org.uk/sql/Courses_Output.png
>As you can see from this example, ID 1 and ID 14 are
duplicated because
>there are two tutors (TUT ID) for these courses. The
entire rows are not
>duplicated because of the differing TUT ID so distinct
will not work (unless
>I am using it wrong).
>I need to have the output as is now, but show only a
single instance of each
>ID regardless of how many tutors the courses may have.
>How can I achieve this?
>Thanks
>
>.
>
Hi and Thanks for looking
I know that the TUT ID is different.
I want to ignore this though and exclude all the records which are duplicate
(even if they have different TUT IDs).
TUT ID is the tutor assigned to the course, so a record is showing up twice
if there are two tutors on the course. I only want to show the record once,
regardless of how many tutors (TUT IDs) there are for each course.
For my purposes, a duplicate record is one where there is another record
where everything is identical EXCEPT the TUT ID, because there can be many
tutors on the same course.
Thanks
"Julie" <anonymous@.discussions.microsoft.com> wrote in message
news:1160e01c4419c$2926d190$a101280a@.phx.gbl...[vbcol=seagreen]
> Dear Keith
> I have had a look at the data and have come to the
> conclusion that it is not duplicated in any way.
> The reason is that if you check each row in the output
> they are all in some way unique. For instance id 14 (which
> as in twice) has a different TUT ID.
> To give you some sort of solution then we need to know
> what do you actually want as un duplicated data.
> Thanks
> J
>
> there can help. I
> eliminate some
> problem seems
> online.org.uk/sql/Courses_SQL.png
> online.org.uk/sql/Courses_Output.png
> duplicated because
> entire rows are not
> will not work (unless
> single instance of each
|||Any suggestions would be great.
Thanks
"Keith" <@..> wrote in message
news:um6a%23eaQEHA.1620@.TK2MSFTNGP12.phx.gbl...
> Hi and Thanks for looking
> I know that the TUT ID is different.
> I want to ignore this though and exclude all the records which are
duplicate
> (even if they have different TUT IDs).
> TUT ID is the tutor assigned to the course, so a record is showing up
twice
> if there are two tutors on the course. I only want to show the record
once,
> regardless of how many tutors (TUT IDs) there are for each course.
> For my purposes, a duplicate record is one where there is another record
> where everything is identical EXCEPT the TUT ID, because there can be many
> tutors on the same course.
> Thanks
>
> "Julie" <anonymous@.discussions.microsoft.com> wrote in message
> news:1160e01c4419c$2926d190$a101280a@.phx.gbl...
>
sql

Duplicate Records in a View

Hi There
I am having a problem with some SQL and hope someone out there can help. I
am fairly new so apologise any bad coding etc.
I have created a View, which works fine except I need to eliminate some
duplicate entries and I have tried using distinct but my problem seems
outside of it's capabilities.
The SQL is here: http://www.step-online.org.uk/sql/Courses_SQL.png
The Output is here: http://www.step-online.org.uk/sql/Courses_Output.png
As you can see from this example, ID 1 and ID 14 are duplicated because
there are two tutors (TUT ID) for these courses. The entire rows are not
duplicated because of the differing TUT ID so distinct will not work (unless
I am using it wrong).
I need to have the output as is now, but show only a single instance of each
ID regardless of how many tutors the courses may have.
How can I achieve this?
ThanksKeith
CREATE TABLE #Demo (
idNo int identity(1,1),
colA int,
colB int
)
INSERT INTO #Demo(colA,colB) VALUES (1,6)
INSERT INTO #Demo(colA,colB) VALUES (1,6)
INSERT INTO #Demo(colA,colB) VALUES (2,4)
INSERT INTO #Demo(colA,colB) VALUES (3,3)
INSERT INTO #Demo(colA,colB) VALUES (4,2)
INSERT INTO #Demo(colA,colB) VALUES (3,3)
INSERT INTO #Demo(colA,colB) VALUES (5,1)
INSERT INTO #Demo(colA,colB) VALUES (8,1)
PRINT 'Table'
SELECT * FROM #Demo
PRINT 'Duplicates in Table'
SELECT * FROM #Demo
WHERE idNo IN
(SELECT B.idNo
FROM #Demo A JOIN #Demo B
ON A.idNo <> B.idNo
AND A.colA = B.colA
AND A.colB = B.colB)
PRINT 'Duplicates to Delete'
SELECT * FROM #Demo
WHERE idNo IN
(SELECT B.idNo
FROM #Demo A JOIN #Demo B
ON A.idNo < B.idNo -- < this time, not <>
AND A.colA = B.colA
AND A.colB = B.colB)
DELETE FROM #Demo
WHERE idNo IN
(SELECT B.idNo
FROM #Demo A JOIN #Demo B
ON A.idNo < B.idNo -- < this time, not <>
AND A.colA = B.colA
AND A.colB = B.colB)
PRINT 'Cleaned-up Table'
SELECT * FROM #Demo
DROP TABLE #Demo
"Keith" <@..> wrote in message news:OU9c0rWQEHA.3916@.TK2MSFTNGP12.phx.gbl...
> Hi There
> I am having a problem with some SQL and hope someone out there can help.
I
> am fairly new so apologise any bad coding etc.
> I have created a View, which works fine except I need to eliminate some
> duplicate entries and I have tried using distinct but my problem seems
> outside of it's capabilities.
> The SQL is here: http://www.step-online.org.uk/sql/Courses_SQL.png
> The Output is here: http://www.step-online.org.uk/sql/Courses_Output.png
> As you can see from this example, ID 1 and ID 14 are duplicated because
> there are two tutors (TUT ID) for these courses. The entire rows are not
> duplicated because of the differing TUT ID so distinct will not work
(unless
> I am using it wrong).
> I need to have the output as is now, but show only a single instance of
each
> ID regardless of how many tutors the courses may have.
> How can I achieve this?
> Thanks
>|||Thanks
I don't want to list the duplicates though - I want to list everything
EXCEPT the duplicates.
I couldn't figure out (being a newbie) how to do this.
Thanks
"Uri Dimant" <urid@.iscar.co.il> wrote in message
news:ON%23ucxWQEHA.236@.TK2MSFTNGP10.phx.gbl...
> Keith
> CREATE TABLE #Demo (
> idNo int identity(1,1),
> colA int,
> colB int
> )
> INSERT INTO #Demo(colA,colB) VALUES (1,6)
> INSERT INTO #Demo(colA,colB) VALUES (1,6)
> INSERT INTO #Demo(colA,colB) VALUES (2,4)
> INSERT INTO #Demo(colA,colB) VALUES (3,3)
> INSERT INTO #Demo(colA,colB) VALUES (4,2)
> INSERT INTO #Demo(colA,colB) VALUES (3,3)
> INSERT INTO #Demo(colA,colB) VALUES (5,1)
> INSERT INTO #Demo(colA,colB) VALUES (8,1)
> PRINT 'Table'
> SELECT * FROM #Demo
> PRINT 'Duplicates in Table'
> SELECT * FROM #Demo
> WHERE idNo IN
> (SELECT B.idNo
> FROM #Demo A JOIN #Demo B
> ON A.idNo <> B.idNo
> AND A.colA = B.colA
> AND A.colB = B.colB)
> PRINT 'Duplicates to Delete'
> SELECT * FROM #Demo
> WHERE idNo IN
> (SELECT B.idNo
> FROM #Demo A JOIN #Demo B
> ON A.idNo < B.idNo -- < this time, not <>
> AND A.colA = B.colA
> AND A.colB = B.colB)
> DELETE FROM #Demo
> WHERE idNo IN
> (SELECT B.idNo
> FROM #Demo A JOIN #Demo B
> ON A.idNo < B.idNo -- < this time, not <>
> AND A.colA = B.colA
> AND A.colB = B.colB)
> PRINT 'Cleaned-up Table'
> SELECT * FROM #Demo
> DROP TABLE #Demo
> "Keith" <@..> wrote in message
news:OU9c0rWQEHA.3916@.TK2MSFTNGP12.phx.gbl...
> > Hi There
> >
> > I am having a problem with some SQL and hope someone out there can help.
> I
> > am fairly new so apologise any bad coding etc.
> >
> > I have created a View, which works fine except I need to eliminate some
> > duplicate entries and I have tried using distinct but my problem seems
> > outside of it's capabilities.
> >
> > The SQL is here: http://www.step-online.org.uk/sql/Courses_SQL.png
> >
> > The Output is here: http://www.step-online.org.uk/sql/Courses_Output.png
> >
> > As you can see from this example, ID 1 and ID 14 are duplicated because
> > there are two tutors (TUT ID) for these courses. The entire rows are
not
> > duplicated because of the differing TUT ID so distinct will not work
> (unless
> > I am using it wrong).
> >
> > I need to have the output as is now, but show only a single instance of
> each
> > ID regardless of how many tutors the courses may have.
> >
> > How can I achieve this?
> >
> > Thanks
> >
> >
>|||Dear Keith
I have had a look at the data and have come to the
conclusion that it is not duplicated in any way.
The reason is that if you check each row in the output
they are all in some way unique. For instance id 14 (which
as in twice) has a different TUT ID.
To give you some sort of solution then we need to know
what do you actually want as un duplicated data.
Thanks
J
>--Original Message--
>Hi There
>I am having a problem with some SQL and hope someone out
there can help. I
>am fairly new so apologise any bad coding etc.
>I have created a View, which works fine except I need to
eliminate some
>duplicate entries and I have tried using distinct but my
problem seems
>outside of it's capabilities.
>The SQL is here: http://www.step-
online.org.uk/sql/Courses_SQL.png
>The Output is here: http://www.step-
online.org.uk/sql/Courses_Output.png
>As you can see from this example, ID 1 and ID 14 are
duplicated because
>there are two tutors (TUT ID) for these courses. The
entire rows are not
>duplicated because of the differing TUT ID so distinct
will not work (unless
>I am using it wrong).
>I need to have the output as is now, but show only a
single instance of each
>ID regardless of how many tutors the courses may have.
>How can I achieve this?
>Thanks
>
>.
>|||Hi and Thanks for looking
I know that the TUT ID is different.
I want to ignore this though and exclude all the records which are duplicate
(even if they have different TUT IDs).
TUT ID is the tutor assigned to the course, so a record is showing up twice
if there are two tutors on the course. I only want to show the record once,
regardless of how many tutors (TUT IDs) there are for each course.
For my purposes, a duplicate record is one where there is another record
where everything is identical EXCEPT the TUT ID, because there can be many
tutors on the same course.
Thanks
"Julie" <anonymous@.discussions.microsoft.com> wrote in message
news:1160e01c4419c$2926d190$a101280a@.phx.gbl...
> Dear Keith
> I have had a look at the data and have come to the
> conclusion that it is not duplicated in any way.
> The reason is that if you check each row in the output
> they are all in some way unique. For instance id 14 (which
> as in twice) has a different TUT ID.
> To give you some sort of solution then we need to know
> what do you actually want as un duplicated data.
> Thanks
> J
>
> >--Original Message--
> >Hi There
> >
> >I am having a problem with some SQL and hope someone out
> there can help. I
> >am fairly new so apologise any bad coding etc.
> >
> >I have created a View, which works fine except I need to
> eliminate some
> >duplicate entries and I have tried using distinct but my
> problem seems
> >outside of it's capabilities.
> >
> >The SQL is here: http://www.step-
> online.org.uk/sql/Courses_SQL.png
> >
> >The Output is here: http://www.step-
> online.org.uk/sql/Courses_Output.png
> >
> >As you can see from this example, ID 1 and ID 14 are
> duplicated because
> >there are two tutors (TUT ID) for these courses. The
> entire rows are not
> >duplicated because of the differing TUT ID so distinct
> will not work (unless
> >I am using it wrong).
> >
> >I need to have the output as is now, but show only a
> single instance of each
> >ID regardless of how many tutors the courses may have.
> >
> >How can I achieve this?
> >
> >Thanks
> >
> >
> >.
> >|||Any suggestions would be great.
Thanks
"Keith" <@..> wrote in message
news:um6a%23eaQEHA.1620@.TK2MSFTNGP12.phx.gbl...
> Hi and Thanks for looking
> I know that the TUT ID is different.
> I want to ignore this though and exclude all the records which are
duplicate
> (even if they have different TUT IDs).
> TUT ID is the tutor assigned to the course, so a record is showing up
twice
> if there are two tutors on the course. I only want to show the record
once,
> regardless of how many tutors (TUT IDs) there are for each course.
> For my purposes, a duplicate record is one where there is another record
> where everything is identical EXCEPT the TUT ID, because there can be many
> tutors on the same course.
> Thanks
>
> "Julie" <anonymous@.discussions.microsoft.com> wrote in message
> news:1160e01c4419c$2926d190$a101280a@.phx.gbl...
> > Dear Keith
> >
> > I have had a look at the data and have come to the
> > conclusion that it is not duplicated in any way.
> >
> > The reason is that if you check each row in the output
> > they are all in some way unique. For instance id 14 (which
> > as in twice) has a different TUT ID.
> >
> > To give you some sort of solution then we need to know
> > what do you actually want as un duplicated data.
> >
> > Thanks
> > J
> >
> >
> > >--Original Message--
> > >Hi There
> > >
> > >I am having a problem with some SQL and hope someone out
> > there can help. I
> > >am fairly new so apologise any bad coding etc.
> > >
> > >I have created a View, which works fine except I need to
> > eliminate some
> > >duplicate entries and I have tried using distinct but my
> > problem seems
> > >outside of it's capabilities.
> > >
> > >The SQL is here: http://www.step-
> > online.org.uk/sql/Courses_SQL.png
> > >
> > >The Output is here: http://www.step-
> > online.org.uk/sql/Courses_Output.png
> > >
> > >As you can see from this example, ID 1 and ID 14 are
> > duplicated because
> > >there are two tutors (TUT ID) for these courses. The
> > entire rows are not
> > >duplicated because of the differing TUT ID so distinct
> > will not work (unless
> > >I am using it wrong).
> > >
> > >I need to have the output as is now, but show only a
> > single instance of each
> > >ID regardless of how many tutors the courses may have.
> > >
> > >How can I achieve this?
> > >
> > >Thanks
> > >
> > >
> > >.
> > >
>

Duplicate Records in a table

How do i remove duplicate records from a table with a single query without using cursors or anything like that.

Sample :

tempCol

P.S The table has only one column

Usually way is: in your query, create a temp table to hold the distinct records from your physical table, then, delete the rows in the physical table, then, insert back from your temp table, finally, drop the temp table.

|||

che3358gives a good way to do it. The only thing I'd add is that you can make it all go faster by doing it only for the duplicate rows, ie, instead of doing it for distinct rows, do a count(*) and a having, ie,

select field1, field2, field3
into #DupTable
from mytable
group by field1, field2, field3
having count(*) > 1

delete mytable
from mytable a, #DupTable b
where a.field1 = a.field1
and b.field2 = b.field2
and ....

insert into mytable (field1, field2, ...)
select field1, field2, ...
from #DupTable

Duplicate records are being inserted with one insert command.

This is like the bug from hell. It is kind of hard to explain, so
please bear with me.

Background Info: SQL Server 7.0, on an NT box, Active Server pages
with Javascript, using ADO objects.

I'm inserting simple records into a table. But one insert command is
placing 2 or 3 records into the table. The 'extra' records, have the
same data as the previous insert incident, (except for the timestamp).

Here is an example. Follow the values of the 'Search String' field:

I inserted one record at a time, in the following order (And only one
insert per item):
airplane
jet
dog
cat
mouse
tiger

After this, I should have had 6 records in the table. But, I ended
up with 11!

Here is what was recorded in the database:

Vid DateTime Type ProductName SearchString NumResults
cgcgGeorgeWeb3 Fri Sep 26 09:48:26 PDT 2003 i null airplane 112
cgcgGeorgeWeb3 Fri Sep 26 09:49:37 PDT 2003 i null jet 52
cgcgGeorgeWeb3 Fri Sep 26 09:50:00 PDT 2003 i null dog 49
cgcgGeorgeWeb3 Fri Sep 26 09:50:00 PDT 2003 i null jet 52
cgcgGeorgeWeb3 Fri Sep 26 09:50:00 PDT 2003 i null jet 52
cgcgGeorgeWeb3 Fri Sep 26 09:50:22 PDT 2003 i null dog 49
cgcgGeorgeWeb3 Fri Sep 26 09:50:22 PDT 2003 i null cat 75
cgcgGeorgeWeb3 Fri Sep 26 09:52:53 PDT 2003 i null mouse 64
cgcgGeorgeWeb3 Fri Sep 26 09:53:06 PDT 2003 i null tiger 14
cgcgGeorgeWeb3 Fri Sep 26 09:53:06 PDT 2003 i null mouse 64
cgcgGeorgeWeb3 Fri Sep 26 09:53:06 PDT 2003 i null mouse 64

Look at the timestamps, and notice which ones are the same.

I did one insert for 'dog' , but notice how 2 'jet' records were
inserted
at the same time. Then, when I inserted the 'cat' record, another
'dog' record was inserted. I waited awhile, and inserted mouse, and
only the mouse was inserted. But soon after, I inserted 'tiger', and 2
more mouse records were inserted.

If I wait awhile between inserts, then no extra records are inserted.
( Notice 'airplane', and the first 'mouse' entries. ) But if I insert
records right after one another, then the second record insertion also
inserts a record with data from the 1st insertion.

Here is the complete function, in Javascript (The main code of
interest
may start at the Query = "INSERT ... statement):
----------------------
//Write SearchTrack Record -----------

Search.prototype.writeSearchTrackRec = function(){
Response.Write ("<br>Calling function writeSearchTrack \n"); // for
debug
var Query;
var vid;
var type = "i"; // Type is image
var Q = "', '";
var datetime = "GETDATE()";
//Get the Vid
// First - try to get from the outVid var of Cookieinc
try{
vid = outVid;
}catch(e){
vid = Request.Cookies("CGIVid"); // Gets cookie id value
vid = ""+vid;
if (vid == 'undefined' || vid == ""){
vid = "ImageSearchNoVid";
}
}

try{
Query = "INSERT SearchTrack (Vid, Type, SearchString, DateTime,
NumResults) ";
Query += "VALUES ('"+vid+Q+type+Q+this.searchString+"',
"+datetime+","+this.numResults+ ")";
this.cmd.CommandText = Query;
this.cmd.Execute();
}catch(e){
writeGenericErrLog("Insert SearchTrack failed", "Vid: "+vid+"
- SearchString:: "+this.searchString+" - NumResults: "+this.numResults
, e.description);

}
}//end

--------------------
I also wrote a non-object oriented function, and created the command
object inside the function. But I had the same results.

I know that the function is not getting called multiple times
because I print out a message each time it is called.

This really stumps me. I'll really appreciate any help you can
offer.

Thanks,

George"george" <georgem@.crystalgraphics.com> wrote in message
news:620c2f02.0309261014.20789ca0@.posting.google.c om...
> This is like the bug from hell. It is kind of hard to explain, so
> please bear with me.

<snip
I have no idea what's causing the issue, but the usual advice is to use
Profiler to trace the SQL which is getting sent to the database. You may see
something in the statements which helps you pin down what's going on.
Another thing to check is if there is an INSERT trigger on the table.

Simon|||"Simon Hayes" <sql@.hayes.ch> wrote in message news:<3f756b8c_4@.news.bluewin.ch>...
> "george" <georgem@.crystalgraphics.com> wrote in message
> news:620c2f02.0309261014.20789ca0@.posting.google.c om...
> > This is like the bug from hell. It is kind of hard to explain, so
> > please bear with me.
> <snip>
> I have no idea what's causing the issue, but the usual advice is to use
> Profiler to trace the SQL which is getting sent to the database. You may see
> something in the statements which helps you pin down what's going on.
> Another thing to check is if there is an INSERT trigger on the table.
> Simon

Thanks for your help. I found the cause. This is a web page that
contained a form. The form had a button with an onClick event handler
that evolked a script that submitted the form. Well, I added another
form to the page, and wanted the form to be submitted when the user
pushed the 'enter' key, so I focused the new submit button, and put an
onSubmit event handler in the new form. But the onSubmit event handler
evolked the before-mentioned script, so apparently two forms were
being submitted. One form had a hidden form element that contained
the previous 'search string' and the other form had an input box for
the current 'search string'. It explains why 2 db inserts happened
when I submitted the form, but it does not explain why sometimes 3
inserts happened. Anyway, it was kind of difficult to debug, because
it was not evident that 2 forms were being submitted, because, to the
user, the page worked just fine.

Well, perhaps this incident will help someone else down the road who
may run into this situation.

George

Duplicate records - SQL Below

Everytime, I run this query I get 3 or 4 duplicate records. I can't figure out what is going on. Any help would be appreciated. Thanks

Code: ( sql )

SELECT dbo.INVOICES.ORDER_NO AS "ORDER_NO", dbo.INVOICES.SALES_REP AS "SALES_REP", dbo.INVOICES.TERMS AS "TERMS", convert (varchar,cast (dbo.INVOICES.INV_AMOUNT AS money),1) AS "INV_AMOUNT", dbo.INVOICES.STATUS AS "STATUS", dbo.TRCK_GRP.NAME AS "GROUP NAME", dbo.TRCK_CHO.RANK AS "CHOICE RANK", dbo.TRCK_CHO.NAME AS "CHOICE NAME", dbo.TRCK_CHO.IS_DEFAULT AS "IS_DEFAULT", dbo.TRCK_GRP.RANK AS "GROUP RANK", convert(varchar,dbo.TRCK_SEL.MODIFIED_DATE,101) AS "MODIFIED_DATE", dbo.TRCK_SEL.SUB_CODE AS "SUB_CODE", dbo.TRCK_SEL.SUB_TYPE AS "SUB_TYPE", dbo.INVOICES.ORDER_DATE AS "ORDER_DATE", dbo.TRCK_SEL.GROUP_CODE AS "GROUP_CODE", dbo.TRCK_CHO.CHOI_CODE AS "CHOI_CODE", dbo.CUST.NAME AS "NAME", dbo.PERSONAL.PFIRST AS "PFIRST", dbo.PERSONAL.EMAIL AS "EMAIL" FROM ((((((dbo.TRCK_GRP INNER JOIN dbo.TRCK_CHO ON dbo.TRCK_GRP.GROUP_CODE = dbo.TRCK_CHO.GROUP_CODE) INNER JOIN dbo.TRCK_SEL ON dbo.TRCK_CHO.CHOI_CODE = dbo.TRCK_SEL.CHOI_CODE) INNER JOIN dbo.INVOICES ON dbo.TRCK_SEL.SUB_CODE = dbo.INVOICES.ORDER_NO) INNER JOIN dbo.CUST ON dbo.INVOICES.CUST_CODE = dbo.CUST.CUST_CODE) INNER JOIN dbo.ADDRESS ON dbo.CUST.CUST_CODE = dbo.ADDRESS.CUST_CODE) LEFT OUTER JOIN dbo.PERSONAL ON dbo.PERSONAL.IDNO = dbo.INVOICES.SALES_REP) WHERE dbo.INVOICES.STATUS = 8 AND dbo.TRCK_GRP.NAME LIKE 'CREDIT CARD AUTHORIZATION' AND dbo.TRCK_CHO.NAME IN ( 'AWAITING SIGNED CC AUTHORIZATION FORM' , 'CREDIT CARD DECLINED / EXPIRED' ) AND dbo.INVOICES.TERMS = 'CC' AND dbo.INVOICES.PAID = 'F' AND convert(varchar,dbo.TRCK_SEL.MODIFIED_DATE,101) = '{%Current Date MM/DD/YYYY%}' ORDER BY dbo.INVOICES.ORDER_NO ASCThere might be duplicat edata in the table itself.

Kindly post your table structure.

Duplicate records - no error message?

Hi,
I am in the process of moving my back end to the SQL server. I have a test
version in the SQLserver and the front end is in Access 2000 with linked
tables.
When the user enters a duplicate key (primary key) and the user tries to
save the record in Access (using a save button), an error message is
displayed by Jet.
The same kind of error message does not appear when the back end is in SQL
server. The duplicate record does not get stored as well.
Why is this happening?> The same kind of error message does not appear when the back end is in SQL
> server. The duplicate record does not get stored as well.
> Why is this happening?
Well, what does the statement look like? Did you use profiler to see what
is going into and coming out of SQL Server?|||Hi,
Thanks for responsing.
No...I am a newbie to SQL server and don't know how to do it.
Access (jet) gives an error message when I try to save a duplicate record. -
"The changes you requested to the table were not successful because they
would create duplicate values in the index, primary key or relationship".
This message appears when the f/e and the b/e are Access.
When the b/e is SQL for the same table and Access is the front end..no error
message appears.
Do you suggest I learn how to use a profiler and figure it out?
"Aaron Bertrand [SQL Server MVP]" wrote:

> Well, what does the statement look like? Did you use profiler to see what
> is going into and coming out of SQL Server?
>
>|||Hi
Since you have not provided your INSERT statement ,see Itzik Ben-Gan's
exanple to deal with dulicates
CREATE TABLE #Demo (
idNo int identity(1,1),
colA int,
colB int
)
INSERT INTO #Demo(colA,colB) VALUES (1,6)
INSERT INTO #Demo(colA,colB) VALUES (1,6)
INSERT INTO #Demo(colA,colB) VALUES (2,4)
INSERT INTO #Demo(colA,colB) VALUES (3,3)
INSERT INTO #Demo(colA,colB) VALUES (4,2)
INSERT INTO #Demo(colA,colB) VALUES (3,3)
INSERT INTO #Demo(colA,colB) VALUES (5,1)
INSERT INTO #Demo(colA,colB) VALUES (8,1)
PRINT 'Table'
SELECT * FROM #Demo
PRINT 'Duplicates in Table'
SELECT * FROM #Demo
WHERE idNo IN
(SELECT B.idNo
FROM #Demo A JOIN #Demo B
ON A.idNo <> B.idNo
AND A.colA = B.colA
AND A.colB = B.colB)
PRINT 'Duplicates to Delete'
SELECT * FROM #Demo
WHERE idNo IN
(SELECT B.idNo
FROM #Demo A JOIN #Demo B
ON A.idNo < B.idNo -- < this time, not <>
AND A.colA = B.colA
AND A.colB = B.colB)
DELETE FROM #Demo
WHERE idNo IN
(SELECT B.idNo
FROM #Demo A JOIN #Demo B
ON A.idNo < B.idNo -- < this time, not <>
AND A.colA = B.colA
AND A.colB = B.colB)
PRINT 'Cleaned-up Table'
SELECT * FROM #Demo
DROP TABLE #Demo
"Priya Henry" <PriyaHenry@.discussions.microsoft.com> wrote in message
news:3A756C75-F872-4168-A497-905031937F0B@.microsoft.com...
> Hi,
> Thanks for responsing.
> No...I am a newbie to SQL server and don't know how to do it.
> Access (jet) gives an error message when I try to save a duplicate
> record. -
> "The changes you requested to the table were not successful because they
> would create duplicate values in the index, primary key or relationship".
> This message appears when the f/e and the b/e are Access.
> When the b/e is SQL for the same table and Access is the front end..no
> error
> message appears.
> Do you suggest I learn how to use a profiler and figure it out?
> "Aaron Bertrand [SQL Server MVP]" wrote:
>|||Does your table in SQL Server have a unique index on the relevant column?
"Priya Henry" wrote:
> Hi,
> Thanks for responsing.
> No...I am a newbie to SQL server and don't know how to do it.
> Access (jet) gives an error message when I try to save a duplicate record.
-
> "The changes you requested to the table were not successful because they
> would create duplicate values in the index, primary key or relationship".
> This message appears when the f/e and the b/e are Access.
> When the b/e is SQL for the same table and Access is the front end..no err
or
> message appears.
> Do you suggest I learn how to use a profiler and figure it out?
> "Aaron Bertrand [SQL Server MVP]" wrote:
>|||Yes...My table does have an unique index. Is it possible that Jet is
stopping the message?
"NH" wrote:
> Does your table in SQL Server have a unique index on the relevant column?
> "Priya Henry" wrote:
>|||I havnt seen this before. Our system has a SQL Server backend and Access
front end and the error message do appear when trying to add in a duplicate
into a unique undexed column.
Have you checked the linked tables in Access, are they pointing to the
correct SQL Server database?
"Priya Henry" wrote:
> Yes...My table does have an unique index. Is it possible that Jet is
> stopping the message?
> "NH" wrote:
>sql

Duplicate records - difference method?

Hi,
My scenario is that i have 2 system with name and adress (100.000 names),
that have to be merged into 1 system without any duplicates.
The problem is that the spelling is not 100% between the system.
One way to find duplicate is to group name,adress and count > 1.
My dream is to use the sound index "Difference" so can i get around the
spelling problem.
DIFFERENCE
Returns the difference between the SOUNDEX values of two character
expressions as an integer.
Syntax
DIFFERENCE ( character_expression , character_expression )
Is that possible to use DIFFERENCE to find duplicates?
And how should the t-sql look like?
Example
name adress city
charles way1 state1
charle waj1 stat1
charlez vay1 stat1
I want to find this example, that this 3 is duplicates.
Should i use ordinary way with group and count >1, this would not be
duplicates.
Help
Thanx
TwSo you want these to be considered duplicates:
FirstName LastName
Jon Smyth
John Smith
Jonathan Smythe
John Smyth
'|||Hi,
Yes, i want these to be considered duplicates.
But i also want to validate adress and city to see if these is duplicates.
FirstName LastName Adress City
Jon Smyth Street 1 Palace1
John Smith Stret1 Palac1
Jonathan Smythe Stret 1 Palac 1
John Smyth Street1 Palace 1
Who can i do that in a t-sql?
"Aaron Bertrand [SQL Server MVP]" <ten.xoc@.dnartreb.noraa> skrev i
meddelandet news:uey1ZpUkFHA.2852@.TK2MSFTNGP15.phx.gbl...
> So you want these to be considered duplicates:
> FirstName LastName
> Jon Smyth
> John Smith
> Jonathan Smythe
> John Smyth
> '
>|||>> have 2 system with name and adress (100.000 names), that have to be mer
ged into 1 system without any duplicates. The problem is that the spelling i
s not 100% between the system. <<
Look up Melissa Data and get their software. Life is too short to
re-invent the wheel.|||But it cost a lot of money.
It should not be so difficult to do it in sql server.
I looking for a little help and after that i fix it, i hope :)
For the soundindex and difference function is in SQL Server.
Can i only run a t-sql and get out the difference integer, after that i fix
a algorithm.
But how should i run the t-sql to validate these values?
// tw
"--CELKO--" <jcelko212@.earthlink.net> skrev i meddelandet
news:1122318950.891968.136790@.g14g2000cwa.googlegroups.com...
> Look up Melissa Data and get their software. Life is too short to
> re-invent the wheel.
>|||>> But it cost a lot of money.<<
How much does doing it wrong cost you? How much is your time worth?|||> But how should i run the t-sql to validate these values?
T-SQL is supplying you the difference and soundex values, no?
So, you must do one of three things:
(a) determine beforehand what soundex/difference level means duplicate;
(b) inspect the results manually and make decisions; or,
(c) get software that does it right, and you might get some sleep at night.

Duplicate Records

Hi,

Not so sure how simple this question is but here is what happened. I installed SQL Server 2005 on a new Win Server 2003. I exported the tables and their data from the old machine to the newly established database on the new machine.

It looks like all my records were duplicated. When I try to delete one of the duplicates it won't work because both rows are effected. I can't set my primary key now and if I try to create a new database with the primary key already set than the import fails.

Any one run into this before or know what's going on?

Any help ASAP would really be appreciated.

Thanks,

Alice

I think I might have found the answer to my question here:

http://www.sqlteam.com/item.asp?ItemID=3331

I'll try it.

Alice

|||

I tried the SELECT DISTINCT approach and copied the records into a new table. It seemed to work, but everytime I try to create a new primary key it seems like the table creates a new duplicate record that prevents it. I'm wandering if this is something I'll need to call tech support on. Maybe something is misconfigured.

Ally

|||

Hi,

I would suggest to refer below links also which has workaround you are reffering

http://www.sql-server-performance.com/rd_delete_duplicates.asp

http://support.microsoft.com/kb/139444

Hemantgiri S. Goswami

Duplicate records

how to we check in for duplicate records without using sort (remove duplicateS)

i need to remove duplicates based on four columns.

please let me know

Are the duplicates coming from a SQL query?

The sort transformation is really the best option, in my opinion, unless you can issue a "SELECT DISTINCT" in your source pull.|||

source is a text file.Is it possible to use sort as i am deciding distinct based on four columns.

For each row coming i will be looking at four columns to decide if the records are distinct.

|||

sureshv wrote:

source is a text file.Is it possible to use sort as i am deciding distinct based on four columns.

For each row coming i will be looking at four columns to decide if the records are distinct.

Yes, pick your four columns and sort by them. Then click the remove duplicates. It will remove duplicates based on the chosen sort columns.

However, do you need to pick one row over another, or can you discard any row without knowing which row you are keeping?|||You can do this with a synchronous non-blocking script component pretty easily, especially if the data is already sorted by the rows you want to check.

The following code concatenates all your columns that you want to test for uniqueness into one string "key". If the key of the current row is different from the key of the previous row, then you redirect it to the output. Otherwise it gets dropped. This is so fast and simple that you should try to sort your data as it comes from the source in the order of these columns.

Code Snippet

Dim PreviousKey As String = ""

Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
Dim ThisKey As String = Row.FirstName + "_" + Row.LastName + "_" + Row.LastName + "_" + Row.EmailAddress
If ThisKey <> PreviousKey Then
Row.DirectRowToOutput0()
PreviousKey = ThisKey
End If
End Sub

If the data cannot be sorted, then instead of comparing ThisKey to PreviousKey, you would check for the existence of the current key in a hash table. If it doesn't exist, you redirect the row and add it. This will get slow if you have lots of unique values and you have to have enough memory to hold all that data.

|||Yet another option is to stage your data file into a sql server table and then select distinct from there.|||

JayH wrote:
You can do this with a synchronous non-blocking script component pretty easily, especially if the data is already sorted by the rows you want to check.

Nevermind. I was too slow with my post. If the source is a text file, then you should just use the sort component.
|||

If i use sort it does remove duplicates.can i stored the removed duplicates in a table .Is it possible?

|||

sureshv wrote:

If i use sort it does remove duplicates.can i stored the removed duplicates in a table .Is it possible?

The sort REMOVES duplicates. You can't redirect them. Instead, if you want to do what you've just stated, use the sort transformation but leave the remove duplicates box UNCHECKED. Then use JayH's code below to allow you to check for duplicates and then to redirect rows that are in "duplicate."|||

sureshv wrote:

If i use sort it does remove duplicates.can i stored the removed duplicates in a table .Is it possible?

Nop. That is the bad thing about the Sort transform technique. You don't have control over which rows get discarded. I always stage data in tables; then I can use sql to handle that. In SQl Server 2005 and oracle for example, I use the RANK() function to detect and enumerate the duplicates. Then in data flows I use a conditional split to let pass all myRank=1 and redirect the rest to an error table is that is required.

If you don't use staging tables; you still can acomplish it using a script component.

|||

I have one problem. I use sort component and it tries to get all data from the source. i connected my sort output to script as u said...But i dont find rows comming out of the sort output and stays yellow at sort.it dosent go to next step.What does it mean i unchecked remove duplicates..and i select 4 columns which i want and sort order for them is 1,2,3,4...is it wrong with text file or have i not set it right.

I really appreciate all help from u guys..Thank u very much...

|||

sureshv wrote:

I have one problem. I use sort component and it tries to get all data from the source. i connected my sort output to script as u said...But i dont find rows comming out of the sort output and stays yellow at sort.it dosent go to next step.What does it mean i unchecked remove duplicates..and i select 4 columns which i want and sort order for them is 1,2,3,4...is it wrong with text file or have i not set it right.

I really appreciate all help from u guys..Thank u very much...

The sort is a resource intensive transformation. It requires to 'get' all rows prior to sort/dedup them, so you won't see any row in the output untill all rows have been consumed by the sort . If the data set is big and/or memory available for SSIS is small, the package can perform poorly.

|||

the incoming data seem to have 2,190,234 rows.I know that sort will wait for all incoming rows and then sort it.So what do i need to do for its performance to increase...Is their any efficient method to run faster..

Please le me know

Thanks again

|||

sureshv wrote:

the incoming data seem to have 2,190,234 rows.I know that sort will wait for all incoming rows and then sort it.So what do i need to do for its performance to increase...Is their any efficient method to run faster..

Please le me know

Thanks again

Stage the data first in SQL server. It is another option that I gave you earlier today. Then SELECT from the source using an ORDER BY clause choosing your four columns. Then use Jay's approach to determine if the incoming record is in duplicate or not.|||

Phil Brammer wrote:

sureshv wrote:

the incoming data seem to have 2,190,234 rows.I know that sort will wait for all incoming rows and then sort it.So what do i need to do for its performance to increase...Is their any efficient method to run faster..

Please le me know

Thanks again

Stage the data first in SQL server. It is another option that I gave you earlier today. Then SELECT from the source using an ORDER BY clause choosing your four columns. Then use Jay's approach to determine if the incoming record is in duplicate or not.

If you are going to stage the data in SQL Server 2005 table; I would use the t-sql rank() function and a conditional split in the dataflow to deduplicate and redirect the non-desire rows; as I explained in my other post bellow.

Thursday, March 29, 2012

Duplicated Records

Duplicate UID in PK column

Tuesday, March 27, 2012

duplicate rows but no key on the tables

Duplicate Rows !

duplicate rows

Duplicate row

Duplicate records on database

duplicate records linking tables

Duplicate Records in a View

Duplicate Records in a View

Duplicate Records in a View

Duplicate Records in a View

Duplicate Records in a View

Duplicate Records in a table

Duplicate records are being inserted with one insert command.

Duplicate records - SQL Below

Duplicate records - no error message?

Duplicate records - difference method?

Duplicate Records

Duplicate records

DTS to .net

Blog Archive

About Me