Thursday, March 29, 2012

Duplicates

Hi!
Grateful for some help as a newbie...
I have a
OLE db SQL command: SELECT DISTINCT convert (char(8),date,112) as day,...etc

Resulting in error "Violation of PRIMARY KEY constraint 'PK_Dim_Date"... so Column Day in Dim_date contains duplicates.

I suppose i need a delete in Lookup or how would I eliminate duplicates in Dim?
DELETE day from dim_date where day in(Select day from date ...

This sounds like a transact-sql question, not an SSIS question.

But, what is the primary key of the table Dim_date?|||

Your post is not clear enough...

How you are getting a violation PK error when running a Select?

Please provide more details

|||Ok, let me clarify..
I have a sql command in OLE db Source:
SELECT DISTINCT convert (char(8),date,112) as day, etc etc (works fine)

But Execution data flow Date returns: [OLE DB Destination [2466]] Error: An OLE DB error has occurred. Error code: 0x80040E2F. Description: "Violation of PRIMARY KEY constraint 'PK_Dim_Date'. Cannot insert duplicate key in object 'dbo.Dim_Date'.".

Want to solve this by Lookup SQL command, deleting the duplicates in destination. Just dont know how...
Delete day from Dim_Date where ...|||

curiousss wrote:

Ok, let me clarify..
I have a sql command in OLE db Source:
SELECT DISTINCT convert (char(8),date,112) as day, etc etc (works fine)

But Execution data flow Date returns: [OLE DB Destination [2466]] Error: An OLE DB error has occurred. Error code: 0x80040E2F. Description: "Violation of PRIMARY KEY constraint 'PK_Dim_Date'. Cannot insert duplicate key in object 'dbo.Dim_Date'.".

Want to solve this by Lookup SQL command, deleting the duplicates in destination. Just dont know how...
Delete day from Dim_Date where ...

Fine. But we still don't know anything about the destination table. What is the primary key?|||

Ah, ok, the destination table primary key is Day (int).

|||

How is this Day columnn populated? Is it a surrogate key or is it a date format like 20070214?

You say you want to delete the duplicates in the destination. And then insert them again? Why not skip the records that are already there?

Pipo1

|||See the first thread on the main page of this forum for checking if a record exists, and if it does update it else insert.

Take that logic, and omit the update statement if you don't want to do anything when the record exists.|||Thank you so much!
How about if I delete and empty the whole table. This command does not seem to work though.

DELETE day

FROM dbo.Dim_Date ?|||

curiousss wrote:

Thank you so much!
How about if I delete and empty the whole table. This command does not seem to work though.

DELETE day

FROM dbo.Dim_Date ?

truncate table dbo.dim_date will delete the whole table
delete from dbo.dim_date will do the same thing, sort of

This whole scenario you have set up here isn't very clear. If day is the key to DIM_DATE, then there are no duplicates...|||Thank you...
Problems disappeared when I deleted the tables, and executed them again.

So now all are ok, my Fact_table is green as well! But execution of Fact returns error as below Dim_Salesperson, Dim_demografic. Should I care or let it be?

[Lookup Salesperson [7405]] Warning: The Lookup transformation encountered duplicate reference key values when caching reference data. The Lookup transformation found duplicate key values when caching metadata in PreExecute. This error occurs in Full Cache mode only. Either remove the duplicate key values, or change the cache mode to PARTIAL or NO_CACHE.|||

curiousss wrote:

Thank you...
Problems disappeared when I deleted the tables, and executed them again.

So now all are ok, my Fact_table is green as well! But execution of Fact returns error as below Dim_Salesperson, Dim_demografic. Should I care or let it be?

[Lookup Salesperson [7405]] Warning: The Lookup transformation encountered duplicate reference key values when caching reference data. The Lookup transformation found duplicate key values when caching metadata in PreExecute. This error occurs in Full Cache mode only. Either remove the duplicate key values, or change the cache mode to PARTIAL or NO_CACHE.

You might want to try a "select distinct column from table" in your lookup.|||Thanks, not eliminated with DISTINCT. Should I leave them or remove them?|||

curiousss wrote:

Thanks, not eliminated with DISTINCT. Should I leave them or remove them?

Only you can make that decision now. You know the data best, and we cannot help you further.

You might want to head over to the Transact-SQL forum here to get help in writing queries to suit your needs. Basically, you need to work on getting a distinct list returned in your lookup query. If you cannot do that due to data issues, then well, you'll need to determine how to work with that data. We cannot tell you which route to take.|||How would I actually remove /delete the duplicates manually(without T-sql). I see the duplicates in destination Preview. Like this?
"DELETE id, salary from dbo.Dim_D where id = 199sql

duplicates

How do I delete duplicate entries?
For example, lets says I have the following table format:
SystemName, Memory, CPU
I would like to delete any duplicates that are in SystemName.See if this helps:
http://support.microsoft.com/defaul...kb;en-us;139444
--
HTH,
Vyas, MVP (SQL Server)
SQL Server Articles and Code Samples @. http://vyaskn.tripod.com/
"Big D" <BigDaddy@.newsgroup.nospam> wrote in message
news:O8KImBgYFHA.2948@.TK2MSFTNGP10.phx.gbl...
How do I delete duplicate entries?
For example, lets says I have the following table format:
SystemName, Memory, CPU
I would like to delete any duplicates that are in SystemName.|||Please refer to www.aspfaq.com/5006 and provide sufficient information for
others to better understand you problem.
Deletes are applicable to a set of rows, not individual columns. If the
entire row is duplicated, then use the link posted by Vyas & make sure you
explicitly declare a primary key to avoid further duplication. If only the
sytemname is duplicated, they you will have to decide which values of memory
and/or cpu should be preserved and which ones should be deleted. Generally,
one can apply an extrema aggregate like MIN or MAX like:
SELECT *
FROM tbl t1
WHERE t1.Memory = ( SELECT MAX( t2.memory ) FROM tbl t2
WHERE t2.system = t1.system )
You can convert this into a delete like:
DELETE FROM tbl
WHERE Memory = ( SELECT MAX( t1.memory ) FROM tbl t1
WHERE tbl.system = t1.system )
If the duplicated values are determined by memory and/or cpu, you can use
WHERE Memory + '!' + cpu = ( SELECT MAX( t1.Memory + '!' + t1.cpu ) ...
Another way of writing it would be
DELETE FROM tbl
WHERE EXISTS ( SELECT * FROM tbl t1
WHERE t1.Memory > tbl.Memory
OR t1.cpu > tbl.cpu
OR ... )
Anith|||Of course, once you have duplicates purged, you should implement a unique
key constraint to prevent this from occurring again. But to answer your
question, let's say you have a table called Accounts and the column
combination tran_type should be unique. The following query will list all
records that violate the non-unique rule. We are basically joining back to a
group by sub-query having count(*) > 1. You can modify this as needed to fit
your situation. Once done, you can better profile the records to determine
which should be deleted. For example, do you want to keep the first entered
or last entered, or perhaps duplicates entered during a specific time frame
should be deleted.
select
Accounts.*
from
Accounts
join
(
select
account,
tran_type,
count(*) as cnt
from
Accounts
group by
account,
tran_type
having
count(*) > 1
) as x
on x.account= Accounts.account and
x.tran_type= Accounts.tran_type
order by
NSA.account,
NSA.tran_type
"Big D" <BigDaddy@.newsgroup.nospam> wrote in message
news:O8KImBgYFHA.2948@.TK2MSFTNGP10.phx.gbl...
> How do I delete duplicate entries?
> For example, lets says I have the following table format:
> SystemName, Memory, CPU
> I would like to delete any duplicates that are in SystemName.
>
>

duplicates

hi, here i am giving details about problem,please help me

table a
---
Std_id Std_name Grade Subject Score year
----------------
1 name1 grade3 mathematics 200 200102
2 name2 grade5 science 150 200001
1 name1 grade3 science 300 200102
3 name3 grade4 english 500 200203
4 name4 grade6 social studies 350 200304
5 name5 grade8 history 440 200405

table b
---
subject year
-------
mathematics 200304
science 200304
english 200304
social studies 200304
history 200304
mathematics 200405
science 200405
english 200405
social studies 200405
history 200405

query 1 is
select a.std_id,a.std_name,sum(a.score),b.-- from a,b
where a.subject=b.subject
group by Std_name,....

out put is
std_id std_name sum(score) .....
----------
1 name1 1000
2 name2 300
3 name3 1000
4 name4 700
5 name5 880

but actual output should be like
std_id std_name sum(score)
----------
1 name1 500
2 name2 150
3 name3 500
4 name4 350
5 name5 440

to get this output i have modified the table b and query then i got the right results

table b
---
mathematics 200001
science 200001
english 200001
social studies 200001
history 200001
mathematics 200102
science 200102
english 200102
social studies 200102
history 200102
mathematics 200203
science 200203
english 200203
social studies 200203
history 200203
mathematics 200304
science 200304
english 200304
social studies 200304
history 200304
mathematics 200405
science 200405
english 200405
social studies 200405
history 200405

query is

select a.std_id,a.std_name,sum(a.score),b.-- from a,b
where a.subject=b.subject and
a.year=b.year
group by std_name,....

now actual and expecting results are same
std_id std_name sum(score)
----------
1 name1 500
2 name2 150
3 name3 500
4 name4 350
5 name5 440

so my question comes here
is there any other way which i can get the same out put without adding rows(data) to table b,
please help me out of this problem.

thank you
pracalus27 views no suggestions.please somebody give me suggestion to resolve this problem

thank you
pracalus|||Try this

select a.std_id,a.std_name,sum(a.score),b.-- from a,b
where a.subject=b.subject and a.year=b.year
group by Std_name,....

Otherwise Post your query1 fully|||Hi Madhi,

thank you, you said use year=year then how it can pick the 20001,200102 and 200203 years data.it only fetch 200304 and 200405 years.
table a table b
200001
200102
200203
200304 200304
200405 200405

thank you
pracalus

Duplicates

Hi!
Grateful for some help as a newbie...
I have a
OLE db SQL command: SELECT DISTINCT convert (char(8),date,112) as day,...etc

Resulting in error "Violation of PRIMARY KEY constraint 'PK_Dim_Date"... so Column Day in Dim_date contains duplicates.

I suppose i need a delete in Lookup or how would I eliminate duplicates in Dim?
DELETE day from dim_date where day in(Select day from date ...

This sounds like a transact-sql question, not an SSIS question.

But, what is the primary key of the table Dim_date?|||

Your post is not clear enough...

How you are getting a violation PK error when running a Select?

Please provide more details

|||Ok, let me clarify..
I have a sql command in OLE db Source:
SELECT DISTINCT convert (char(8),date,112) as day, etc etc (works fine)

But Execution data flow Date returns: [OLE DB Destination [2466]] Error: An OLE DB error has occurred. Error code: 0x80040E2F. Description: "Violation of PRIMARY KEY constraint 'PK_Dim_Date'. Cannot insert duplicate key in object 'dbo.Dim_Date'.".

Want to solve this by Lookup SQL command, deleting the duplicates in destination. Just dont know how...
Delete day from Dim_Date where ...|||

curiousss wrote:

Ok, let me clarify..
I have a sql command in OLE db Source:
SELECT DISTINCT convert (char(8),date,112) as day, etc etc (works fine)

But Execution data flow Date returns: [OLE DB Destination [2466]] Error: An OLE DB error has occurred. Error code: 0x80040E2F. Description: "Violation of PRIMARY KEY constraint 'PK_Dim_Date'. Cannot insert duplicate key in object 'dbo.Dim_Date'.".

Want to solve this by Lookup SQL command, deleting the duplicates in destination. Just dont know how...
Delete day from Dim_Date where ...

Fine. But we still don't know anything about the destination table. What is the primary key?|||

Ah, ok, the destination table primary key is Day (int).

|||

How is this Day columnn populated? Is it a surrogate key or is it a date format like 20070214?

You say you want to delete the duplicates in the destination. And then insert them again? Why not skip the records that are already there?

Pipo1

|||See the first thread on the main page of this forum for checking if a record exists, and if it does update it else insert.

Take that logic, and omit the update statement if you don't want to do anything when the record exists.|||Thank you so much!
How about if I delete and empty the whole table. This command does not seem to work though.

DELETE day FROM dbo.Dim_Date ?|||

curiousss wrote:

Thank you so much!
How about if I delete and empty the whole table. This command does not seem to work though.

DELETE day FROM dbo.Dim_Date ?

truncate table dbo.dim_date will delete the whole table
delete from dbo.dim_date will do the same thing, sort of

This whole scenario you have set up here isn't very clear. If day is the key to DIM_DATE, then there are no duplicates...|||Thank you...
Problems disappeared when I deleted the tables, and executed them again.

So now all are ok, my Fact_table is green as well! But execution of Fact returns error as below Dim_Salesperson, Dim_demografic. Should I care or let it be?

[Lookup Salesperson [7405]] Warning: The Lookup transformation encountered duplicate reference key values when caching reference data. The Lookup transformation found duplicate key values when caching metadata in PreExecute. This error occurs in Full Cache mode only. Either remove the duplicate key values, or change the cache mode to PARTIAL or NO_CACHE.|||

curiousss wrote:

Thank you...
Problems disappeared when I deleted the tables, and executed them again.

So now all are ok, my Fact_table is green as well! But execution of Fact returns error as below Dim_Salesperson, Dim_demografic. Should I care or let it be?

[Lookup Salesperson [7405]] Warning: The Lookup transformation encountered duplicate reference key values when caching reference data. The Lookup transformation found duplicate key values when caching metadata in PreExecute. This error occurs in Full Cache mode only. Either remove the duplicate key values, or change the cache mode to PARTIAL or NO_CACHE.

You might want to try a "select distinct column from table" in your lookup.|||Thanks, not eliminated with DISTINCT. Should I leave them or remove them?|||

curiousss wrote:

Thanks, not eliminated with DISTINCT. Should I leave them or remove them?

Only you can make that decision now. You know the data best, and we cannot help you further.

You might want to head over to the Transact-SQL forum here to get help in writing queries to suit your needs. Basically, you need to work on getting a distinct list returned in your lookup query. If you cannot do that due to data issues, then well, you'll need to determine how to work with that data. We cannot tell you which route to take.|||How would I actually remove /delete the duplicates manually(without T-sql). I see the duplicates in destination Preview. Like this?
"DELETE id, salary from dbo.Dim_D where id = 199

Duplicates

Hi all,
This works fine to show duplicates of NumbB Column. I want to show the
duplicates and = different lastnames with those that
are duplicates.
Select id,NumbA,NumbB,Lastname,firstname,typepr
oc,dateofex from PC
where NumbB in(select NumbB from PC
group by NumbB
having count(NumbB)>1)order by NumbB
thanks
GVOne general approach is:
SELECT * -- use column names
FROM tbl t1
WHERE ( SELECT COUNT( * )
FROM tbl t2
WHERE t2.col <= t1.col ) >= 2 ;
As a side note, for such questions, always post table DDLs & sample data
along with expected results so that others can understand your requirements
better.
Anith|||> This works fine to show duplicates of NumbB Column. I want to show the
> duplicates and = different lastnames with those that
> are duplicates.
Can you rephrase what you want?
Please provide DDL and sample data.
http://www.aspfaq.com/etiquette.asp?id=5006
AMB
"gv" wrote:

> Hi all,
> This works fine to show duplicates of NumbB Column. I want to show the
> duplicates and = different lastnames with those that
> are duplicates.
> Select id,NumbA,NumbB,Lastname,firstname,typepr
oc,dateofex from PC
> where NumbB in(select NumbB from PC
> group by NumbB
> having count(NumbB)>1)order by NumbB
> thanks
> GV
>
>|||Hi again and thanks
Ok, thanks for your help
I want to pull duplicates of Col2, order them by Col2 and were
Col3(lastname) is the same within the duplicates.
SELECT Col1,Col2,Col3,Col4,Col5
FROM Table1
WHERE Col2 IN(SELECT Col2 FROM Table1
GROUP BY Col2
HAVING COUNT(Col2)>1)
ORDER BY Col2
Sample rows
Col1 Col2 Col3 Col4 Col5
532 000 doe J Intraductal EUS
8675 000 loe S Gastric
266 240 Smith D Pancreatic
266 240 Smith D Pancreatic
193 029 VuV N Esophagus
193 029 VuV N Esophagus
836 632 QQQ F Gastric
So would like the return to be:
Col1 Col2 Col3 Col4 Col5
266 240 Smith D Pancreatic
266 240 Smith D Pancreatic
193 029 VuV N Esophagus
193 029 VuV N Esophagus
thanks
GV
"gv" <viatorg@.musc.edu> wrote in message
news:eCNoS2kAFHA.3236@.TK2MSFTNGP15.phx.gbl...
> Hi all,
> This works fine to show duplicates of NumbB Column. I want to show the
> duplicates and = different lastnames with those that
> are duplicates.
> Select id,NumbA,NumbB,Lastname,firstname,typepr
oc,dateofex from PC
> where NumbB in(select NumbB from PC
> group by NumbB
> having count(NumbB)>1)order by NumbB
> thanks
> GV
>|||Try (not tested),
SELECT a.Col1,a.Col2,a.Col3,a.Col4,a.Col5
FROM
Table1 as a
inner join
(
SELECT Col2, col5
FROM Table1
GROUP BY Col2, col5
HAVING COUNT(*) > 1
) as b
on a.col2 = b.col2 and a.col5 = b.col5
order by
a.col2, a.col5
AMB
"gv" wrote:

> Hi again and thanks
>
> Ok, thanks for your help
> I want to pull duplicates of Col2, order them by Col2 and were
> Col3(lastname) is the same within the duplicates.
>
> SELECT Col1,Col2,Col3,Col4,Col5
> FROM Table1
> WHERE Col2 IN(SELECT Col2 FROM Table1
> GROUP BY Col2
> HAVING COUNT(Col2)>1)
> ORDER BY Col2
> Sample rows
> Col1 Col2 Col3 Col4 Col5
> 532 000 doe J Intraductal EUS
> 8675 000 loe S Gastric
> 266 240 Smith D Pancreatic
> 266 240 Smith D Pancreatic
> 193 029 VuV N Esophagus
> 193 029 VuV N Esophagus
> 836 632 QQQ F Gastric
> So would like the return to be:
> Col1 Col2 Col3 Col4 Col5
> 266 240 Smith D Pancreatic
> 266 240 Smith D Pancreatic
> 193 029 VuV N Esophagus
> 193 029 VuV N Esophagus
>
> thanks
> GV
>
>
>
>
>
> "gv" <viatorg@.musc.edu> wrote in message
> news:eCNoS2kAFHA.3236@.TK2MSFTNGP15.phx.gbl...
>
>

Duplicated Subreports

Hi all,

I can't seem to find the solution to this. I have created a report with 5 subreports (which are now linked). The problem is when I run the reports, the 5 reports when run (with a combined page total of 17 pages), now duplicate these records and produce 700 odd pages.

Can anyone help?

Thanks

MarcI would double check your links. Do you have a left outer join going on somewhere ?

duplicated rows

Hello,

I have a table T1 with fields ID, F1, F2, F3, F4, F5, F6….

I need to find if there is duplicated rows based on F1, F2, F3 columns. If there is set F5='minimum' where ID is MIN(ID). So the smallest should be set as minimum. How can I do this in a stored procedure?

Hi JIM.H:

FYI, here is a quick sampleBig Smile:

create table tbl_testDup (ID int primary key, F1 varchar(50), F2 varchar(50), F3 varchar(50),
F4 varchar(50), F5 varchar(50), F6 varchar(50))


create procedure sp_MarkMin
as
update tbl_testDup set F5='minimum'
where ID in
(select min(ID) from tbl_testDup group by F1,F2,F3)
go