Thursday, March 22, 2012

dup check while import from excel

There have been some pretty advanced stuff here I've gone through, but I am
a
.NET programmer by trade, so T-SQL is a bit different for me.
I have an outside list of items with ID codes that I need to import to an
existing table, but I need to have the procedure check for dups in the excel
list BEFORE inserting them, if the ID already exists in the database, then
skip and move onto the next one.
(See the OOP concept here?)
I was thinking about something along the line of an If Exist followed by a
skip record or insert, but I think I am making this too complicated.
can someone give me a down-to-earth example of how to accomplish this?
Environment:
From: Excel File named InItem
ID field: ItemID
to:
SQL Database named dbo.tblInItem field ItemID
Thank you for any assistance...
-Tobias Mazzei
Wraith SystemsI prefer adding a linked server to the excel file (look up sp_addlinkedserve
r
in Books Online) to bring the two data sources to common ground.
Building an INSERT query is pretty simple from there:
insert <SQL_table>
(
<destination_column_list>
)
select <source_column_list>
from <Excel_table>
where (not exists (
..the excluding query is
dependent on the actual data
..basically you need to list
items that already exist in the destination table
))
For better help, please post DDL and sample data.
ML|||This is ussually how I check if a record already exsists, you can change the
creteria
IF EXISTS (SELECT 'True' FROM tb_Alerts WHERE ID = @.AlertID)
BEGIN
UPDATE tb_Alerts
SET AlertName = @.AlertName
WHERE ID = @.AlertID;
SELECT @.AlertID;
END
ELSE
BEGIN
INSERT INTO tb_Alerts (AlertName)
VALUES (@.AlertName)
SELECT @.@.IDENTITY;
END
-Mark
"Wraith Systems" <WraithSystems@.discussions.microsoft.com> wrote in message
news:AD088624-7F0F-4F88-BD8C-5C0AE09F754D@.microsoft.com...
> There have been some pretty advanced stuff here I've gone through, but I
> am a
> .NET programmer by trade, so T-SQL is a bit different for me.
> I have an outside list of items with ID codes that I need to import to an
> existing table, but I need to have the procedure check for dups in the
> excel
> list BEFORE inserting them, if the ID already exists in the database, then
> skip and move onto the next one.
> (See the OOP concept here?)
> I was thinking about something along the line of an If Exist followed by a
> skip record or insert, but I think I am making this too complicated.
> can someone give me a down-to-earth example of how to accomplish this?
> Environment:
> From: Excel File named InItem
> ID field: ItemID
> to:
> SQL Database named dbo.tblInItem field ItemID
> Thank you for any assistance...
> --
> -Tobias Mazzei
> Wraith Systems
>|||I am getting closer. Thanks for the response, I tried looking up "not exists
"
in the help files, and with the lack of definition, it brought me here.
where (not exist (exceltableID=SQLtableID))
This was my initial concept, but it didn't look like SQL understood what I
was trying to tell it, the words not and exists were greyed.
So the concept was to select the items where the SQLtable did not have (or
equal to) what was the current record the datareader was at, right?
is this accurate useage? and can this be done in a stored procedure?
-Tobias Mazzei
Wraith Systems
"ML" wrote:

> I prefer adding a linked server to the excel file (look up sp_addlinkedser
ver
> in Books Online) to bring the two data sources to common ground.
> Building an INSERT query is pretty simple from there:
> insert <SQL_table>
> (
> <destination_column_list>
> )
> select <source_column_list>
> from <Excel_table>
> where (not exists (
> ...the excluding query is
> dependent on the actual data
> ...basically you need to list
> items that already exist in the destination table
> ))
>
> For better help, please post DDL and sample data.
>
> ML|||On Mon, 7 Nov 2005 15:09:02 -0800, Wraith Systems wrote:

>I am getting closer. Thanks for the response, I tried looking up "not exist
s"
>in the help files, and with the lack of definition, it brought me here.
>where (not exist (exceltableID=SQLtableID))
>This was my initial concept, but it didn't look like SQL understood what I
>was trying to tell it, the words not and exists were greyed.
>So the concept was to select the items where the SQLtable did not have (or
>equal to) what was the current record the datareader was at, right?
>is this accurate useage? and can this be done in a stored procedure?
Hi Tobias,
No, this is not a correct query. The argument to an EXISTS clause should
be a subquery. And in this case, the subquery should be correlated to
the main query.
Try adapting the code below to your tables:
INSERT INTO DestTable (KeyColumn, DataColumn1, DataColumn2)
SELECT s.KeyColumn, s.DataColumn1, s.DataColumn2
FROM SourceTable AS s
WHERE NOT EXISTS (SELECT *
FROM DestTable AS d
WHERE d.KeyColumn = s.KeyColumn)
Best, Hugo
--
(Remove _NO_ and _SPAM_ to get my e-mail address)|||Thank you Hugo. I think Microsoft finally answered the question as well, but
using a <not in> statement. This seems to work as well. If there are
performance factors, I have not found them yet, but I am only inserting
around 1,000 items at a time per batch.
-Tobias Mazzei
Wraith Systems
"Hugo Kornelis" wrote:

> On Mon, 7 Nov 2005 15:09:02 -0800, Wraith Systems wrote:
>
> Hi Tobias,
> No, this is not a correct query. The argument to an EXISTS clause should
> be a subquery. And in this case, the subquery should be correlated to
> the main query.
> Try adapting the code below to your tables:
> INSERT INTO DestTable (KeyColumn, DataColumn1, DataColumn2)
> SELECT s.KeyColumn, s.DataColumn1, s.DataColumn2
> FROM SourceTable AS s
> WHERE NOT EXISTS (SELECT *
> FROM DestTable AS d
> WHERE d.KeyColumn = s.KeyColumn)
> Best, Hugo
> --
> (Remove _NO_ and _SPAM_ to get my e-mail address)
>|||On Wed, 9 Nov 2005 08:46:33 -0800, Wraith Systems wrote:

>Thank you Hugo. I think Microsoft finally answered the question as well, bu
t
>using a <not in> statement. This seems to work as well. If there are
>performance factors, I have not found them yet, but I am only inserting
>around 1,000 items at a time per batch.
Hi Tobias,
There are two things to consider when you use NOT IN with a subquery:
1. If one of the rows in the subquery returns a NULL, the NOT IN will
never evaluate to True. You have to make sure that the subquery can't
return NULLs, else your query will fail.
If you use NOT EXISTS, then you don't have this problem.
2. The NOT IN can only be used if you have a sigle-key column. For a
multi-key column, NOT IN can't be used, since SQL Server doesn't
implement the ANSI standard row constructor.
With NOT EXISTS, the number of columns in the key doesn't matter.
For these reasons, I recommend to always rewrite NOT IN with a subquery
to NOT EXISTS with a subquery. Using NOT IN with a list of constants (as
in "WHERE Status NOT IN ('Sold out', 'Discontinued')") is okay, though.
Best, Hugo
--
(Remove _NO_ and _SPAM_ to get my e-mail address)|||Here is what I am using currently, but based on your responses, I anticipate
issues.
Insert Into <server>.tblInItemAddDescr
select ItemId,AddlDescr,ts
from xclItemAddDescr...[tblInItemAddDescr$]
where itemid not in(select itemid from <server>.tblInItemAddDescr)
go
I am not sure what multi-key vs. single-key would be considered, would you
be able to give a bit more detail on that?
-Tobias Mazzei
Wraith Systems
"Hugo Kornelis" wrote:

> On Wed, 9 Nov 2005 08:46:33 -0800, Wraith Systems wrote:
>
> Hi Tobias,
> There are two things to consider when you use NOT IN with a subquery:
> 1. If one of the rows in the subquery returns a NULL, the NOT IN will
> never evaluate to True. You have to make sure that the subquery can't
> return NULLs, else your query will fail.
> If you use NOT EXISTS, then you don't have this problem.
> 2. The NOT IN can only be used if you have a sigle-key column. For a
> multi-key column, NOT IN can't be used, since SQL Server doesn't
> implement the ANSI standard row constructor.
> With NOT EXISTS, the number of columns in the key doesn't matter.
> For these reasons, I recommend to always rewrite NOT IN with a subquery
> to NOT EXISTS with a subquery. Using NOT IN with a list of constants (as
> in "WHERE Status NOT IN ('Sold out', 'Discontinued')") is okay, though.
> Best, Hugo
> --
> (Remove _NO_ and _SPAM_ to get my e-mail address)
>|||On Wed, 9 Nov 2005 16:27:57 -0800, Wraith Systems wrote:

>Here is what I am using currently, but based on your responses, I anticipat
e
>issues.
>Insert Into <server>.tblInItemAddDescr
>select ItemId,AddlDescr,ts
>from xclItemAddDescr...[tblInItemAddDescr$]
>where itemid not in(select itemid from <server>.tblInItemAddDescr)
>go
Hi Tobias,
If the column itemid in <server>.tblInItemAddDescr can't be NULL, then
you won't have any issued with this. You couls still compare it to the
corresponding NOT EXISTS version to test for performance differences,
though.
OTOH, if the itemid column can hold NULLs, then you *will* have issues.

>I am not sure what multi-key vs. single-key would be considered, would you
>be able to give a bit more detail on that?
My bad. I messed up when typping. I meant multi-column vs single-column
keys.
Best, Hugo
--
(Remove _NO_ and _SPAM_ to get my e-mail address)

No comments:

Post a Comment