Tuesday, March 27, 2012

duplicate rows in 1 csv file

Hi, I am trying to import data from CSV files into an OLE DB Destination. The CSV files contain all transactional changes. For example, for a particular record, the first name, last name, and email address can change several times within the same CSV file. I need to save only the last updated record from each CSV file. I have tried the Slowly Changing Dimension transformation, but it does not work when there are duplicates within the same CSV file. I have also tried Sort, but it only keeps the first occurrence.
Any ideas on how I can store only the latest changed data within one CSV file?

How does the process distinguish between the latest data and duplicate data?

Is there a date as part of the row, or does the last row simply win?

Kirk Haselden
Author "SQL Server Integration Services"

The last row wins. There is no timestamp field. This is how the third-party developers designed it: records are saved to the CSV files as changes are made. I have to load all these rows into a data warehouse without duplicating the primary key.
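
To make the "last row wins" rule concrete, here is a minimal sketch in Python rather than SSIS. It is not something proposed in the thread. The column names (`id`, `firstname`, `lastname`, `email`) and the sample data are assumptions made up for illustration. The idea is to read the file top to bottom and let each row overwrite any earlier row with the same primary key, so only the last version of each key remains.

```python
import csv
import io

# Hypothetical sample: key 1 appears twice, and the later row should win.
SAMPLE = """id,firstname,lastname,email
1,Ann,Lee,ann@old.example
2,Bob,Ray,bob@example.com
1,Ann,Smith,ann@new.example
"""

def last_row_wins(rows, key_field):
    """Return one row per key, keeping the last occurrence in file order."""
    latest = {}
    for row in rows:
        # A later row with the same key replaces the earlier one.
        latest[row[key_field]] = row
    return list(latest.values())

reader = csv.DictReader(io.StringIO(SAMPLE))
deduplicated = last_row_wins(reader, "id")
for row in deduplicated:
    print(row["id"], row["firstname"], row["lastname"], row["email"])
```

Because there is no timestamp, this relies entirely on the file being read in its original order. Any step that reorders rows before the deduplication would break the rule.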
