We have never built a data warehouse. We are not even sure what the required features are to do this. Nontheless, our developers are exploring two ETL tools: SSIS and Sunopsis.
They tend to like Sunopsis because it has been around a few years, has the equivalent of source-code modules (libraries) in the tool, has a highly granular security system, has 500 companies using it. and therefore must have survived several ETL projects. (We don't know if, or how many, real-world projects SSIS has been used for.). Sunopsis is Java based and uses its own authentication scheme (no support for Windows Active Directory yet. Bummer)
To the best of your knowledge is SSIS, out of the box, missing any of the standard functionalites needed to populate the datastore that will be used as the basis of our warehoue? (I don't even know precisely what all these functionalites are but my boss is worried we will find out somewhere down the road. While he recognizes that we can write our own code in .NET he prefers things "out of the box". They may not be fast but they tend to work right.)
TIA,
Barkingdog
I am biased, I like SSIS (most of the time) and have never used Sunopsis. Will it do what you want, I don't know, but since you admit you don't know what you want, why does that make Sunopsis a better choice?
I cannot compare the two for you, but I think SSIS allows you to do what you should need for any DW scenario, and yes custom code may play a part of that. It doesn't need to normally, but it often does. If you want a more feature rich system, less code, more mouse clicks, then perhaps Sunopsis is it, but I bet it comes at the price of flexibility and/or speed. Building tight or specific features that do not require or offer customisation, such as some code, naturally means you are tied into a specific pattern which may not meet your requirements.
|||I haven't used Sunopsis, so I also am commenting out of ignorance, but I can tell you one real problem with SSIS if you scale up past toy usage -- it doesn't really support reuse of components or logic(*). For example, if you want to use the same lookup followed by derived column transformation in 10 different places, you have to make 10 different copies, and make any changes to all 10 of them -- classic copy&paste "code reuse", which is about the worst kind of code reuse I think.
(*) You can reuse some logic by moving it into custom scripts, or into custom programmed components, but now you've certainly left the world of easy & graphical use, and gone back to the -- familiar :) -- world of real programming, and difficult debugging|||
>>> it doesn't really support reuse of components or logic(*).
Yes, that too was my concern! I didn't see any sense of code libraries or reusability (short of custom componenets) but I am too much an amateur in this area to know if it was simply my ignorance or SSIS reality.
Three tail wags and a hardy bark to you "tryanothername"!
|||Also, I have included a link to Sunopsis only to show its extensive customer base. While the young SSIS can not beign to match this base numerically, does anyone know of a web "page" listing companies that have used SSIS for large projects?
http://sunopsis.com/corporate/us/company/references/
TIA,
Barkingdog
|||I saw a demo of Sunopsis recently. The question isn't really one of features, its on of approach.
Sunopsis preach the gospel of ELT which means load the data into a staging table by hook or by crook (and that is what their libraries do) and then use the power of the database engine to do all your transformation.
SSIS certainly does not preclude the use of that approach but is more usually categorised as an ETL tool which means you can do the transformation prior to loading it into a database.
If you want a tool that can transform the data prior to insertion then you have no option - go with SSIS. If you are happy with the staging approach then you have a choice - SSIS or Sunopsis. I won't talk about which choice you should make here because I am obviously biased!
-Jamie
|||
barkingdog wrote:
>>> it doesn't really support reuse of components or logic(*).
Yes, that too was my concern! I didn't see any sense of code libraries or reusability (short of custom componenets) but I am too much an amateur in this area to know if it was simply my ignorance or SSIS reality.
Three tail wags and a hardy bark to you "tryanothername"!
Quite true. But remember that all Sunopsis' libraries do as far as I'm aware is get the data into a databse and as I'm sure you know SSIS has ample stuff available to do that. I would guess that SSIS is far far more feature rich than Sunopsis because Sunopsis basically relies on code to do the work.
With Sunopsis, once the data is in the database you have to write code (i.e. T-SQL) to do anything you want to do although I believe it does code generation for you. Just don't be under the impression that using Sunopsis means you'll have to write less code because if I understand it correctly the opposite is true.
-Jamie
|||
I use SSIS to do ELT because ETL is a real dog in terms of performance. I just finished re-engineering a Data import package, took out most of the scripting, moved some of the functionality to c# for better programming control and brought it down to 2.5 minutes from 35 minutes. Whatever tool you use, if the job is time critical, then you have to write some code and use ELT. I use SSIS to load the data into temp tables and use the powerful SQL engine to process the data. Lookups, deletes / updates / conversions are a breeze in SQL. Of course, you have to deal with some temp tables, but you can place them in a scratch database not to pollute your main database.
my 2 cents
-chiraj
|||I found a list of some SSIS case studies that are interesting reading.
http://www.microsoft.com/sql/evaluation/casestudies/integration.aspx?ddiDirectoryID=389
Barkingdog
|||chiraj wrote:
> I use SSIS to do ELT because ETL is a real dog in terms of performance.
My experience is certainly different from this, generalized into:
The SSIS data flow is generally faster or much faster than the database engine for:
Record based operations: manipulating a single record at a time such as string manipulation, Derived column component, Script component calling simple VBA.Net functions etc.
No comments:
Post a Comment