I need help thinking about this problem :-)
I have an SSIS pkg that automatically downloads financial extracts from an ftp site. Once the files are downloaded, I load the extract to a table. The table is first deleted before the insert, so that each time the table has "fresh" data (whatever was in the extract for that day).
Once the extract data is in the table, I load the data into yet another table that combines data from many tables. Simple enough.
Problem is, sometimes when I download the extract, it hasn't been updated yet, so I'm downloading an OLD extract. This old data then gets loaded into the first table. That's ok, because it doesn't really hurt anything. I can always delete the table and reload it if necessary.
The problem occurs when the old data goes from this table into the OTHER table. We don't want old data in this other table!
I need a way to check that I'm not loading the same data 2 days in a row into the OTHER table.
I realize that I might be able to solve this problem without using SSIS, but the solutions I've come up with so far aren't 100% satisfactory. I can use a query to check dates and that sort of thing, but it isn't foolproof, and would create problems if I need to manually force the process though, that is, if I need to override the date logic.
Anyways, I'm wondering if there's an SSIS approach to this problem... I can't rely on timestamps on the data files either. They're not accurate.
This is has been very perplexing to me.
Thanks
You have to determine a way to programatically decide if the data is "new" or not. Without that, I don't know how we can solve anything for you.|||
Yes, I have that.
Once I load the table, I check a "date entered" field against the data's timestamp column.
That is, if I load the data on 8/15, the data timestamp should be 8/14, or the day before.
If it's 8/13, it's out of date.
But what happens if the data load on 8/14 (with a data timestamp of 8/13) didn't run... and I need to re-load the data from 8/14 on 8/15?
Now, date entered is 8/15, but the data has a data timestamp of 8/13, which in this case IS correct, but my logic will reject the data because it isn't from the day before.
That's the problem, as I see it. If I hardcode logic into my stored proc, then I can't easily override it if I need to.
I was thinking maybe, instead, I could put the date logic in a separate SQL function, then call this function from an Execute SQL task in SSIS. Then have the function return a 0 (failure) or 1 (success). If it returns a 0, then do not perform any more steps in the package. If it returns a 1, then finish running the steps in the package.
Is this possible to do? If so, that could solve the problem.
Thanks
|||My idea being, if I need to override the logic, I can simply disable the Execute SQL task that contains the function. Then run the package manually.|||You could store the last successful date loaded and then read that in at the beginning of your package and then compare it to your data coming in. If the data coming in is greater than that date, then process it.|||I would suggest the above mention of saving the successful date. Alternately, if you have a decent audit trail, you can use the date and time of the last run and do a comparison of the files timestamp
'
' Add your code here
'
Dim vars As Variables
Dim var As Variable
Dts.VariableDispenser.LockForRead("FileName")
Dts.VariableDispenser.GetVariables(vars)
Dim path As String = vars.Item(0).Value.ToString
If File.Exists(path) = False Then
' File does not exist, no attributes to read
Else
vars = Nothing
Dts.VariableDispenser.LockForWrite("FileModifiedDate")
Dts.VariableDispenser.LockForWrite("FileCreationDate")
Dts.VariableDispenser.GetVariables(vars)
vars("FileModifiedDate").Value = File.GetLastWriteTime(path)
vars("FileCreationDate").Value = File.GetCreationTime(path)
vars.Unlock()
End If
Dts.TaskResult = Dts.Results.Success
|||I'm not sure if I've made my problem very clear :-)
As I mentioned in a previous post, I already have a way to determine if data is out of date or not on any given day.
The problem is that if I want to be able to override this logic if I need to. Please see my previous post that clearly explains what the problem is.
Q.) I am wondering if I can call a function in an Execute SQL task to have it return a value to the package. Depending on the value returned, I want to halt execution of the package.
Thanks
|||Never mind... I realize my logic won't work anyways.
I just need to check the timestamps in the file names.........
No comments:
Post a Comment