|
|
[ ABOUT THE AUTHOR ]
I have been working on a SQL Server Integration Project for the last few months, and it made me perform a whole lot of research and also gave me a lot of good experiences. I thought about writing this note based on some of the issues I faced and how I solved them. This article lists three issues and the solutions I applied to each of them. One thing I need to mention, is that the solutions described may not be the best solution to the issue. I am sure a lot of you guys have addressed these kinds of issues before, hence I look forward on seeing your feedback or alternative solutions. The issues listed are;
Issue #1:
Continue the execution flow after handling errors
Initially I faced this problem with "Foreach Loop
Container", but this is applicable for any situation. The "Foreach Loop
Container" traverses a folder and accesses the files available. Tasks inside the
loop validate the files with certain procedures and record them in the database,
and lot more. What I wanted to configure was, continue with all files even after
an error occurred for the particular file. After spending some time, I figured
it out that how to implement it. I found some interesting points too. Here is
the way I did it.
Let's try
to simulate the issue. Follow below steps;
-
Create a SQL Server Integration Services project and open
the default package. Add a "Foreach Loop Container" to the "Control Flow".
-
Set the Enumerator of the "Foreach
Loop Container" as Foreach File Enumerator. Set a
folder (in may case, it is "Folder I".) to access that has at least three files (again,
in my case, files are TextFile1.txt, TextFile2.txt and TextFile3.txt).
-
Select Fully qualified for the
Retrieve file name. Do not check the
Traverse sub folders.
-
Go to the Variable Mappings and
set the Index 0 to new string type variable. Name
the variable as FilePath.
-
Create a new string type variable and name it as
NewFilePath.
-
Add a "Script Task" within the "Foreach Loop Container"
and name it as Set the NewFilePath variable.
-
Open the "Script Task Editor" and go to "Script" section.
Add User::FilePath as read-only variable and
User::NewFilePath as read-write variable. Add below
script to the task. It sets the destination folder as "Folder II".
' Setting the destination folder to the file.
The folder "D:\TestFolder II" is
the destination folder.
Dts.Variables("User::NewFilePath").Value
= "D:\TestFolder II\"
& Dts.Variables("User::FilePath").Value.ToString().Substring(Dts.Variables("User::FilePath").Value.ToString().LastIndexOf("\")
+ 1)
-
Add a "File System Task" into the "Foreach Loop Container"
and connect it with above "Script Task". Name it as Copy
file to the destination. Set the
operation as Copy File. Set
SourceVariable as User::FilePath and
DestinationVariable as User::NewFilePath.
Make sure that property OverwriteDestination set
as False.
-
Add another "Script Task" into the loop and connect it
with "File System Task". Script it, just to show a message saying "File is
successfully transferred.". Now your package looks like this;

Done. If you execute the package now, it should copy all the
files available in the "Folder I" to "Folder II". Execute and make sure it runs
without any problems. Once copied, delete all files in the "Folder II" except
"TextFile2.txt". Execute the package again. Here is the result;

You get a success message for the TextFile1.txt and then it
stops because it cannot copy the TextFile2.txt. It exists in the "Folder II".
This is the kind of issue I faced. I wanted to continue the execution with rest
of files, in this case, it should copy the TextFile3.txt even after the error
occurred with the TextFile2.txt. This is how I started addressing it;
-
I added the another "Script Task" that handles the error
and connected it with "Copy file to the destination". The precedence
constraint set as "Failure". The newly added "Script Task"
displays the error message (Originally, it should be used for logging the
error.).
-
Run the package again. Make sure that the "Folder II"
contains only "TextFile2.txt".
-
You get the success message for the first file
(TextFile1.txt) and get the error message for the second file
(TextFile2.txt). But still it does not continue with the third file.

-
We should tell to the SSIS engine that "ignore the error
and continue". The only way to pass the message to the engine is, setting
the property MaximumErrorCount of "Copy file to
the destination" task. The value in this property says that "continue the
execution until the number of specified errors are reached", that has
default value of 1.
-
Let's set the MaximumErrorCount
of "Copy file to the destination" to 100 and see. Again delete all files in
the "Folder II" except "TextFile2.txt". Execute the package. Here is the
result;
Now it does not go the "Error message" task at the processing of
"TextFile2.txt". It evaluates the result of "Copy file to the destination"
as success and execute the "Success message" task, that is wrong. Then it
fails the "Foreach loop
container" without executing the "TextFile3.txt". What we want is, execute the "Error message" task at the error and continue the
"Foreach loop container" without stopping, in my case process the
"TextFile3.txt".
-
Set the property MaximumErrorCount
of "Copy file to the destination" back to 1. This fails the task at the
error. Set the MaximumErrorCount of "Foreach
Loop Container" to 100. This causes to continue the "Foreach loop container"
at the error inside.
-
Delete files except "TextFile2.txt" in the "Folder II" and
execute the package. Here is the result, it is as we expected;

This is how I addressed the issue. Now it handles the errors
inside and continues the execution. If you need to capture the error message
generated from the system, implement "Error message" task in the
OnError event
handler of the "Copy file to the destination" task. The error message can be taken from the System::ErrorDescription
variable.
Issue #2: Errors are not getting captured -
System::ErrorDescription
This is a funny mistake I had made. You may do the same (my
apologies, if I underestimate you :)), so make sure you do this workaround. I
had a "Execute SQL Task" that perform a DDL operation and errors of the task had
been handled inside the OnError event handler. Errors
are logged into the database, it sends error info by executing a SP via
another "Execute SQL Task" inside the event handler. The issue was,
not all errors are logged, of course, because of a mistake I had made. Let
me show it with a simple example.
-
Create a package and add an "Execute SQL Task". Run below
script in one of your test databases. It adds two tables and one stored
procedure.
CREATE TABLE Test (Id int
PRIMARY KEY, [Value] varchar(5))
GO
CREATE TABLE Errors (Error
varchar(1000))
GO
CREATE PROC AddErrors @ErrorDescription
varchar(1000)
AS
BEGIN
INSERT INTO Errors VALUES (@ErrorDescription)
END
-
Create a connection for the test database and add the
connection to the task. Set a simple "INSERT" statement to the "Execute SQL
Task".

-
Open OnError event handler of
the "Execute SQL Task" and add another "Execute SQL Task" to the event. Set
the same connection to the task. We will be calling the SP like below;

But, as we need to capture the original error message, we have to set the
SQLStatement with SQLStatementSource
property expression. You will be setting the expression like
below;
-
Done. Test the package by executing it. It should insert a
record without any problem.
-
Okay. Executing again causes to insert a duplicate record
into the table, that leads to an error. Since we have a task to log the
error, error should be logged in the "Errors" table. Execute again. it
throws an error. Has it been inserted to the "Errors" table. It has not. At
a glance I could not find the exact reason for this but finally I found that
all because of single quotes contain in the error message. So, I had to do
little formatting on it, ended up with below expression;

What I have done here is, replace single quote with two single quotes.
Problem solved.
How did you handle this?
Issue #3: Validating error on Data Flow Task
This is another mistake I had made. One of my packages
contained "Data Flow Task" and the connection had been used inside the "Data
Flow Task" were dynamically set at the control flow. The design of the package
was done in my machine and the connection had been made to the local
server, even though the connection string was assigned at the run-time. The
issue was related to the "validation" of the "Data Flow Task", when it runs in a
different machine, or in a machine that has no database similar to the
design-time set database. If the issue want to be simulated, we would be having a simple package
like this;

You can have anything inside the "Load Data" - "Data Flow
Task" that does the extracting data from a source and loading them into a
destination. In my case, I added an "OLE DB Source" to get some data from
SQL Server and pass them to the added "Recordset Destination". The connection "DatabaseConnection"
is the one used by the "OLE DB Source". At the design-time, the "DatabaseConnection"
is set to the local database called "TestDatabase". The "Script Task" dynamically adjust the
connection string of "DatabaseConnection" (Assume that connection string
that is written for "TestDatabase", comes
from either configuration file or from the registry). Once the package is
deployed, it is supposed to get the connection string for different database and
set it to the "DatabaseConnection". Okay. Let's test this;
-
Create a package like above and set the connection to
local database (TestDatabase).
-
Create a variable called "ConnectionString" and set a
configuration file to get a value for it. Make sure the the connection
string coming from the configuration is written for "TestDatabase".
-
Code the "Set the connection string" task to assign the
value from "ConnectionString" variable to the connection.
Dts.Connections("SourceConnection").ConnectionString
= Dts.Variables("User::ConnectionString").Value.ToString()
-
Run the package and see. It should work without any
problems.
-
Now change the configuration file. Set connection string
to different database (The structure of the database should be equal to the
TestDatabase) and rename the "TestDatabase" to "_TestDatabase". This make
sure that local database set at the design-time is not available.
-
Run the package outside the Business Intelligence
Development Studio. Now it should take the new connection string from the
configuration file. Does it run without any problem? Here is the result of
mine;

-
It does not execute properly. The error says that "TestDatabase"
is not available. It is supposed to get the new connection string from the
configuration file and use it. But seems it is still looking for the one we
used at the design-time. This has to be stopped, and of course, it can be
simply solved.
-
Open the package again through BIDS. Get the property of
the "Data Flow Task". Change the property DelayValidation
to True.
-
Run the package again with the configuration file. Now it
does not check the availability (or no validation against the data source
that is used at the design-time) of "TestDatabase". It runs without any
problems.
The reason for this issue is, the "Data Flow Task" is being
validated with its component at the run-time before executing. Since the correct
connection string is not assigned the time of validating, it uses the old one
(or the one that is set at the design-time). By setting the property
DelayValidation to True, we can
avoid these issues. This is not valid to tasks such as "Execute SQL Task". You
may set the same when you execute package (child package) inside another
package.
[ RATE this ARTICLE, and COMMENT on it ] [ ABOUT THE AUTHOR ]
|