|
The Visual Studio 2005 Team Edition for Database Professionals (a.k.a. Data Dude) was released somewhere around December 2006 helping many Database Administrators and Database Developers. The release was huge for us, because it addresses many issues faced by us. The team system concept was introduced by Microsoft, understanding that the software developer is not the only key person in the development life cycle and consolidating the entire team into one box, giving different tool sets for each role players. For example, while the developer uses the Visual Studio Team Edition for Software Developers, architect uses the Visual Studio Team Edition for Software Architects. At the beginning, there was no proper place for the DBA (and DBE) to sit with the team and work; he plays his role in an isolated environment with great difficulties. Some of the problems faced by both Database Administrators and Database Developers (or Engineers) were versioning the changes, making changes directly to production environment, inability to find the errors early before running changes in production environment and doing the unit tests. There were various tools for addressing these issues by supporting the DBA and DBE but there was no tool that addressed everything. Finally Microsoft decided to make him part of the team by providing a tool that contains everything DBA and DBE wanted, introduced Microsoft Visual Studio 2005 Team Edition for Database Professionals. This tool gives us many things, mainly giving us an environment where the “truth” of the schema resides, allowing us to manage and test the changes without harming the production environment. It supports build changes and deployment. There are some articles / blog posts describing the tool where you can get a clear picture of the tool. This article mainly focuses on one of the key features, Test Data Generator which is very important when the development life cycle comes into unit testing. This will focus on;
Setting up the database schema with project model
I am not going to explain each and every step except some basics since we shall be mainly considering test data generation. Okay let’s start it. Follow the steps given.
-
Open the SQL Server Business Intelligence Development Studio and select File -> New -> Project. Select “SQL Server 2005 Wizard” template under “Database Projects -> Microsoft SQL Server” project type. Name the project as AdventureWorksSchema and give a proper location to the project. You may select “Add to source control” checkbox.

Note: You will not see the “Database Projects” type if you have not installed Data Dude.
Click "OK" to start the wizard. The "New Database Project Wizard" opens. Click on "Next" to continue.
-
Now you have the "Project Properties" page. Mainly there are two ways of organizing the appearance of objects in the project; by object type or by the schema. Since we will be taking the AdventureWorks database as the source and its objects are organized with more than one schema, let's select "Organize my project by schema". Type "Sales" as in the input box, since we may be creating objects related to "Sales" schema. Leave others with defaults.

Click on "Next" to continue.
-
Next page is for settings for database options. Accept defaults and click "Next" to continue.
-
The "Import Database schema" page opens. Let's import "AdeventureWorks" database schema. Check the "Import existing schema" checkbox. Make a new connection to the "AdventureWorks" database and select it.

Click on "Next".
-
Now you have the "Configure Build and Deploy" page. This is the last page of the wizard. "Always re-create database" generates database create scripts for the build. Uncheck if it is checked. "Block incremental deployment if data loss might occur" stops build if it sees any data loss for the changes. It is advisable to have this checked. Check it, it can be changed after if need. You may be checking "Back up database before deployment" because it backs up the destination before applying any changes which is a good practice too.

Click on "Finish" to complete the wizard.
-
Examine the summary. Click "Finish" again.
Now we have the schema of the AdventureWorks database. Note that this is not connected to the source; means the database is offline. Look at the "Solution Explorer" and "Schema View". Both views can use for examining the objects.

The changes we are supposed to make should not be directly applied to the source, hence we need to have a testing database. Usually we import the schema from the production environment and do changes and testing in the development environment. Let us now configure the testing database.
Setting up the test database
Setting the test database (or target database in this case) can be done via project properties. Follow the steps below.
-
Right click on the project name in the solution explorer and get properties. This opens the project properties window.
-
Click on "Build" and open the property page where you can set the target database.
-
Click on the "Edit" button next to "Target connection" input box for opening the "Connection Properties". Fill the "Server name" input box with the instance name of the SQL Server in your development environment (or your local server). Enter "AdventureWorksSchema" in the "Select or enter a database name". Test the connection and click on "OK".
-
Change the "Default location for target database files" if you need to create the test database in different location. Your project property window may look like this;

-
Done. Close the property page. Right-click again on the project and select "Deploy". It will build the output file and deploy to the new database. Open the "Management Studio" and check the newly created database called "AdventureWorksSchema".
This article focuses on support of data generation by the Data Dude. Do we really need to have test data? The answer is "Yes", if you do not have any real data with you. How can you check whether the written stored procedure that processes email addresses, executes properly without email addresses in the database? How do you check whether your query return the total amount of all the orders for given year without records in the "OrderHeader" and "OrderDetails"? How do you check whether your trigger updates the summary table at insert? For all these you need data. Not just values, you need meaningful values. Data Dude generates data for you. It generates meaningful data for you. Let's see how we can instruct Data Dude for generating data for us.
Generating Test Data Plan
Project supports to have more than one data generation project plan. This is useful for checking different set of changes. You can define one data generation plan for testing sales related stored procedures and triggers. You can have another plan for see the changes made for product details. Here are steps for creating a plan;
-
Right-click the project in the solution explorer and click on "Add -> Data Generation Plan". Type "DataGenerationPlan_PersonDetails.dgen" in the "Name" input box and click on "Add".
-
This opens mainly three windows; Schema view that shows all tables, Column details view and Data Generation preview. If you cannot see all three, adjust you Visual Studio windows for seeing them as below.

-
First uncheck all the selected tables. Simply right-click on the table area and click on "Exclude all tables from Data Generation".
-
Assume that we need to test some changes we made to "Person" related procedures. So we need data on "Person" object. Scroll down the table list and check "Person.Contact" table.
-
Make sure that all columns in the "Column details" are selected. You will notice that the "ContactID" has not been selected (and you cannot check too) because it is an identity type column.
-
Let's generate data. Click on the menu Data->Data Generator->Generate Data. Select the target database as "AdventureWorksSchema" from the drop down.
![Select target database for data generation]](http://sqlserveruniverse.com/utility/images/article_images/ADMN01002_08.PNG)
-
Click on "OK". You get a message box saying "Do you want to clear the content of the selected table before inserting new records?". Click on "Yes" to continue.
-
It starts generating data. Did it complete without generating errors? No. You will get an error from the process. Check the "Error List" window. It says that error is related to a XML column, in our case it comes from "AdditionalContactInfo" column. Let's stop generating values for this column. Uncheck the column "AdditionalContactInfo" from "Column details" section. Go back to step 6 and start again.
-
No errors. Explore the AdventureWorksSchema.Person.Contact table and see. Table has been filled with data for testing.
-
Close the data generation plan and save the project.
Consistency of Test Data
Assume that you made lot of changes to the target database by testing all you have implemented. This changes records in the "Person.Contact" table. You corrected the errors you found on the implementation and you need to do testing again, you need to test with same data. Can we generate same data? Yes, Data Dude does generate same data for maintaining the consistency between test procedures. Let's re-generate data and see.
-
If you have not changes any data in the AdventureWorksSchema.Person.Contact table, change some. Remember the changes you made.
-
Follow the step 6 and 7 of the previous procedure again.
-
Examine test data. You will notice that it has generated same data as it generated first time. See the image below.

The first window shows the data generated by the first generation. Second window shows the changes made to the data in the table. The third one shows the regeneration. Note that it has generated same data for changed records.
Regular Expression Builder
Data Dude accepts patterns as regular expressions and generates data matching the pattern. This is really useful for generating meaningful data. See the test data we generated above. Note the "Title" column. Why don't we instruct Data Dude to add correct titles for contacts?
-
Open the "DataGenerationPlan_PersonDetails.dgen" plan if it is not open. Make sure that the table "Person.Contact" is checked and selected in the "Tables" window.
-
You should see all the columns related to the "Person.Contact" table in the "Column Details" window. Expand the "Generator" drop-down in the "Title" row. Select "Regular Expression".
-
Click on the ellipsis button in the last column. This opens "Regular Expression Builder for Data Generation" window.
-
Expand the drop-down under the "Regular expression name" and examine the available expression. If one of the given expressions satisfies your need, you can simply select it. Or else you can create your own expression and save it for future usage too. Let's create a new one.
-
Type "Mr\.|Mrs\.|Ms\.|Dr\." in the "Regular expression definition" input box. This expression says that use this pattern to generate values for title column.
-
See the "Result preview" section. Check whether the preview contains the data you wanted.
-
Type "Title" in the "Regular expression name" drop-down and click on "Save". This saves the expression with the name of "Title" allowing expression to be used with other columns.

-
Click "OK" to save the expression. Go to "Data Generation Preview" and see how it has filled the "Title" column.
Constraints
Data Dude considers constraints too. If the column is defined with a constraint, the regular expression for the constraint will be listed in the drop-down, allowing you to select it or creating your own one. For example, if you have a "CHECK Constraint" like below for column called "Name";
CHECK ([Name] LIKE 'A%')
It will be listed in the "Regular expression name" drop-down like this;

It identifies the constraint and adds the relevant regular expression for the list. It does NOT identify (or understand) all the constraints. If you try to generate data for "Production.Product" table that has a constraint called "CK_Product_Class" that has been added like;
CHECK ((upper([Class])='H' OR upper([Class])='M' OR upper([Class])='L' OR [Class] IS NULL))
is not going to be recognized by the generator and it will not produce a regular expression for it. This is because of the TSQL function "UPPER" and "IS NULL" clause. These items (seem) cannot be recognized by the "Regular Expression Builder", hence it does not provide the regular expression for the constraint.
Check constraints with integer type columns
It does recognize check constraints added to the integer type columns but it does not allow us to add regular expressions for them. And the good thing is, it generates data applying the constraint added. If you notice the constraint "CK_Contact_EmailPromotion" defined for the "EmailPromotion" column in the "Person.Contact" table, the constraint looks like;
CHECK (([EmailPromotion]>=(0) AND [EmailPromotion]<=(2)))
Note the "EmialPromotion" column in the preview. It has generated values between 0 and 2 that means it has considered the constraint added. Alter the constraint like below;
CHECK (([EmailPromotion]>=(3) AND [EmailPromotion]<=(5)))
Save the script and go back to the Data Generation project. It gives you a message box saying "Schema has been changed, Do you want to reload?". Click on "Yes" and let it load the schema again. Once loaded, go to the data preview and see how values are generated for the "EmailPromotion" column. Has it taken the new constraint? In my case, NO, it has not. I am not sure about the reason for this, but if you recreate the test data generation project, it will take the new one. I may (or may not) be able to find the reason for this.
Maximum length
This allows us to set the maximum length of the value to be generated. This is useful if you need to to have test data for specific size.
Seed
This is the key for generating same data over and over again. We used seed "5" for generating data for "Title". This makes sure that it generates same values for same rows (in my case, it generated Mrs for 1st, 2nd and 3rd rows, Ms for 4th row and so on) until the seed get changed. If you need to change this generation pattern, you can change the seed. Change the seed to "1" and see the preview. It gives different set of values. If you change it back to "5", it generates same data set as the first generation.
Percentage null
This setting allows us to set the percentage of records that contains null for the selected column. Default number of records to be generated per table is 50. If you set the properrty "Percentage null" for "50", it will approximately generate 25 records with null values on "Title" column.
Data Bound Generator
Data Bound Generator loads data from another data source and uses them for generating data. This becomes very handy when the regular expression does not fit with the requirement. Think about the test data generation for the "Person.CountryRegion" table. This table requires actual values for regions and countries. Though it is possible to write a regular expression just like the one we wrote for the column "Title", simple and best solution for filling the table is using the "Data Bound Generator". Let's instruct Data Dude to load data from "AdventureWorks.Person.CountryRegion" table for generating test data for "AdventureWorksSchema.Person.CountryRegion" table.
-
Open the "DataGenerationPlan_PersonDetails.dgen" plan if it is not open. Check and select "Person.CountryRegion" table.
-
Right-click on the selected row and click on "Column details" menu item. It opens "Column details" window.
-
Select "Data Bound Generator" as the generator for the "CountryRegionCode" column. Click on ellipsis button at the last column.
-
Select the data source as "AdventureWorks" database. If it is not in the drop-down, select "<Add new connection>" and create a connection for it.
-
Type "SELECT CountryRegionCode FROM Person.CountryRegion" statement in the text area. Here is the screen of mine.

-
Click on "OK" to save the code. Go to preview and see the data. You will see that the column CountryRegionCode is filled with the data from the "AdventureWorks.Person.CountryRegion" table.
Remember that "Data Bound Generator" executes the query and loads all records to the memory before using them for the data generation. It holds data in a dictionary and pick values randomly for rows. You may get memory related errors if the memory is not sufficient for holding data. Therefore you have to make sure that there is enough memory for holding data before using the generator "Data Bound Generator".
Follow the same procedure for generating test data for "Name" column too. Use "SELECT Name FROM Person.CountryRegion" SQL statement with "Data Bound Generator". Done! Go and see the preview of test data. You should see that both columns "CountryRegionCode" and "Name" are filled with meaningful values. Do you notice some mismatching with data? You may or may not. Because this generator randomly picks values from the dictionary and uses, it might make a record that has a "Name" that is incompatible with "Code".
Another issue could occur is "Violation of PRIMARY KEY constraint". Though there are unique 238 unique "CountryRegionCode"s, Generator may pick a value that is already used for one of the generated rows. If it happens, it violates the "PRIMARY KEY" constraint exists in the "Person.CountryRegion" and stops generating data.
These two problems could be avoided by using "Sequential Data Bound Generator".
Sequential Data Bound Generator
The Sequential Data Bound Generator does not load all records from the source and hold. It connects with the source when it needs to generate a value and disconnects. Because of this, it makes connection to the source as many times as it needs based on the number you have mentioned in the "Rows to insert" column. If you believe that it affect to the performance of the test data generation (and you really care about the performance), then the generator should be "Data Bound Generator", not "Sequential Data Bound Generator".
It differs from the "Data Bound Generator" with another factor that is, it accepts "ORDER BY" clause and uses it when loading data from the source. This helps us to avoid the last error we faced with "Data Bound Generator". Okay, let's see how we can fix the issue.
-
Open the "DataGenerationPlan_PersonDetails.dgen" plan if it is not open. Check and select "Person.CountryRegion" table.
-
If need, change the "Rows to insert". The default value is 50. Since the source database has 238 unique codes, it would be the maximum number for this table. If you set a number greater than 238, for example 239, the 239th value will be a duplicate of 1st value, hence it will throw the "PRIMARY KEY" constraint violation error.
-
Right-click on the selected row and click on "Column details" menu item. It opens "Column details" window.
-
Select "Sequential Data Bound Generator" as the generator for the "CountryRegionCode" column. Click on ellipsis button at the last column.
-
Select the data source as "AdventureWorks" database. Type the following on the text area;
"SELECT CountryRegionCode FROM Person.CountryRegion ORDER BY CountryRegionCode". Your screen should be like this now;

Note that this does not give us the facility to change "Percentage null" and the "Seed" like "Data Bound Generator".
-
Click on "OK" to save the setting. Go to "Preview" and see that how data has been generated.
-
Follow the same procedure for "Name" column. Use the below statement for it.
"SELECT Name FROM Person.CountryRegion ORDER BY CountryRegionCode"
-
Click on "OK" and open the "Preview". There will not be any mismatching data because we used the same column for sorting the source.
-
Click on "Generate Data" and generate data on "AdventureWorksSchema" database. Say "Yes" for the message box that asks about the data clearance in the target. Did it generate data and pass to the target without any issues? It should be.
Related Table and Ratio to Related Table
Data Dude Data Generator generates data, not only for the selected tables but related tables. The relations are automatically identified by looking at the foreign keys available in the selected table. For example, if you select the table "Production.ProductSubCategory", it automatically marks the table "Production.ProductCategory" for data generating. If you generate data for "Production.ProductSubCategory", first it generates data for "Production.ProductCategory" table and then uses the keys of it for generating data for "Production.ProductSubCategory" table. Generate data and see. It generates 50 records (Default values for "Rows to insert") for selected table and 50 for the marked table. The "ratio" used for the generation is 1:1 that can be changed as we want. Let's generate data with different ratio.
-
Open the "DataGenerationPlan_PersonDetails.dgen" plan if it is not open. Check the table "Production.ProductSubCategory". Note that Data Dude automatically marks the table "Production.ProductCategory" for data generation.
-
The default value for the "Related table" is "None". Expand the drop-down. The drop-down contains all table names related to the selected table. Select "Production.ProductCategory" from the drop-down.
-
The ration ("Ratio to Related table") is automatically filled as "1:1". Change it to "5:1" means, generates five sub categories for one category. Note that Data Dude automatically changes the "Rows to insert" as "250".
-
Done! Click on "Generate Data" icon. Select target database. Click "Yes" to message box "Do you want to clear the content".
-
Query the target database and see. You will notice that there are 50 records sub categories and 250 records sub categories.
References
I believe that I have covered everything related to data generation with Data Dude. If you have done different types of data generations or you have come across any issues, you can share them with the community by adding as "comments".
[ RATE this ARTICLE, and COMMENT on it ]
|