Business Intelligence Seminar Series

Do more with what you already have. Leverage the products you already own and the data your systems produce to gain operational efficiencies and improve your company's business intelligence capabilities!

Baltimore
Tuesday, Nov. 3 (9:00 a.m. – 11:30 a.m. EST)
Hyatt Place
4730 Painters Mill Road
Owings Mills, MD 21117
Register

Reston, VA
Wednesday, Nov. 18 (9:00 a.m. – 11:30 a.m. EST)
Microsoft Corporation
12012 Sunset Hills Rd.
Reston, VA 20190
Register

Washington, D.C.
Tuesday, Dec. 1 (9:00 a.m. – 11:30 a.m. EST)
Microsoft Corporation
5404 Wisconsin Ave.
Chevy Chase, MD 20815
Register

Philadelphia
Wednesday, Dec. 2 (9:00 a.m. – 11:30 a.m. EST)
Philadelphia Marriott West
111 Crawford Avenue
West Conshohocken, PA 19428
Register

Hosting SQL Server on VMWare

One of our clients was investigating the logistics of running SQL Server on VMWare. In discussions with Microsoft, the response included four (4) important links. I personally have seen various SQL Server performance issues when virtualizing especially in a SharePoint environment and I would never trust that configuration. There have been many enhancements with SQL Server 2005 and now SQL Server 2008 in order to consolidate servers and take advantage of reducing hardware needs and license costs. A good SQL Server 2008 consolidation white paper can be found here.

Here is the info from the Microsoft technician:

If you are running ESX V3.5 the latest hotfixes from VMware to disable write-caching are included in update3 and update 4.


1. The Microsoft Virtual Server support policy


2. The VMware knowledge base article that you can contact VMware about for hosting SQL Server


3. This article that explains the issue and discusses the problem domain


4. Here is an presentation from VMWare about how to disable the disk write issue.

To MERGE or not to MERGE in SQL Server 2008

I was reviewing some demo material for the MERGE statement that Jim Pletscher put together. It was a standard scenario where we wanted to UPDATE, DELETE, or INSERT rows from a source table to a target table. A general overview of this is available here.

Inserting is harmless but what if sometimes we don't want to update or delete something from our target table?

My first step was to add a new column to the target table. I named this column "DoNotTouch" and made it a bit defaulting to zero (0). I then set this value to one (1) on one of the target table rows.

So now how do we tell the MERGE statement to not touch anything where this bit is flipped on? You cannot use a WHERE clause in the MERGE statement. The answer is within the WHEN clause of the MERGE statement. All you need to do is add an additional condition to each WHEN for the update and delete operations:

(TargetTable is the name of the target table and SourceTable is the name of the source table)

WHEN MATCHED AND TargetTable.DoNotTouch<> 1 THEN


               UPDATE SET TargetTable.FirstName = SourceTable.firstname,

                                        TargetTable.LastName = SourceTable.lastname

WHEN NOT MATCHED BY SOURCE AND TargetTable.DoNotTouch<> 1 THEN
DELETE


So that worked! It did not touch the row where I set the DoNotTouch flag to one (1).

Introduction to Change Data Capture

The Change Data Capture feature is a new feature in SQL Server 2008 that allows processes to
quickly and easily identify changes to a table in SQL Server 2008. It primarily works by watching the SQL server transaction logs, and running processes that pull changes from the logs and moving them into tables.

Change Data Capture Setup

Enabling Change Data Capture is a two step process. First, you must enable it at the database level, then you must enable it for each table for which changes must be recorded.

The first step in setting up Change Data Capture is to enable the option at the Database Level. The easiest way to do this is to use the system stored procedure sys.sp_cdc_enable_db.
Simply execute this stored procedure from the desired database, and it will enable change data capture for the database. This stored procedure requires sysadmin privilege in order to execute it.

(click to enlarge)














Once it is enabled at the database level, the Capture Change Data option must be enabled for each table for which changes should be captured. Use the sys.sp_cdc_enable_table stored procedure to enable the feature, passing the table name as an argument in order to enable the feature. By default, all of the columns in the source table are identified as captured columns. If only a subset of columns need to be tracked, such as for privacy or performance reasons, use the @captured_column_list parameter to specify the subset of columns.

(click to enlarge)




















Once Change Data Capture (CDC) is enabled for a table, several things occur. First, an associated capture instance is created to support the collection of the change data in the source table. The capture instance consists of a change table and up to two query functions.

All inserts, updates, and deletes to the source table are stored in the change table. The two query functions are specific to the source table, and can be used to fetch changes from the change table.

The first five columns of a change data capture change table are metadata columns. These provide additional information that is relevant to the recorded change. The remaining columns mirror the identified captured columns from the source table in name. These columns hold the captured column data that is gathered from the source table.

Each insert or delete operation that is applied to a source table appears as a single row within the change table. The data columns of the row that results from an insert operation contain the column values after the insert. The data columns of the row that results from a delete operation contain the column values before the delete. An update operation requires one row entry to identify the column values before the update, and a second row entry to identify the column values after the update.

In additon to the Change instance, two SQL Agent Jobs are created: one that is used to populate the database change tables, and one that is responsible for change table cleanup. Both jobs consist of a single step that runs a Transact-SQL command. The Transact-SQL command that is invoked is a change data capture defined stored procedure that implements the logic of the job.

(click to enlarge)






















The jobs are created when the first table of the database is enabled for change data capture. The Cleanup Job is always created. The capture job will only be created if there are no existing transactional publications for the database.

Using Change Capture Data


Change data is made available to change data capture prcoesses via the two functions that are created when the change instance is created. Both functions require Log Sequence Numbers (LSNs) to help define the date range of change data to return in the resultset.


The function cdc.fn_cdc_get_all_changes_ returns all changes that occurred for the specified interval. Entries are always returned sorted, first by the transaction commit LSN of the change, and then by a value that sequences the change within its transaction.

The function cdc.fn_cdc_get_net_changes_ returns one change per modified source table row. If more than one change is logged for the row during the specified time period, the column values will reflect the final contents of the row.

Several functions are provided to help determine appropriate LSN values for use in querying the change data capture functions. For example, the functions sys.fn_cdc_map_time_to_lsn and sys.fn_cdc_map_lsn_to_time are available to help place LSN values on a conventional timeline.

To illustrate, insert a single record into sample table created earlier, then update the columns.




































Now we can use the cdc.fn_cdc_get_all_changes to set how the change table is populated:

















Note that the resulting change table contains four records, with the source table columns populated with the table data surrrounding the change. The first record reflects the initial insert. The 2nd and 3rd records reflect the update (a delete followed by new insert), while the fourth record reflects the final delete.
Note that running the cdc.fn_cdc_get_net_changes returns only a single row, which represents the newest version of any modified row. This procedure will only return one change table record per source record changed, regardless of how many changes were made to the source record.



























Business Intelligence and Silverlight

Looking forward to an upcoming book on two of my favorite areas of interest: Business Intelligence and Silverlight.



SQL Server 2005 vs 2008 Query Performance

I was recently working on a transactional web application for a client. This web application had some "operational" type reports included within it that required aggregations of, what will eventually be, millions of rows of data. This meant I'd be writing queries directly against the transactional tables doing the aggregations (SUMs and COUNTs mostly) "on demand".

Note: In more heavily used applications, aggregating as much data as we were on demand might not be acceptable, but in this case we were measuring in transactions "per minute" and "per hour" rather then "per second", so we did not anticipate any blocking of our queries by database writes.

These queries were going to be used on the landing page for web application seen by every employee within the company. The landing page receives 20,000 hits per day over roughly 17 hours of operation a day. This is about 19 hits per minute or 1 hit every 3 seconds. Not all that much compared to many applications I've seen, but definitely enough to warrant some sort of load/performance testing under a larger data load.

I set up some SQL scripts to, somewhat randomly, load a data sample that was representative of about 2 years worth of data. My initial query times were awful, and as I suspected, there was some room for improvement. I did some refactoring (with some help from a friend) and had things running at about 1-2 seconds per query. I thought there was still some room for improvement, but this was within an acceptable range.

Thinking things were fine, I promoted my code from my local development environment to our integration environment. I then ran my data load script to test in that environment, and suprisingly, was seeing query times in the 20-25 second range. WHAT?!? I just tuned things. After some tinkering, and reloading of data, I was seeing the inconsistencies in my query times. Sometimes in the 20-25 second range, other times in the 3-5 second range. What was going on?

My initial thought was that there was something different in the query processing between my local development environment (SQL Server 2008 Developer Edition) and the integration environment (SQL Server 2005 Standard edition). Some quick searches found that I might be using the Star Join Query Optimization available in 2008 Developer/Enterprise editions. There are ways that I could have confirmed this assumption, but I did not spend the time to do so. Regardless if I was or wasn’t using this optimaztion, I still didn’t want to take a chance that there might have been something else in the query processor that was making things more “efficient” on 2008 Developer vs 2005 Standard. I thought it’d be more reliable if I just set up a more representative environment and did my testing/tuning there.

I built a 2005 Standard virtual machine, built my application, loaded sample data, and ran the queries. Started seeing query times similar to our integration environment. OK, so it was reproduceable. Now let's tune it. I went straight to an indexed view I was using for the query, made a minor tweak. As I ran the script to rebuild the indexed view, it hit me. I had just loaded 2 years worth of data, and had done no database maintenance that would "normally" be done gradually, such as rebuilding indexes and defragging the hard drive. I rebuilt my indexes, defragged the drive, and sure enough my query times were back down in the 1-2 second range.

After tweaking just a bit more by including my numeric columns used in my query aggregations in an index, I redeployed to our integration environment and finally started getting consistent and performant results.

A few important take-aways here...

  1. In order to truly gauge/test performance, perform maintenance on your development/test environment just as you would in a “normal” environment, especially after large data loads. That is rebuild your indexes, defrag the hard drive, etc.
  2. Always (or to the extent possible) develop and/or test on machines that are representative of the environment you are actually going to deploy to, especially with regards to versions/editions of applications being used. There are many features within SQL Server that are only available in the Enterprise Edition of the product, including some query optimizations. I usually do this, but for a long period of time on my current project, the client couldn’t give a definitive answer on what our environments would look like, so we continued development with what we originally started with (which was no longer valid). Once I built a more representative environment of what I was actually deploying to, then deployed my application and test data there, I was able to troubleshoot and fix the issue more quickly.

Excel Services Part Deux - Excel Web Services

In an earlier blog entry, I discussed in very broad terms Microsoft Excel Services – a server side calculation engine that can be leveraged for SharePoint Business Intelligence solutions. Excel Services, you may recall, is divided into three components; Excel Calculation Services (ECS), Excel Web Access (EWA), and Excel Web Services (EWS). Excel Web Services allows access to the Excel Calculation Engine via Web Service calls, and is today’s focus.

Excel Web Services utilizes SOAP over http and acts as an interface between custom client programs and Excel Services. In order to call the service, it is necessary to establish a reference to the Excel Web Services WSDL, accessed through ExcelService.asmx?wsdl. However, if you are writing code within SharePoint Services (for example, developing custom web parts), Microsoft recommends linking directly to Microsoft.Office.Excel.Server.WebServices.dll and making local calls rather than using loop-back SOAP calls.

EWS - What is it good for?
Absolutely nothing (say it again…)? No, there are a number of practical examples where utilizing Excel Web Services comes in handy. One common scenario occurs when a company’s business logic is embedded in Excel spreadsheets. These spreadsheets can be maintained on a corporate server, and custom applications developed to utilize the logic in the spreadsheet without having to recode it in a “conventional” programming language. This also means that the spreadsheet logic can be maintained by business Subject Matter Experts (SMEs) who are comfortable working in Excel. Custom code can be developed to set values to cells and ranges, process the workbook, and retrieve calculated values (or even the entire workbook).

Coding Steps 101
Dealing with EWS is relatively straightforward, and numerous code examples can be found in the literature and on the web, such as here: http://blogs.msdn.com/excel/archive/2005/11/21/495454.aspx


Typical steps involved in coding against EWS are as follows:


1. Instantiate the web service via creation of ExcelService object
2. Set the URL via ExcelWebService property
3. Set the credentials via Credentials property. Default credentials (using applications own credentials) can be used by setting this property to System.Net.CredentialCache.DefaultCredentials.
4. Start the Session by calling OpenWorkbook() method
5. Set Cell Parameters via SetCell() method
6. Calculate Spreadsheet (if workbook not set to auto-recalculate) via Calculate() or CalculateWorkbook() method
7. Retrieve the Results via GetCell() method
8. Close Session via call to CloseWorkbook() method


Other methods are available as well, including the ability to set a range of cells and read an entire workbook into memory.

Business Example
An RDA client manages a customer support system that utilizes a custom authorization engine that controls which support agents have access to specific functionality within the application. For example, agents that support a specific product line do not have access to workflows supporting other product lines, and only agents of a certain tier level have the rights to issue refunds to customers. Although this authorization data is stored in a relational database, an Excel Spreadsheet on a SharePoint site is used by both the business and application support personnel to maintain and modify this information. When changes are made, the spreadsheet is loaded into the relational database through custom software.

This approach has a number of benefits:


1. SharePoint is used as a means of version control, to maintain a historical record of changes made.
2. Anyone with access to the SharePoint site can view the details surrounding the current authorization hierarchy, without the need for special software or reports.
3. The spreadsheet itself contains business logic to help organize the information and maintain data integrity.
4. The spreadsheet is easy to use and modify for subject matter experts who may not have a programming background.

However, as initially conceived there are some difficulties associated with this approach:


1. Making changes and deploying to the relational database involves a number of time-intensive manual steps, including downloading the spreadsheet, exporting to XML, applying an XSLT transform, and executing a custom application.
2. In addition to the labor involved, the manual steps are prone to error.
3. In the event that the changes made are not what were desired, either through human error or a change in business strategy, rollback of the changes can be difficult.

Excel Web Services can provide an ideal platform to address these shortcomings. A SharePoint web part or other custom application can be developed that, utilizing Excel Services, can recalculate and read the latest spreadsheet data, generate the necessary output, and load the relational database in (from an end users point of view) a single step. No modifications to the existing spreadsheet logic would be necessary, and the business users can use and maintain the spreadsheet as before. SharePoint security can be leveraged to restrict access to the Web Part. In addition, custom rollback functionality can be added via code.

Finally…

As indicated in my last installment, relying on business logic residing in an Excel spreadsheet may not be considered “ideal” from an architectural standpoint – but is often a necessity when practical considerations such as time, budget, staff experience, and maintainability come into play. Excel Web Services provides an IT organization a valuable tool to leverage that platform for maximum benefit.

SSIS Data Flow Task – OleDBCommand Task on ORACLE database throwing “Provider cannot derive parameter information” error

Problem
We were trying to update a table in Oracle database within a dataflow task. OleDbCommand transformation task is used for the purpose. The update statement takes some parameters. SSIS could not resolve the parameter information and throw error “Provider cannot derive parameter information”. We are using SSIS 2005 and Oracle 10G .NET drivers.

Analysis
Typically, OleDbCommand transformation task automatically resolves the parameters based on the SQL statement. For example, let’s try updating a same table in SQL Server database and Oracle database

SQL Statement: UPDATE Employee SET Bonus = ? WHERE EmpID = ?

With SQL Server
Set the SQL Server connection in “Connection Managers” tab. Fill the SQL statement in the “Component Properties” tab as shown below.


Figure 1 - Adding SQL Statement to OleDbCommand Task

If you check the “Column Mapings” tab in the OleDBCommand transformation task, you would see the parameters list as follows. SSIS automatically resolves the parameters needed. Map the columns appropriately and the Data Transformation task is ready


Figure 2 - Parameters List (SSIS automatically resolved)


With Oracle
When the same SQL is tried on Oracle database, SSIS throws the following exception. SSIS cannot figure out the parameters automatically.

Validation error. Update Oracle Table: UPDATE Employee [60]: SSIS Error Code DTS_E_OLEDBERROR. An OLE DB error has occurred. Error code: 0x80040E51. An OLE DB record is available. Source: "OraOLEDB" Hresult: 0x80040E51 Description: "Provider cannot derive parameter information and SetParameterInfo has not been called.".
Validation error. Update Oracle Table: UPDATE Employee [60]: Unable to retrieve destination column descriptions from the parameters of the SQL command.


Figure 3 - SSIS failed to resolve the parameters with Oracle database

Solution
Do not know if this is bug in SSIS. But one way to resolve is to add the parameters manually. Go to the “Input and Output Properties” tab and add the parameters manually. Add the parameters with the appropriate data type and in the same order as they appear in the SQL statement. This resolves the error and you can map the input columns to parameters.



Figure 4 - Adding parameters manually


Figure 5 - Parameters added manually


Figure 6 - Map the parameters to Input columns

Reporting Services 2008 Parameter Dependencies & Choosing Multiple Parameter Values

As was the case with prior versions, Reporting Services 2008 allows you to establish parameter dependencies and to allow mutiple parameter values be chosen. This means that parameter values provided to the end user can be dependent upon or driven by prior parameter value choices. In this example we’ll see how this can be done using a very simple AdventureWorks2008 Sales Data example report.

The AdventureWorks Sales Detail Report displays line-item sales details for a given product category, product subcategory and product. Our goal is to allow the end-user to specify the product category from the following available list (Accessories, Bikes, Clothing, Components). Once a Product Category choice is made, the available Product SubCategory choices for the chosen Product Categories are determined and displayed in the Product SubCategory drop-down list. In the screen snapshot below we chose Bikes and Clothing as our Product Category choices. For our Product Subcategory choices we chose Mountain Bikes (Bikes), Road Bikes (Bikes) and Caps (Clothing).


















Notice that our Product Category is displayed along with the Product Subcategory in the following format: Product SubCategory (Product Category)
Our product subcategory choices are below:


Mountain Bikes (Bikes)
Touring Bikes (Bikes)
Caps (Clothing)

This was done just to tell the report viewer what category a subcategory belongs to. This was done because the subcategory / category relationship was not always obvious by name!

Parameters and Dependencies
Our sample report uses two parameters: Product Category and Product SubCategory. The subcategory choices are driven by or dependent upon the product category choice. In this section we’ll see exactly how this is done and also look at how we provide both the category and subcategory names to the end-user.


Product Category Parameter
The Product Category choices are provided by selecting the Name column form the ProductCategory table in the AdventureWorks2008 database. Below is the SQL code to provide the ProductCategory choices:

SELECT distinct
Name as ProductCategoryName
FROM
Production.ProductCategory

Notice that this parameter does not include any parameter dependencies since it is our initial choice. The Report Data window below allows us to define the parameter specifics such as the parameter name, parameter label, parameter data type, where the available values come from, what default value should be used, …
Below we see the Report Data window which allows us to create and fine-tune our parameters.


















If we double-click on our highlighted parameter entry we see the Parameter Name, label and data type specified below. Notice that we also allow end-users to choose more than a single parameter value.





















In the Available Values tab we specify how the parameter choices are driven by a query. Since this is the case, we specify the DataSet name that is used to provide the available Product Category values.



Product SubCategory Parameter

The Product SubCategory choices are provided by selecting both the Name column from the ProductCategory table and the Name column from the ProductSubCategory table both in the AdventureWorks2008 database. Using the SQL below, end-users are provided with a list of SubCategory names with the corresponding category name in the following format:
subcategory (category)

SELECT
PSC.Name + ' (' + PC.Name + ')' as productsubcategorynameandcategoryname,
PSC.Name as ProductSubCategoryName
FROM
Production.ProductSubCategory PSC
INNER JOIN
Production.ProductCategory PC
ON
PSC.ProductCategoryID = PC.ProductCategoryID
WHERE
PC.Name IN ( @ProductCategoryName)
GROUP BY
PSC.Name, PC.Name
ORDER BY
PC.Name, productsubcategorynameandcategoryname

Notice this SQL references the ProductCategory parameter using the ‘@’ parameter designator. (This entry was highlighted) This is like saying : “Provide us with Subcategory values given the product category choice that was already made and stored in the @ProductCategoryname parameter.” This is exactly how one can introduce parameter dependencies in reports -- your SQL code can reference one or more report parameters.

Extracting the Report Data

So far, we've looked at the SQL-code used to create parameter choices. The following SQL Code is used to extact the actual reporting data:

SELECT
P.Name as ProductName,
PC.Name as ProductCategoryName,
PSC.Name as ProductSubCategoryName,
SOD.OrderQty,
SOD.UnitPrice,
SOD.OrderQty * SOD.UnitPrice as LineTotal
FROM
Sales.SalesOrderHeader SOH
INNER JOIN
Sales.SalesOrderDetail SOD
ON
SOH.SalesOrderID = SOD.SalesOrderID
INNER JOIN
Production.Product P
ON
SOD.ProductID = P.ProductID
INNER JOIN
Production.ProductSubcategory PSC
ON
PSC.ProductSubcategoryID = P.ProductSubcategoryID
INNER JOIN
Production.ProductCategory PC
ON
PC.ProductCategoryID = PSC.ProductCategoryID
WHERE
PC.Name in ( @ProductCategoryName)
and PSC.Name + ' (' + PC.Name + ')' in
(@productsubcategorynameandcategoryname)
ORDER BY
ProductName,
ProductCategoryName,
ProductSubCategoryName

Notice in the highlighted section we are referencing both the Product Category and the Product SubCategory parameters. In fact, we are actually building the string that include the SubCategory Name and Category Name (i.e. Mountain Bikes (Bikes) ) so this filter matches the choice format the end-user is provided with.

Conclusion
It seems as though things have gotten easier in Reporting Services 2008. I’ve authored reports in both SQL Server 2000 and SQL Server 2005 and I remember using a Transact-SQL function (Split) to slice and dice values from a comma separate string.

SSIS Dynamic Data Driven Extracts

Extracting data from multiple tables: a dynamic SSIS Data-Driven Approach


This blog entry looks at how you can use SSIS to extract data from multiple tables leveraging a ‘source table’ that provides the table names to extract data from. This solution utilizes a For Each Container, Package Variables, Package Expressions applied to a package variabe along with an OLEDB Source whose SQL Extract Command is stored in a package variable.


In our sample scenario we receive sales data in tables from a 3’d party. This data is delivered on an unpredictable schedule and when received we’d rather not have to modify our SSIS solution. Instead, we rely on inserting a new row into the source table that tells SSIS what it needs to know about the new source table.



Our Source Table / data driven approach
Our dynamic approach involves the use of a source table to drive the SSIS Extract process and designate the tables to extract.


The Source Table SSIS utilizes is dbo.SalesSource and its schema is below:

CREATE TABLE [dbo].[SalesSource] (
[SalesSourceID] [int] IDENTITY(1,1) NOT NULL,
[SourceTableName] [varchar](64) NOT NULL,
[ExtractProcessedFlag] [int] default (0) NULL,
[ExtractProcessingOrder] [int] NULL,
[RowsExtracted] [int] NULL,
[DateTimeExtracted] [datetime] NULL
)



Initially, we have 2 source files to work with so, we start out inserting the following two rows into our dbo.SalesSource table (Notice we really only need to supply values for 3 columns). The SSIS package updates the ExtractProcessedFlag, RowsExtracted and DateTimeExtracted columns upon sucessful completion.








The ExtractProcessingOrder column allows us to specify the order in which we would like to extract data from in our Source Tables. The reason this was required was that our source data may very well contain duplicate data. We specify the order in which we extract and load and have our SSIS packages flag the duplicate rows based on a business key and keep only the most recent data from our duplicates. (In our scenario, new data trumps old data)



Let’s take a look at our SSIS Solution and then dissect each component:
























The SSIS package uses the following Package Variables.











As we progress and dissect each SSIS task, we will look at each package variable it utilizes. The SSIS package uses a For Each container to cycle through each row present in the Source Table with the CurrentTable package variable holding the current table that needs to be extracted. Notice that we supplied an initial value for the CurrentTable. This way we can preview data in this table from the OLE DB Source connection we will see in the Data Flow task.


The package is fairly simple. It contains a highest level Sequence Container along with an inner For each Container. The For Each Container specifies an ADO Enumerator as its ‘Collection Type.’ The result set that feeds this container is below is produced by the Exec SQL Find Tables to Extract SSIS Task. The SQL Extract code is below:

select SourceTableName
from dbo.SalesSource
where ExtractProcessedFlag = 0
order by ExtractProcessingOrder


Notice that we are only extracting source tables where the ExractProcessedFlag is set to 0. The SSIS package sets this flag column to 1 when a source table is extracted. This EXEC SQL Task specifies the ResultSet Package Variable to ‘hold’ the results of the above select statement. (ResultSet is of type object)

The For Each Container Specifies a ‘For each ADO Enumerator’ as the Collection type and points the ResultSet package variable as the result set holder. ( Notice the setting for the Enumerator and the ADO object Source variable choice)





















Also, the CurrentTable package variable is specified in the Variable Mappings tab. This allows our CurrentTable variable to essentially hold the current table being processed each time we cycle through the For Each loop. We'll see this package variable used in the expression that assigns a value to the SQLExtractCommand package variable.





















The Data Flow task specifies that the SQL to extact data is present in a Package Variable. While this choice if probably not the most common Data Access Mode choice, it does provide us with the flexiblity we need. This package variable has its value dynamically set using Expressions. If you choose a package variable and display its properties (F4 will provide the properties) you will notice that by default the Evaluate As Expression property is False. This was changed to True as is seen below:



















If you click on the the ellipse in the Expression value, you will be provided with the expression builder dialog box. The SQLExtactCommand’s contents are created using the quoted string along with the CurrentTable package variable as seen below. (Notice I did specify square brackets to surrount the table name since the table name may include non-alphabetical or other strange characters)




















I found it helpful to click on Evaluate Expression and then paste the results into a Management Studio window. This way you can execute and validate the actual code.


The Data Flow Task that specifies the SQLExtractCommand package variable is below. You will notice that you are able to view columns and even preview data because we supplied an initial value for our CurrentTable package variable.























The final step in our SSIS package is a EXEC SQL statement that updates the source table to designate the SourceTable as being sucessfully processed.



UPDATE
dbo.SalesSource
SET
ExtractProcessedFlag = 1,
DateTimeExtracted = getdate(),
RowsExtracted = ?
WHERE
SourceTableName = ?


Parameters used in this update statement include the number of rows extracted from the source table along with the source table name.


Conclusion
What we left with is a SSIS package that extracts the tables to process from a source table. As new source tables arrive, we insert the rows into our source table without having to modify our SSIS package.