Lookup Cache

Lookup transformation supports 3 types of cache.
Full Cache - This brings the reference table into memory - Its good if you have lots of memory. Partial - It adds rows to the cache as they are used - Good if using less memory or the range of rows looked up could be less.
None - Always queries the Database every time - useful in low memory conditions

Users sometimes run into a puzzling situation when they use caching in the lookup. With caching settings set to Full Cache they get no hits - all the rows flow to the error output. However, with caching set to Partial or none, (memory restriction is enabled), things work as expected. The common reason for this is that they are comparing strings which have leading or trailing spaces, for example: “John” with “John “

When caching is Full (memory restriction is off) SSIS builds a hash table for the cache in order to do the comparisons more quickly. The hash values for the strings quoted are, of course, different.
When memory restriction is on, string compares are used and the two quoted strings match.
You can trim incoming datea in the source query or using an expression.
Trim reference data using a SQL query rather then selecting a table in the lookup. SQL queries are better in lookup anyway because you only cache the columns (and rows) you asked for, rather than the entire table.

Lookup is case sensitive, so you may need to force the case for the test to succeed.
Data types must match exactly. The join columns, between the input or source data and the reference data must be of exactly the same type.Lookup does not support the full range of SSIS data types. The following types cannot be used as join columns.

DT_R4
DT_R8
DT_TEXT
DT_NTEXT
DT_IMAGE

If you are migrating DTS packages to SQL Server 2005 or 2008...

… your options include
1. Saving the DTS packages to the new server and executing them directly.
2. Running DTS Migration Wizard and letting it convert the packages to SSIS packages for control flow invoking DTS components for the data flow.
3. Rewriting DTS packages completely in SSIS.

Neither option guarantees a performance gain. In fact, moving data directly from a source to a destination table without much transformation logic you will likely see similar performance for all the options. So unless you are taking advantage of new features in SSIS, you are better off selecting the option that fits best into your operating procedures.

If you continue using DTS packages on the new server, you will certainly need Microsoft SQL Server 2000 DTS Designer Components. That applies to options 1 and 2. With option 2, the Migration Wizard generates SSIS control flow, but creates an embedded DTS package for every Data Pump, instead of converting it to a Data Flow Task. Option 3 will likely require the most amount of work, and that's where the fun begins!

SharePoint SSRS Manage Menus

Issue:
After uploading a Reporting Services Report to a SharePoint site, clicking on the drop down next to the report to bring up Manage Data Sources, Manage Subscriptions, Manage Paramters and Manage Process Options returned the following error:
The resource object with classname of “ReportServer” and key of “DataSourceList” was not found. DataSourceList was replaced with either SubscriptionList, ParameterList or ProcessingList dependent on which menu was selected.

The environment is setup with Two Front End Web Servers and One Central Application SharePoint server.

Answer:
It turns out that it had to do with the Web Front End servers not configured the same as the Central SharePoint server and this is what was causing the error.

The following are the steps that were done to get the menus the work as intended:
It turns out the ReportServer.resx was missing on the two front ends. In IIS on the Web Front Ends, navigate to Websites\Default Web Site\App_GlobalResources and paste a copy of ReportServer.resx which was found in the InetPub folder tree.

The ReportServer.resx is the resource object XML file that has sections for DataSourceList, ParameterList, SubscriptionList, etc., which would explain the error we saw if the file was missing.

In the event that you run into this in the future, checking the Web Front End servers as well as the Central Admin server and ensuring that the files are in synch is a good troubleshooting tactic to solve any strange behavior.

Upcoming BI Seminars in MD, VA, and NC!!!

We are hosting several performance management based BI seminars. More information and registration details are located here: http://www.rdacorp.com/getting_started/events.html#Business_Intelligence_/_Performance_Management_Seminar_Series

Pivot Transformation

Pivot transformation converts rows into columns. I see most of the people are not aware of setting the transformation properties.

Here is a small example.

My source data looks like this




Pivot transformation organizes data as below.



To implement this, add a Data flow task. Take OLE DB Source and select the source table. Add a pivot transformation to it. Select all the three columns in the input columns tab. In the Input and Output Properties tab give 1 (setkey) as the pivot usage for customers, 2 (PivotKey) for product and 3 (Pivotvalue) for Qty. Note down the LineageID of customer and Qty.


In the Pivot Default output add a column for customer and give the SourceColumn as the LineageID of the customer in the input column.


Add columns for product. Since there are 6 products add 6 new columns and give them appropriate names. For ChairQty mention Pivotkeyvalue as Chair and the SourceColumn is the LineageID of the qty input column.



Repeat this for the other 5 columns. Add an OLE DB destination to this pivot transformation. When executed rows will be organized into columns.

Deploy SSRS Reports in SharePoint Integration Mode

This blog addresses deployment of SSRS Reports and Report Models to a SharePoint 3.0 Site. This information can be gleaned from the following MS TechNet article, Deploying Models and Shared Data Sources to a SharePoint Site. The intent is to (hopefully) make the necessary deployment settings for SharePoint Integration Mode more explicit and straightforward.

Assumptions:


  • Report Server has been configured for SharePoint Integration Mode.
  • SharePoint Report Server Integration Feature is activated.
  • Appropriate Content Types (Report Builder, Report Builder Report, Report Data Source) have been added to a Document Library within a site.


For further information on these configuration topics, see Deploying SSRS with SharePoint Integration.

Assume the following server/site/library/folder names:

  • Server Name = MyServer
  • SP Site = SiteABCReports
  • Doc Library = SSRS Reports
  • Data Sources Folder (optional) = Data Sources
  • Report Models Folder (optional) = Models
  • Reports Folder (optional) = Reports


Deploy Report Designer Reports

From within a SQL Server Business Intelligence Studio (BIDS) Report Server Project, go to the project’s Property Page (right-click project in Solution Explorer and select Properties).
Apply the following deployment settings:

  • Overwrite ExistingDataSources = True
  • TargetDataSourceFolder = http://MyServer/SSRS%20Reports/[Data%20Sources]
  • TargetReportFolder = http://MyServer/SSRS%20Reports/[Reports]
  • TargetServerURL =http://MyServer

Where [‘xxx’] denotes optional.

Things to note:


  • A TargetReportFolder must be specified. The folder can be the document library or a folder within the document library.
  • The TargetDataSourceFolder is optional. If one is not specified, the Data Source will be deployed to the TargetReportFolder.
  • For both target properties, if the folder within the document library does not already exist, it will be created upon deployment.
  • Relative paths are not valid.
  • The replacement of the space character with %20

Deploy Report Models

From within a SQL Server Business Intelligence Studio (BIDS) Report Model Project, go to the project’s Property Page (right-click project in Solution Explorer and select Properties).
Apply the following deployment settings:


  • OverwriteExistingDataSources = True
  • TargetDataSourceFolder = http://MyServer/SSRS%20Reports/[Data%20Sources]
  • TargetModelFolder = http://MyServer/SSRS%20Reports/[Models]
  • TargetServerURL = http://MyServer

Note: The TargetModelFolder and TargetDataSourceFolder properties must be set to the document library or folders within the document library.

Throughout this blog, where a folder within a document library is specified, this folder is optional. Folders at this level merely serve to organize the different types of report support files.

Forrester Review of Enterprise BI Platforms

Interesting editorial summarizing Forrester's recent 151-criteria evaluation of all the major Business Intelligence platforms out there. According to Forrester, IBM (Cognos) and SAP (Business Objects) continue to lead the way, while Oracle and SAS have moved into their league. Microsoft and others trail closely behind. It'll be interesting to see how this balance may shift on the heels of Microsoft's recent acquisitions and the release of SQL Server 2008 with a number of critical BI enhancements.

More details here:

http://www.esj.com/business_intelligence/article.aspx?EditorialsID=9070

Free E-Learning Available for SQL Server 2008

Microsoft Learning has made available 3 hours of web casts outlining the new features in SQL Server 2008.  It consists of 3, 1-hour web casts covering the following areas:

No need to register for each individually, however.  You can get access to the whole collection by clicking Collection 6187: What's New in Microsoft SQL Server 2008 and then clicking the "Activate Free Content" button.

Free SQL Server 2008 E-Book Available

Those of you who receive the MSDN Flash newsletter may already know about this, but there is a free e-book on SQL Server 2008 available.  The book is not complete, yet, so what is available today is only a couple of chapters from Introducing SQL Server 2008 (ISBN: 9780735625587).

When you click on this link you can get immediate access to Chapter 1: Policy-Based Management.

Also available at that link is the option to sign up for a Microsoft Press newsletter (you need a Windows Live ID to sign up).  If you sign up you can also access Chapter 11: Transact-SQL Enhancement.

Either way, bookmark that page and revisit the site later to get access to the full contents of the e-book.  The site states that it will be available in "Late Q1 2008", but that is most likely a typo.  I'd expect Q1 of 2009.

SSIS Dynamic File Name Load

Scenario
An outside vendor is supplying flat files as part of the nightly load process into the system. The file that the vendor will be supplying will have a date attached to the file name. In addition, it is possible for the vendor to send multiple files over at the same time for processing. One important note, the files being sent will have the same structure for processing but the names of the files will differ on each run.

The issue for the processing is to figure out how to process the files into the system since hard coding the input file name to be loaded is not a viable option.

Solution
Take advantage of the For Each Loop Container and the Script task to process this scenario very easily.

Control Flow
The following screen print illustrates the setup of the SSIS package that will handle the processing of the vendor files:



















-

In addition, here are the Connections associated with the above package Control Flow:









Task Breakdown
For this specific scenario, the Vendor is supplying their files via FTP. There is nothing exciting about this task, the task is setup to read the FTP address, grab all files in a specific location and move the files to a destination on my network. The connections being used are the Source Files connection, which is the destination location the network, and the Vendor FTP Connection.

The FTP scenario is not a requirement to receive the files, it is just a simple method to move files from an outside location onto the network and the point of this blog is not to go into details of the FTP task.

Once the files have been moved into the Source folder for processing, the For Each Loop Container is the main starting point of the dynamic processing. In the For Each Loop container there are seven different collection sets available for processing, the one that this process will use is the ForEach File Enumerator. The following is a screen shot of the For Each Loop Collection setup:



















The specific property to note is the Retrieve file name options. In this scenario, the For Each Loop Container will be retrieving the Fully Qualified path of the file found.

When a file is found in the source folder, a variable will be populated with the file path. In the package, a User Variable called FilePath was created for the storage of the file path. The variable is declared in the Variable Mappings page, which is displayed below:



















The container is now setup to loop through the collection of files in the source path. Now that we can loop through all files being sent from the vendor, the issue remaining is how to map each file found in the collection into the SQL Server database.

In the Control Flow diagram, there is a Script task and a Data Flow task inside of the For Each Loop Container, which will handle the processing of each file. The Script Task will be used to update the Data Flow task with the file to be processed but before detailing that task, the setup of the Data Flow task is needed to understand how the Script task will be used.

The Data Flow task is setup with a Flat File Source into a SQL Destination. The Flat File source was setup originally with a sample file from the Vendor to get the Meta Data about the column types into the process so a mapping can occur between the Flat File and the SQL Server table. The Flat File sample file is the Loop_Load_File Connection in the Connection Manager seen above. The following is the sample Data Flow:














For this scenario, it is a really simple load into table. However, this Data Flow has the Loop Source connected directly to the Loop_Load_File Connection, which can only contain one name. This is where the Script task becomes handy into changing the name of the file to be loaded.

The Script task has the FilePath user variable as a ReadOnlyVariable for the script input. The script design changes the Loop_Load_File Connection to be the path stored within the variable:
Public Sub Main()
Try
'dynamically set the connection to the found file
Dts.Connections("Loop_Load_File").ConnectionString = Dts.Variables("FilePath").Value.ToString

'return success
Dts.TaskResult = Dts.Results.Success
Catch
'error occurred, return failure
Dts.TaskResult = Dts.Results.Failure
End Try

End Sub

By processing this script before the calling of the Data Flow task, the Flat File connection has been updated to point to the file loaded from the collection and therefore allows the Data Flow task to connect to each file without having to hard code any file names in the process.

Lastly, in the Control Flow, there is an Archive script that moves the file from the Source folder into an Archive folder, which is a rather simple process.

Conclusion
The presented steps displayed a simple approach to processing Flat Files with different files names, but of the same structure, into a SQL Server destination with very minimal script needed and taking advantage of the power of the For Each Loop Container.

Decrypting the encrypted DTSRun /~Z command-line parameters

by Joe Toscano, Senior Software Engineer, RDA Corporation

Many of us have created Data Transformation Services (DTS) packages and then right-clicked on them to schedule their execution via a SQL Server 2000 Job. When you do this, a job is created whose name matches the DTS package you right-clicked on from Enterprise Manager. This job will be a single-step job that contains an Operating System Command to execute DTSRun which essentially kick-starts the package. The potential problem here is the command line parameters to DTSRun in the job step are actually encrypted.

To illustrate this I created and saved a package appropriately named Very simple DTS package. Then I right-clicked on the package from Enterprise Manager and choose Schedule Package from the context menu. Once this is done, if you look at actual job created you will find that the job name actually matches the package name. Below we see our single-step job that uses the DTSRun command with a very long encrypted string after the /~Z option.

While some folks may be fine with this and view the encryption as an extra layer of security, others may want to see the DTSRun parameters that provide both the server name that houses the DTS package along with the actual package name being executed. (Keep in mind your jobs can reference packages that reside on other instances of SQL Server 2000!) I’d like to focus on the latter audience for reasons stated below.

Why would you need to see the DTSRun cleartext parameters?

I’ve worked on several SQL Server 2000 to 2005 migration projects in which we needed to unravel or decrypt the string to see exactly what package is being executed and what server that package is located. (Specifically, we wanted to see the \S Server Name and \N Package Name DTSRun options) At one site, we notice that several copies of our DTS packages were made over the years whose names seemed very close to the originals. Frankly, it made us nervous that we saw the original DTS package saved with the ‘Original’ tag as part of the package name along with a modified copy of this DTS package saved with the ‘New’ tag as part of its name. Given this, how could be we 100% confident that the job name actually told us the exact package that was executed? Another possibility we had to consider what that over the years administrators could have changed the name of jobs! Finally, how could we even be sure that our jobs executed packages on the same server as our jobs were created and not reference packages on some other server? For all of these reasons (and our peace of mind) we decided to decrypt the command line parameters. After doing a bit of research, we found two methods to accomplish this:

1. Find the decrypt code

This shouldn’t come as a surprise, but if you search in the right places you can find the decrypt C-code that accepts the encrypted string (the stuff AFTER the /~Z) as a parameter and spits out the DTSRun parameters in cleartext. I’m not going down this road and I would certainly NOT recommend anyone else does so.

2. Using DTSRun options to produce the cleartext parameters.

It turns out there are command line options of DTSRun that don’t actually execute the DTS Package, but instead produce the cleartext parameters. Below are the steps to accomplish this for our sample package and sample job described above:

1. Copy the DTSRun command line from the job step (including the very long encrypted string!)

2. Open a Windows Command Line Window and paste the command into it.

3. Add /!X and /!C to the end of the DTSRun command. (/!X says do not run and /!C says copy results into clipboard)

4. Execute the command. (You should see DTSRun: Loading … and DTSRun: Executing …. as is displayed below and the cleartext parameters should now be in your paste buffer)

5. Paste the cleartext parameter string into notepad.

When I performed these steps for my sample package and sample job the following was returned:

DTSRun /S "JOETSRDAPC\SQL2K_DE_I1" /N "Very simple DTS package" /E /!X /!C

This tells me exactly what I was looking for! The Server Name that houses the package and the package name that is being executed. (/E says use trusted authentication) Using this method we were able to migrate both our DTS packages and jobs to our new server knowing for sure which jobs executed which packages. In fact, our client preferred to substitute the /~Z encrypted string in each job with the cleartext parameters to avoid future problems.

Virtual PC Tips - Using SYSPREP

Virtual PC is useful for creating development and test environments where a number of server products are required. Take for instance setting up a development environment for a BI engagement. You may need SQL Server 2005, Reporting Services, Analysis Services, MOSS and possibly even Performance Point. If your development environment consists of a single laptop, Virtual PC is something you want to check out. As an aside you can get decent performance when running 2-3 Virtual PCs with 4 GB of RAM.

Microsoft has a tool called SYSPREP which you can use to clone an existing VHD file that you've prepared. This allows you to create a Virtual PC image that you can make a copy of and use. SYSPREP can take care of generating a computer name and a unique SID so that you won't "collide" with existing Virtual PC images.

I'll provide an example of using SYSPREP with a Windows 2003 Server Virtual PC Image. Open the Deploy.cab file on the Windows 2003 Server CD from the Support\Tools folder (just double click in Windows Explorer). Extract the contents to a folder (e.g. C:\DEPLOY; highlight the files in Deploy.cab, right click, then select Extract). When you are running a Virtual PC image, you can click the CD menu option, select Capture ISO Image, and point to your Windows 2003 Server .iso image file; then use Windows Explorer to access the Deploy.cab file.

Run Setupmgr.EXE (from the folder where you copied the Deploy.cab contents) to create an "answer" file; you'll proceed through a series of dialogs that record your answers to certain configuration questions to be answered when you launch a VPC that you created by copying a sysprepped VHD file. The dialogs are pretty self-explanatory; make sure to select Sysprep Setup on the Type of Setup screen.

After running Setupmgr.EXE your Virutal PC will shutdown. At this point delete the .VMC file and save the .VHD file. You will no longer launch the Virtual PC image; when you want a new Virtual PC you create a new virtual machine and use a copy of the .VHD file. The first time you launch a new Virtual PC image, SYSPREP will do it's magic and you'll have a working VPC image.

SSIS DateTime Variable Truncates Milliseconds

I have a number of SSIS packages that extract data from a SQL Server database then populate a data warehouse. In order to extract just the rows that changed since the last extract, I save the maximum last modified date for each source table in a table. The extract SSIS package will retrieve the maximum last modified date and use it in the WHERE clause to select just the rows that have changed.

When the last modified date column includes milliseconds, assigning it to an SSIS variable of type DateTime seems to truncate the milliseconds, occasionally causing some rows to get extracted that haven't actually changed.

To workaround this problem, I retrieve the last modified date as a string; e.g.

SELECT CONVERT(VARCHAR(23), LastModified, 121) LastModified
FROM ...

Then in the WHERE clause to extract the rows that have changed since the last extract, I convert the string back to a DateTime; e.g.
WHERE LastModified > CONVERT(DATETIME, @LastModified, 121)

This retains the millisecond portion of the DateTime and I don't pickup rows that haven't actually changed.

SSIS Conditional Control Flow

For some SSIS packages, execution is dependent on other objects being readily available before the path of the processing occurs. Otherwise, a different action is then taken if that object is not available. The issue then becomes how to handle this logic in SSIS.

The answer is to take advantage of Expressions in Precedence Constraints. When Control Flow Items are connected to each other in the package, often the default is to evaluate the Success Result of the task being executed before the next task is processed. However, for this scenario, we want the initial task to succeed but the package also must know what happened before going forward.

I have created an example package that evaluates if a File is located on the network. If the File is located, the package will then perform a load into the Database. If the file is not present, an email is sent out to a user group stating that the file was not found.

The following is the screen print of the Control Flow for this package:















The Control Flow looks simple enough but if you look at the precedence constraints, you will notice the function symbol on both of the precedence constraints to alert the developer that an Expression is contained within them. The discussion of the precedence constraints is a little later but first let’s setup the rest of the package to understand what is happening before the constraint is evaluated.

In the Variables collection for the package, there are two variables setup:
· Variable “sInputPath” – the sInputPath variable contains the location of the file being checked for its existence.
· Variable “bIsAvailable” – the bIsAvailable variable will contain the result from the Script task to state whether or not the sInputPath file was found.


After the variables have been declared, the variables are passed into the Script Task in SSIS. The sInputPath variable is passed as a ReadOnly into the Script Task whereas the bIsAvailable variable is passed as a ReadWrite variable into the Script Task. The following code takes the input path and checks for the file existence and then sets the Boolean variable accordingly:

Imports Microsoft.SqlServer.Dts.Runtime
Imports System.IO


Public Class ScriptMain

Public Sub Main()
Try
'check to see if the file is available and set the boolean variable
Dts.Variables("bIsAvailable").Value = File.Exists(Dts.Variables("sInputPath").Value.ToString)

'return success
Dts.TaskResult = Dts.Results.Success
Catch
'an error occurred, return failure
Dts.TaskResult = Dts.Results.Failure
End Try

End Sub

End Class


After the Script task executes, both Precedence Constraints are evaluated to see if the tasks associated will be executed or not.

The following screen print is the precedence constraint for the Import Data dataflow task:
















The Precedence Constraint Editor is setup to not only evaluate the value of the preceding task, which in this case is Success, but also an Expression that must be in the form of a Boolean return. For the example above, the bIsAvailable variable is being evaluated to see if this variable returns True.
Note: this expression is checking for True because it is a Boolean variable check, if we wanted to check to see if a variable returned 3, we would set this expression to read “@[User::nCount]==3” where a package variable of nCount was declared to evaluate.

For the example above, a Logical AND is selected to state that all conditions must be met. The Logical OR is available as well if only one of the conditions must be true.


As illustrated above, taking advantage of Expressions in Precedence Constraints will allow for a pseudo dynamic processing of a SSIS package based upon the different outside variables within a network.

ACQUIRECONNECTION Error after moving SSIS packages from one server to another.

When you manually move raw SSIS Package files from one server (say DEV) to another (say TEST) and then import the package into Integration Services and then run the packages, you sometimes get a CANNOTACQUIRECONNECTION error. When you click on the Connection Managers, you find all your connection strings assigned values -- that you specified on DEV and the package is not able to read the new Connection Strings that you specified in your new SQL Server Configuration DB on TEST.

Possible Cause:

Your package is probably using Indirect Configurations to get Connection Strings from a SQL Server Configuration DB or an XML File and the name of this Configuration DB or XML File is fetched from an Environment Variable on your server. You probably created those Environment Variables AFTER you dropped your packages on the new server.

Resolution:
Always create the Environment variables on the new server BEFORE you bring any packages either over the network or from Source Control Repository. The packages do not get mapped to the Environment Variables during Run Time but when you drop it on your server is when they get mapped. If you did this mistake like I did and wondering how to resolve it – simply bring in all those SSIS Packages yet again on to your new box (but do make sure you have your Environment Variables created) and everything should flow smoothly from here onwards

Creating XL Report in the same workbook using SSIS

I had a requirement to create XL reports on the 2 sheets of the XL work book everyday and place them in a given location on the server using SSIS. All these reports should be of the same pattern i.e, I have a template with 2 sheets, headers and colors and all the reports should look the same. Though the requirement looks simple, I had to work around to create the reports with the same workbook and headers using SSIS.

I can illustrate with a small example I have created. First of all I created a template with 2 sheets Orders and Hist where Orders will have the no.of orders of the employee and Hist has all the employee details with headers. Name this as Template.xls. Create another copy of the template and name it as Template2. Place them in the same location.


So my template looks this way.




In the SSIS package, create a File connection to point to Template.xls.




Create another File connection for Template1.xls.


Create the File System Task and copy file Template.xls to Template1.xls. This should be your first task before you try to create a report.


Now add a script task to include logic to create filename. Here we need to include a variable to assign the query to pull the data from table.


For his I have added a variable in the SSIS package and assigned a select statement to it. Since I have 3 headers in my report, I have assigned the below statement.


Create a data flow task for orders to pull the data from Orders table.

Since I would be using the variable vQuery as the source in the dataflow, I have assigned a default statement to the vQuery variable at design time to validate my Source in the dataflow. This variable is over written in the script task where I had given the logic to create filename. Add an Excel Destination. Select Orders$ for the name of the XL sheet.





Add another data flow for Employee details which I named as Hist. Just like orders XL destination, select Hist$ as the name of the XL sheet for Employee details.



Add another File System task after the 2 Dataflow tasks to copy the file Template1.xls to the file we have created before in the script task.


This package will create an XL report and place them to the server location.

When I executed the package I got results as below.




SharePoint vs PerformancePoint for Performance Management

Here's a an issue that has come up for me already on multiple occasions with potential clients: Microsoft has SharePoint whose capabilities for publishing dashboards, reports and other performance data is well-documented. Then here comes PerformancePoint, the somewhat new kid on the block, with its Dashboard Designer tool and simple integration with SharePoint.

Here's a very recent, well-organized blog posting I came across that compares SharePoint and PerformancePoint's performance management capabilities. 'Which one should I use?' is a questions I'm sure many of us will face as PerformancePoint gains more acceptance and momentum.

http://blogs.msdn.com/normbi/default.aspx