Friday, August 29, 2008
Lookup Cache
Full Cache - This brings the reference table into memory - Its good if you have lots of memory. Partial - It adds rows to the cache as they are used - Good if using less memory or the range of rows looked up could be less.
None - Always queries the Database every time - useful in low memory conditions
Users sometimes run into a puzzling situation when they use caching in the lookup. With caching settings set to Full Cache they get no hits - all the rows flow to the error output. However, with caching set to Partial or none, (memory restriction is enabled), things work as expected. The common reason for this is that they are comparing strings which have leading or trailing spaces, for example: “John” with “John “
When caching is Full (memory restriction is off) SSIS builds a hash table for the cache in order to do the comparisons more quickly. The hash values for the strings quoted are, of course, different.
When memory restriction is on, string compares are used and the two quoted strings match.
You can trim incoming datea in the source query or using an expression.
Trim reference data using a SQL query rather then selecting a table in the lookup. SQL queries are better in lookup anyway because you only cache the columns (and rows) you asked for, rather than the entire table.
Lookup is case sensitive, so you may need to force the case for the test to succeed.
Data types must match exactly. The join columns, between the input or source data and the reference data must be of exactly the same type.Lookup does not support the full range of SSIS data types. The following types cannot be used as join columns.
DT_R4
DT_R8
DT_TEXT
DT_NTEXT
DT_IMAGE
If you are migrating DTS packages to SQL Server 2005 or 2008...
1. Saving the DTS packages to the new server and executing them directly.
2. Running DTS Migration Wizard and letting it convert the packages to SSIS packages for control flow invoking DTS components for the data flow.
3. Rewriting DTS packages completely in SSIS.
Neither option guarantees a performance gain. In fact, moving data directly from a source to a destination table without much transformation logic you will likely see similar performance for all the options. So unless you are taking advantage of new features in SSIS, you are better off selecting the option that fits best into your operating procedures.
If you continue using DTS packages on the new server, you will certainly need Microsoft SQL Server 2000 DTS Designer Components. That applies to options 1 and 2. With option 2, the Migration Wizard generates SSIS control flow, but creates an embedded DTS package for every Data Pump, instead of converting it to a Data Flow Task. Option 3 will likely require the most amount of work, and that's where the fun begins!
Tuesday, August 26, 2008
SharePoint SSRS Manage Menus
After uploading a Reporting Services Report to a SharePoint site, clicking on the drop down next to the report to bring up Manage Data Sources, Manage Subscriptions, Manage Paramters and Manage Process Options returned the following error:
The resource object with classname of “ReportServer” and key of “DataSourceList” was not found. DataSourceList was replaced with either SubscriptionList, ParameterList or ProcessingList dependent on which menu was selected.
The environment is setup with Two Front End Web Servers and One Central Application SharePoint server.
Answer:
It turns out that it had to do with the Web Front End servers not configured the same as the Central SharePoint server and this is what was causing the error.
The following are the steps that were done to get the menus the work as intended:
It turns out the ReportServer.resx was missing on the two front ends. In IIS on the Web Front Ends, navigate to Websites\Default Web Site\App_GlobalResources and paste a copy of ReportServer.resx which was found in the InetPub folder tree.
The ReportServer.resx is the resource object XML file that has sections for DataSourceList, ParameterList, SubscriptionList, etc., which would explain the error we saw if the file was missing.
In the event that you run into this in the future, checking the Web Front End servers as well as the Central Admin server and ensuring that the files are in synch is a good troubleshooting tactic to solve any strange behavior.
Monday, August 18, 2008
Upcoming BI Seminars in MD, VA, and NC!!!
Wednesday, August 13, 2008
Pivot Transformation
Here is a small example.
My source data looks like this

Pivot transformation organizes data as below.

To implement this, add a Data flow task. Take OLE DB Source and select the source table. Add a pivot transformation to it. Select all the three columns in the input columns tab. In the Input and Output Properties tab give 1 (setkey) as the pivot usage for customers, 2 (PivotKey) for product and 3 (Pivotvalue) for Qty. Note down the LineageID of customer and Qty.


Repeat this for the other 5 columns. Add an OLE DB destination to this pivot transformation. When executed rows will be organized into columns.
Deploy SSRS Reports in SharePoint Integration Mode
Assumptions:
- Report Server has been configured for SharePoint Integration Mode.
- SharePoint Report Server Integration Feature is activated.
- Appropriate Content Types (Report Builder, Report Builder Report, Report Data Source) have been added to a Document Library within a site.
For further information on these configuration topics, see Deploying SSRS with SharePoint Integration.
Assume the following server/site/library/folder names:
- Server Name = MyServer
- SP Site = SiteABCReports
- Doc Library = SSRS Reports
- Data Sources Folder (optional) = Data Sources
- Report Models Folder (optional) = Models
- Reports Folder (optional) = Reports
Deploy Report Designer Reports
From within a SQL Server Business Intelligence Studio (BIDS) Report Server Project, go to the project’s Property Page (right-click project in Solution Explorer and select Properties).
Apply the following deployment settings:
- Overwrite ExistingDataSources = True
- TargetDataSourceFolder = http://MyServer/SSRS%20Reports/[Data%20Sources]
- TargetReportFolder = http://MyServer/SSRS%20Reports/[Reports]
- TargetServerURL =http://MyServer
Where [‘xxx’] denotes optional.
Things to note:
- A TargetReportFolder must be specified. The folder can be the document library or a folder within the document library.
- The TargetDataSourceFolder is optional. If one is not specified, the Data Source will be deployed to the TargetReportFolder.
- For both target properties, if the folder within the document library does not already exist, it will be created upon deployment.
- Relative paths are not valid.
- The replacement of the space character with %20
Deploy Report Models
From within a SQL Server Business Intelligence Studio (BIDS) Report Model Project, go to the project’s Property Page (right-click project in Solution Explorer and select Properties).
Apply the following deployment settings:
- OverwriteExistingDataSources = True
- TargetDataSourceFolder = http://MyServer/SSRS%20Reports/[Data%20Sources]
- TargetModelFolder = http://MyServer/SSRS%20Reports/[Models]
- TargetServerURL = http://MyServer
Note: The TargetModelFolder and TargetDataSourceFolder properties must be set to the document library or folders within the document library.
Throughout this blog, where a folder within a document library is specified, this folder is optional. Folders at this level merely serve to organize the different types of report support files.
Tuesday, August 12, 2008
Forrester Review of Enterprise BI Platforms
More details here:
http://www.esj.com/business_intelligence/article.aspx?EditorialsID=9070
Free E-Learning Available for SQL Server 2008
Microsoft Learning has made available 3 hours of web casts outlining the new features in SQL Server 2008. It consists of 3, 1-hour web casts covering the following areas:
- Clinic 6188: What's New in Microsoft SQL Server 2008 for Enterprise Data Platform
- Clinic 6189: What's New in Microsoft SQL Server 2008 for Business Intelligence
- Clinic 6190: What's New in Microsoft SQL Server 2008 for Database Development
No need to register for each individually, however. You can get access to the whole collection by clicking Collection 6187: What's New in Microsoft SQL Server 2008 and then clicking the "Activate Free Content" button.
Free SQL Server 2008 E-Book Available
Those of you who receive the MSDN Flash newsletter may already know about this, but there is a free e-book on SQL Server 2008 available. The book is not complete, yet, so what is available today is only a couple of chapters from Introducing SQL Server 2008 (ISBN: 9780735625587).
When you click on this link you can get immediate access to Chapter 1: Policy-Based Management.
Also available at that link is the option to sign up for a Microsoft Press newsletter (you need a Windows Live ID to sign up). If you sign up you can also access Chapter 11: Transact-SQL Enhancement.
Either way, bookmark that page and revisit the site later to get access to the full contents of the e-book. The site states that it will be available in "Late Q1 2008", but that is most likely a typo. I'd expect Q1 of 2009.
Friday, August 08, 2008
SSIS Dynamic File Name Load
An outside vendor is supplying flat files as part of the nightly load process into the system. The file that the vendor will be supplying will have a date attached to the file name. In addition, it is possible for the vendor to send multiple files over at the same time for processing. One important note, the files being sent will have the same structure for processing but the names of the files will differ on each run.
The issue for the processing is to figure out how to process the files into the system since hard coding the input file name to be loaded is not a viable option.
Solution
Take advantage of the For Each Loop Container and the Script task to process this scenario very easily.
Control Flow
The following screen print illustrates the setup of the SSIS package that will handle the processing of the vendor files:
-
In addition, here are the Connections associated with the above package Control Flow:
Task Breakdown
For this specific scenario, the Vendor is supplying their files via FTP. There is nothing exciting about this task, the task is setup to read the FTP address, grab all files in a specific location and move the files to a destination on my network. The connections being used are the Source Files connection, which is the destination location the network, and the Vendor FTP Connection.
The FTP scenario is not a requirement to receive the files, it is just a simple method to move files from an outside location onto the network and the point of this blog is not to go into details of the FTP task.
Once the files have been moved into the Source folder for processing, the For Each Loop Container is the main starting point of the dynamic processing. In the For Each Loop container there are seven different collection sets available for processing, the one that this process will use is the ForEach File Enumerator. The following is a screen shot of the For Each Loop Collection setup:
The specific property to note is the Retrieve file name options. In this scenario, the For Each Loop Container will be retrieving the Fully Qualified path of the file found.
When a file is found in the source folder, a variable will be populated with the file path. In the package, a User Variable called FilePath was created for the storage of the file path. The variable is declared in the Variable Mappings page, which is displayed below:
The container is now setup to loop through the collection of files in the source path. Now that we can loop through all files being sent from the vendor, the issue remaining is how to map each file found in the collection into the SQL Server database.
In the Control Flow diagram, there is a Script task and a Data Flow task inside of the For Each Loop Container, which will handle the processing of each file. The Script Task will be used to update the Data Flow task with the file to be processed but before detailing that task, the setup of the Data Flow task is needed to understand how the Script task will be used.
The Data Flow task is setup with a Flat File Source into a SQL Destination. The Flat File source was setup originally with a sample file from the Vendor to get the Meta Data about the column types into the process so a mapping can occur between the Flat File and the SQL Server table. The Flat File sample file is the Loop_Load_File Connection in the Connection Manager seen above. The following is the sample Data Flow:
For this scenario, it is a really simple load into table. However, this Data Flow has the Loop Source connected directly to the Loop_Load_File Connection, which can only contain one name. This is where the Script task becomes handy into changing the name of the file to be loaded.
The Script task has the FilePath user variable as a ReadOnlyVariable for the script input. The script design changes the Loop_Load_File Connection to be the path stored within the variable:
Public Sub Main()
Try
'dynamically set the connection to the found file
Dts.Connections("Loop_Load_File").ConnectionString = Dts.Variables("FilePath").Value.ToString
'return success
Dts.TaskResult = Dts.Results.Success
Catch
'error occurred, return failure
Dts.TaskResult = Dts.Results.Failure
End Try
End Sub
By processing this script before the calling of the Data Flow task, the Flat File connection has been updated to point to the file loaded from the collection and therefore allows the Data Flow task to connect to each file without having to hard code any file names in the process.
Lastly, in the Control Flow, there is an Archive script that moves the file from the Source folder into an Archive folder, which is a rather simple process.
Conclusion
The presented steps displayed a simple approach to processing Flat Files with different files names, but of the same structure, into a SQL Server destination with very minimal script needed and taking advantage of the power of the For Each Loop Container.
Thursday, August 07, 2008
Decrypting the encrypted DTSRun /~Z command-line parameters
by Joe Toscano, Senior Software Engineer, RDA Corporation
Many of us have created Data Transformation Services (DTS) packages and then right-clicked on them to schedule their execution via a SQL Server 2000 Job. When you do this, a job is created whose name matches the DTS package you right-clicked on from Enterprise Manager. This job will be a single-step job that contains an Operating System Command to execute DTSRun which essentially kick-starts the package. The potential problem here is the command line parameters to DTSRun in the job step are actually encrypted.
To illustrate this I created and saved a package appropriately named Very simple DTS package. Then I right-clicked on the package from Enterprise Manager and choose Schedule Package from the context menu. Once this is done, if you look at actual job created you will find that the job name actually matches the package name. Below we see our single-step job that uses the DTSRun command with a very long encrypted string after the /~Z option.
While some folks may be fine with this and view the encryption as an extra layer of security, others may want to see the DTSRun parameters that provide both the server name that houses the DTS package along with the actual package name being executed. (Keep in mind your jobs can reference packages that reside on other instances of SQL Server 2000!) I’d like to focus on the latter audience for reasons stated below.
Why would you need to see the DTSRun cleartext parameters?
I’ve worked on several SQL Server 2000 to 2005 migration projects in which we needed to unravel or decrypt the string to see exactly what package is being executed and what server that package is located. (Specifically, we wanted to see the \S Server Name and \N Package Name DTSRun options) At one site, we notice that several copies of our DTS packages were made over the years whose names seemed very close to the originals. Frankly, it made us nervous that we saw the original DTS package saved with the ‘Original’ tag as part of the package name along with a modified copy of this DTS package saved with the ‘New’ tag as part of its name. Given this, how could be we 100% confident that the job name actually told us the exact package that was executed? Another possibility we had to consider what that over the years administrators could have changed the name of jobs! Finally, how could we even be sure that our jobs executed packages on the same server as our jobs were created and not reference packages on some other server? For all of these reasons (and our peace of mind) we decided to decrypt the command line parameters. After doing a bit of research, we found two methods to accomplish this:
1. Find the decrypt code
This shouldn’t come as a surprise, but if you search in the right places you can find the decrypt C-code that accepts the encrypted string (the stuff AFTER the /~Z) as a parameter and spits out the DTSRun parameters in cleartext. I’m not going down this road and I would certainly NOT recommend anyone else does so.
2. Using DTSRun options to produce the cleartext parameters.
It turns out there are command line options of DTSRun that don’t actually execute the DTS Package, but instead produce the cleartext parameters. Below are the steps to accomplish this for our sample package and sample job described above:
1. Copy the DTSRun command line from the job step (including the very long encrypted string!)
2. Open a Windows Command Line Window and paste the command into it.
3. Add /!X and /!C to the end of the DTSRun command. (/!X says do not run and /!C says copy results into clipboard)
4. Execute the command. (You should see DTSRun: Loading … and DTSRun: Executing …. as is displayed below and the cleartext parameters should now be in your paste buffer)
5. Paste the cleartext parameter string into notepad.
When I performed these steps for my sample package and sample job the following was returned:
DTSRun /S "JOETSRDAPC\SQL2K_DE_I1" /N "Very simple DTS package" /E /!X /!C
This tells me exactly what I was looking for! The Server Name that houses the package and the package name that is being executed. (/E says use trusted authentication) Using this method we were able to migrate both our DTS packages and jobs to our new server knowing for sure which jobs executed which packages. In fact, our client preferred to substitute the /~Z encrypted string in each job with the cleartext parameters to avoid future problems.
Wednesday, August 06, 2008
Virtual PC Tips - Using SYSPREP
Microsoft has a tool called SYSPREP which you can use to clone an existing VHD file that you've prepared. This allows you to create a Virtual PC image that you can make a copy of and use. SYSPREP can take care of generating a computer name and a unique SID so that you won't "collide" with existing Virtual PC images.
I'll provide an example of using SYSPREP with a Windows 2003 Server Virtual PC Image. Open the Deploy.cab file on the Windows 2003 Server CD from the Support\Tools folder (just double click in Windows Explorer). Extract the contents to a folder (e.g. C:\DEPLOY; highlight the files in Deploy.cab, right click, then select Extract). When you are running a Virtual PC image, you can click the CD menu option, select Capture ISO Image, and point to your Windows 2003 Server .iso image file; then use Windows Explorer to access the Deploy.cab file.
Run Setupmgr.EXE (from the folder where you copied the Deploy.cab contents) to create an "answer" file; you'll proceed through a series of dialogs that record your answers to certain configuration questions to be answered when you launch a VPC that you created by copying a sysprepped VHD file. The dialogs are pretty self-explanatory; make sure to select Sysprep Setup on the Type of Setup screen.
After running Setupmgr.EXE your Virutal PC will shutdown. At this point delete the .VMC file and save the .VHD file. You will no longer launch the Virtual PC image; when you want a new Virtual PC you create a new virtual machine and use a copy of the .VHD file. The first time you launch a new Virtual PC image, SYSPREP will do it's magic and you'll have a working VPC image.
Monday, August 04, 2008
SSIS DateTime Variable Truncates Milliseconds
When the last modified date column includes milliseconds, assigning it to an SSIS variable of type DateTime seems to truncate the milliseconds, occasionally causing some rows to get extracted that haven't actually changed.
To workaround this problem, I retrieve the last modified date as a string; e.g.
SELECT CONVERT(VARCHAR(23), LastModified, 121) LastModified
FROM ...
Then in the WHERE clause to extract the rows that have changed since the last extract, I convert the string back to a DateTime; e.g.
WHERE LastModified > CONVERT(DATETIME, @LastModified, 121)
This retains the millisecond portion of the DateTime and I don't pickup rows that haven't actually changed.
Friday, August 01, 2008
SSIS Conditional Control Flow
The answer is to take advantage of Expressions in Precedence Constraints. When Control Flow Items are connected to each other in the package, often the default is to evaluate the Success Result of the task being executed before the next task is processed. However, for this scenario, we want the initial task to succeed but the package also must know what happened before going forward.
I have created an example package that evaluates if a File is located on the network. If the File is located, the package will then perform a load into the Database. If the file is not present, an email is sent out to a user group stating that the file was not found.
The following is the screen print of the Control Flow for this package:
The Control Flow looks simple enough but if you look at the precedence constraints, you will notice the function symbol on both of the precedence constraints to alert the developer that an Expression is contained within them. The discussion of the precedence constraints is a little later but first let’s setup the rest of the package to understand what is happening before the constraint is evaluated.
In the Variables collection for the package, there are two variables setup:
· Variable “sInputPath” – the sInputPath variable contains the location of the file being checked for its existence.
· Variable “bIsAvailable” – the bIsAvailable variable will contain the result from the Script task to state whether or not the sInputPath file was found.
After the variables have been declared, the variables are passed into the Script Task in SSIS. The sInputPath variable is passed as a ReadOnly into the Script Task whereas the bIsAvailable variable is passed as a ReadWrite variable into the Script Task. The following code takes the input path and checks for the file existence and then sets the Boolean variable accordingly:
Imports Microsoft.SqlServer.Dts.Runtime
Imports System.IO
Public Class ScriptMain
Public Sub Main()
Try
'check to see if the file is available and set the boolean variable
Dts.Variables("bIsAvailable").Value = File.Exists(Dts.Variables("sInputPath").Value.ToString)
'return success
Dts.TaskResult = Dts.Results.Success
Catch
'an error occurred, return failure
Dts.TaskResult = Dts.Results.Failure
End Try
End Sub
End Class
After the Script task executes, both Precedence Constraints are evaluated to see if the tasks associated will be executed or not.
The following screen print is the precedence constraint for the Import Data dataflow task:
The Precedence Constraint Editor is setup to not only evaluate the value of the preceding task, which in this case is Success, but also an Expression that must be in the form of a Boolean return. For the example above, the bIsAvailable variable is being evaluated to see if this variable returns True.
Note: this expression is checking for True because it is a Boolean variable check, if we wanted to check to see if a variable returned 3, we would set this expression to read “@[User::nCount]==3” where a package variable of nCount was declared to evaluate.
For the example above, a Logical AND is selected to state that all conditions must be met. The Logical OR is available as well if only one of the conditions must be true.
As illustrated above, taking advantage of Expressions in Precedence Constraints will allow for a pseudo dynamic processing of a SSIS package based upon the different outside variables within a network.
ACQUIRECONNECTION Error after moving SSIS packages from one server to another.
Possible Cause:
Your package is probably using Indirect Configurations to get Connection Strings from a SQL Server Configuration DB or an XML File and the name of this Configuration DB or XML File is fetched from an Environment Variable on your server. You probably created those Environment Variables AFTER you dropped your packages on the new server.
Resolution:
Always create the Environment variables on the new server BEFORE you bring any packages either over the network or from Source Control Repository. The packages do not get mapped to the Environment Variables during Run Time but when you drop it on your server is when they get mapped. If you did this mistake like I did and wondering how to resolve it – simply bring in all those SSIS Packages yet again on to your new box (but do make sure you have your Environment Variables created) and everything should flow smoothly from here onwards
Creating XL Report in the same workbook using SSIS
I can illustrate with a small example I have created. First of all I created a template with 2 sheets Orders and Hist where Orders will have the no.of orders of the employee and Hist has all the employee details with headers. Name this as Template.xls. Create another copy of the template and name it as Template2. Place them in the same location.
So my template looks this way.

In the SSIS package, create a File connection to point to Template.xls.

Create another File connection for Template1.xls.




Create a data flow task for orders to pull the data from Orders table.
Since I would be using the variable vQuery as the source in the dataflow, I have assigned a default statement to the vQuery variable at design time to validate my Source in the dataflow. This variable is over written in the script task where I had given the logic to create filename. Add an Excel Destination. Select Orders$ for the name of the XL sheet.
Add another data flow for Employee details which I named as Hist. Just like orders XL destination, select Hist$ as the name of the XL sheet for Employee details.

Add another File System task after the 2 Dataflow tasks to copy the file Template1.xls to the file we have created before in the script task.

When I executed the package I got results as below.


SharePoint vs PerformancePoint for Performance Management
Here's a very recent, well-organized blog posting I came across that compares SharePoint and PerformancePoint's performance management capabilities. 'Which one should I use?' is a questions I'm sure many of us will face as PerformancePoint gains more acceptance and momentum.








