Nice to Know Tidbits for SSRS 2008 Deployment and SSRS 2008 Upgrades

 

by Shruti Sinha, Project Manager

With SQL Server 2008 finally a real option for organizations, IT departments need to address bottom line tactical operational issues before deployment. Questions they need to address are – will my legacy applications still work as expected or do I need to reinvest and redeploy everything.

The table below answers some of these questions.

Usage Scenario

Support Statement

Applications built for RS 2005, 2005 SP2

Will work

Applications built for RS 2000

RS 2000 SOAP APIs are not supported

RS 2005 RDL, RS 2000 RDL

URL access will work

Can publish

Cannot edit

RS Database hosted in SQL 2005

Will work

RS Database hosted in SQL 2000

Not supported

VS 2005 ReportViewer Control

Supported

SharePoint integration

Supported

SharePoint v2 WebParts

Supported

Similarly when upgrading to SSRS 2008 here are some things to remember

        RS 2008 server WILL support running legacy RDL

       RS 2005 RDL

       RS 2000 RDL

        RS 2008 designer has PARTIAL support for legacy RDL

       Converts legacy RDL to 2008 RDL

       If you don’t convert, you cannot open the RDL in the 2008 designer

       CANNOT save to legacy RDL formats

        Published reports upgrade on the fly

        RS 2008 RDL will NOT work with SQL 2005

        SQL 2005 Report Engine is deprecated

       Snapshots created in 2000 or 2005 WORK in SQL 2008

       They will not work in a version after 2008

Supplementing your Reporting Services and/or PerformancePoint Server reports with SQL Server 2005 Data Mining

 

By Joe Toscano, Senior Software Engineer

During the last decade large volumes of data have been accumulated and stored in databases. The result of which have made many organizations data-rich but knowledge-poor. You may already be in living in this scenario. Perhaps you currently are generating insightful reports using tools such as Reporting Services and/or PerformancePoint Server. So, where would SQL Server 2005 Data Mining fit in and what value could it add? The goal of this blog entry is to help answer those questions and then to provide a direction for those who like what they hear and wish to dig deeper.

What does Data Mining Promise?
Report viewers or decision makers can develop hypothesis by drilling through the data and digging for cause-and-effect relationships. Wouldn’t be nice if you had a tool that could determine relationships for you? How about predicting future events and spotting bad data and allowing for the analysis of data in ways that have never been possible? This can be accomplished through the use of data mining. Data Mining can help us determine what products may sell together. What sales were the results of a marketing campaign? What are the odds that certain products may sell? What are the odds that customers may go elsewhere (churn) based on various circumstances? Simply put, you can gain additional business insight that may help you make crucial business decisions and perhaps gain a competitive advantage. Data mining contains technologies that may help you in your ability to retain existing customers and acquire new ones by turning your wealth of data into actionable information.

There’s an excellent start-point white paper that introduces SQL Server 2005 Data Mining and provides a great overview of what it can deliver. This paper can be found at the following link:

http://msdn.microsoft.com/en-us/library/ms345131(SQL.90).aspx

Would like to know more?
If you like what you’ve read so far, there are numerous tutorials that walk you through the initial SQL Server Analysis Services setup and then walk you through the creation of a targeting email campaign scenario, the building a forecasting scenario, the building a Market Basket Scenario and finally the building a Sequence Clustering Scenario. You will use the Business Intelligence Development Studio (BIDS) to create a new data mining solution. These excellent tutorials use the AdventureWorksDW data warehouse and can be found at the following MSND Site:

http://msdn.microsoft.com/en-us/library/ms167488(SQL.90).aspx

These represent a great second step after you digest the introductory white paper referenced above. Just one thing to keep in mind – in order to setup and run through the tutorials you will have to install the SQL Server 2005 Datamining Viewer Controls and the Microsoft SQL Server 2005 Analysis Services 9.0 OLE DB Provider. Both of these are free downloads and are present in the SQL Server 2005 Feature Pack.

A quick look at one tutorial
If you are still not sure you are ready to dive into the tutorials we can take a quick peek at portions of the first and sections of the tutorial. Remember, the finished product is a Business Intelligence Development Studio solution. When you create an Analysis Services project AND have satisfied the Data Mining requirements I mentioned above, you will notice a Mining Structures folder as seen under Solution Explorer below:

clip_image002

One algorithm we are exposed to in the tutorial is The Microsoft Decision Tree. This algorithm calculates the odds of an outcome based on attribute values. For example – what are the chances that a person will purchase a bike based on the number of cars owned. As you would expect the person who is most likely to purchase a bike currently has 0 cars as seen below:

clip_image004

Below is an overview of several other Microsoft Data Mining Algorithms:

Decision Trees
This algorithm calculates the odds of an outcome based on attribute values. For example – what are the chances that a high schooler will attend college given parental encouragement, their gender, their parents income level, and so on.

Naïve Bayes
The Naïve Bayes algorithm is used to clearly show the differences in a particular variable for various data elements. For example, let’s assume we offer a course on Data Mining and wish to track the course evaluations. Which variable or question asked in the evaluation form can most effectively be used as a predictor of future courseware purchasing? Personally, I’ve always suspected that the ‘Would you recommend this course to a friend or co-worker’ question to be a possible indicator of return visit to the classroom. This algorithm can be used to validate my suspicions and excels at showing the differences between certain groups students who CHURN (jump ship to a competitor) and those who don’t.

Sequence Clustering
The clustering algorithm is used to group or cluster data based on a sequence of prior events. For example – users of a web application can often follow a variety of paths through a site. This algorithm can be used to group customers based on their sequence of pages through the site to help determine if some paths are more profitable than others. This is an algorithm that many other data mining vendors cannot deliver.

More information you can be found at the following sites:

http://www.sqlserverdatamining.com/ssdm/

How to gain storage and then performance from data type

During the design phase of a database the choice of the proper data type can have a sizable impact on data base size then on overall performance.

1- Datetime

You may want to choose smalldatetime over datetime for the following reasons:
smalldatetime data type uses half the space (4 bytes) of datetime data type(8 bytes)
datetime stores up to the milliseconds and smalldatetime stores up to the minute,
Use smalldatetime if you need to store date from January 1, 1900 to June 6, 2079; datetime stores date from January 1, 1753, to December 31, 9999

2- Integer

Most of the time developer chose bigint over int as data type for identity key on transaction detail table.

bigint is probably the single most misused data type.
Here are some reasons why to use int over bigint:
Int uses half the space (4 bytes) of bigint (8 bytes); in a 100 million row table you could save about 400MB worth of space by choosing int instead of bigint.
Int can store values from -2147483648 through 2.147.483.647 and bigint can store values from -9223372036854775808 through 9223372036854775807; a huge number.

So it’s clear you should go for bigint only if you need to store values over 2.1 billion; most systems will not grow to this number.

3- numeric/money

It is highly advised to choose money type for currency instead of numeric or decimal. The default precision (number to the left of the decimal point) for both numeric and decimal is 18 not far to the maximum of 28 digits; the default scale (number to the right of the decimal point) is 0.

More than often developers leave untouched the default precision value of 18, higher that the maximum of 15 digits for the money data type. Money data type size is 8 bytes with a precision of 15 and scale of 4 digits while decimal and numeric data type use from 2 to 17 bytes.

4- Character, Text

It is recommended to use char data type over varchar data type in very precise circumstances: data in column must have the same size and contain no null values. Char data type size on disk is fixed, the space for null values remain allocated and smaller string occupied the defined size. varchar data type size on disk is variable and string data of n characters is stored in n bytes, giving a better handling of data size on disk.

Unicode data types nchar and nvarchar must be chosen carefully as they use twice as much space as the non-Unicode data type char and varchar. While char and varchar size is up to 8000 bytes nchar and nvarchar store only up to 4000 bytes. This means you will potentially store the same data on twice the space up to half the capacity.

It is highly advised to use Unicode data type only if Unicode data will be stored in the system. Please note that the proper database collation must be selected to match the Unicode set to avoid string manipulation mal function that may occur.


How is all this space saving impact the overall performance?

A good choice of data type will allow data to be stored in smaller space in the database, this will account for smaller result set and better performance in queries.

Using SSIS to Transfer Data from a SSAS Cube

Scenario
A process needs to be built that will take the aggregate data from your SSAS cube and transfer that data as a source for a third party application. This process will need to be scheduled and have the capability to execute when requested.

Solution
Using SSIS to transfer the data is a great tool to meet the requirements above since you can build the process and schedule it as required as well as run the package when requested.

Since the tool has been established, the question now becomes how to get SSIS to connect to the cube and then work with the results.

The following document lays out a step by step approach to get the initial connection to the SSAS cube, sample MDX to query the cube and the cleanup steps within SSIS to perform against the data to have it usable going forward.

Data Connection
The data connector to setup would be an ADO.NET connection object to the SSAS database. Using OLEDB connector runs into repeated warning signs and I have experienced will actually not retrieve the data and instead either error out or just hang altogether.

However, for the same connection credentials, the ADO.NET connection will connect to the SSAS cube and allow for retrieval of the data.

Data Source
The DataReader data source object is used to pull the data from the SSAS cube using the ADO.NET connection. The property within the DataReader to note is the SQL Command on the Component Properties tab. The SQL Command will be a MDX query to retrieve the data.

A sample MDX query to retrieve data from two dimensions (product and region) across a measure group could be the following:
SELECT [MEASURES].[RETAIL SALES AMOUNT] ON COLUMNS,
NONEMPTY(CROSSJOIN ([DIM PRODUCT].[PRODUCT CATEGORY].CHILDREN, [DIM REGION].[REGION NAME].CHILDREN)) ON ROWS
FROM [SSAS CUBE]

As a result of this MDX query, when you look at the Metadata of the DataReader, you will see the following names for each of the columns being returned:
[[MEASURES].[RETAIL SALES AMOUNT]]
[[DIM PRODUCT].[PRODUCT CATEGORY].[PRODUCT CATEGORY].[MEMBER CAPTION]]
[[DIM REGION].[REGION NAME].[REGION NAME].[MEMBER CAPTION]]

Also, each column will be of type DT_NTEXT, which can be an issue when the third party tool is a destination table and you start receiving data type matching errors.

Clean Up Process
To solve the Data Type issue, two methods can be used.


One can use a Data Conversion transformation to change the data type. However, the data will not convert from DT_NTEXT to DT_STR or DT_NUMERIC as you would hope. Instead, convert the field to a DT_WSTR field for each of the fields and then you can add another conversion transformation to translate from DT_WSTR to DT_STR or DT_NUMERIC.

The problem with this method is that nine columns are created from the original three since each column is converted and a different output name is given to the conversion column. Unfortunately, the Data Conversion task will not replace the original column field.

The second method involves using the Copy Column to get an appropriately named field and then use the Derived Column task to perform the CAST in one step from DT_NTEXT to DT_WSTR to DT_STR, which would look similar to the following:
(DT_STR, 50, 1252)(DT_WSTR, 50)[Region Name] where Region Name is the Copy Column output name you gave in the previous Copy Column step.

The question then becomes, why is the Copy Column needed? Unfortunately, the Derived Column task will not take the [[DIM REGION].[REGION NAME].[REGION NAME].[MEMBER CAPTION]] output name as part of the expression, so using the Copy Column transformation allows for creating a new column with a name that the Derived Column transformation can handle. This still leads to doubling up the column list but you are saving creating one extra group of columns to work with.

Conclusion
At this point, the package has successfully extracted the data from the cube, assigned an appropriate name for each of the columns and setup a correct data type for each field in the result. The package can now process further with the data as stated within the requirement.

Replication essentials

Replication is the copy and distribution of data and database objects from one database to another with the goal of maintaining synchronization between the two databases.

The origin database is call publisher and the destination database the subscriber. Each database has an agent (SQL Server agent) responsible of capturing the change from publisher and applying the change onto the subscriber.

In a distribution implementation you will have to choose between a pull subscription and a push subscription. In a push subscription, publisher agent carries the load of moving data over to the subscriber, in a pull subscription the subscriber agent actually is in charge of reading data from the publisher.

Sql server allows implementation of 3 methods of replications:

Snapshot replication

In a snapshot replication format, a copy of the publisher is send to the subscriber from time to time; the exact picture of data or database object will be copy in bulk over to the subscriber.

This method is recommended only if the destination database functionalities requirements include a reasonable degree of latency, the database receiving data will be in delay synchronization with the origin database.

Snapshot replication is the most commonly implemented method of replication; it is a good fit for low bandwidth network or low activity network.

Transactional replication

The term transactional here refers to the fact that SQL Server captures and applies changes using DML operations.

All committed changes are called transaction; the difference with snapshot replication here is the fact that changes are send to the subscriber as they occur.
It is possible to setup the publisher to cache changes and send them at a certain frequency or transmit all changes in real time. During the time interval transaction are stored in log files before transmission to the subscriber.

The initial step in a transactional replication is a snapshot replication where a copy of the publisher is send to the subscriber.

Transactional replication required a high bandwidth and reliable network connection; in case of delay in transmission, transactions are cached in transaction log before transmission. Depending of the size of data to replicate, transaction log can grow quickly.

Transaction replication is recommended for data ware house and reporting databases where less transaction occurs and it is recommended to disable replication before massive load.

Merge replication

Merge replication is designed for systems where servers must have the same data at defined intervals. Think of this as a both ways differential backup where changes from all databases must be mutually applied; data are move both ways.

This method of replication obviously present risk of conflicts and data inconsistencies; it is the most difficult to implement and manage. Implementation includes creation of additional systems table and required adding new column to identify server to replicated tables.

Replication agent is here call merge agent and is equally responsible for carrying the replication workload.

Update for SQL Server 2008

In SQL Server 2008, you can also synchronize databases by using Microsoft Sync Framework and Sync Services for ADO.NET

Deadlock in OpenXML When Called Twice Within the Same Transaction in each of two Concurrent Instances of a Stored Procedure

THE PROBLEM

We encountered a deadlock in a stored procedure that occurs when two instances of the stored procedure run at the same time in different threads. The stored procedure itself is a fairly simple operation that takes an XML document as a parameter and uses OpenXML to extract two hierarchical levels of content from the XML document. The procedure performs the following tasks in order:

  1. Opens a transaction;
  2. Extracts Order information from the XML document using OpenXml and INSERTS a row into the Order table;
  3. Uses SCOPE_IDENTITY to retrieve the new Primary Key from the insert as @OrderID;
  4. Extracts Order Item information from the XML document in a second OpenXML call and INSERTS 0 or more rows into the OrderItem table.
  5. Commits the transaction.

The procedure deadlocks with another instance of itself in Step 4 if two calls are made at the same time from two different threads (i.e. the calling Web Service receives two orders from two users at the same time.)

Using SQL Server Profiler, we have observed that both procedures make it past step 3 and into step 4. There is a foreign key constraint between Order.OrderID (PK) and OrderLine.OrderID (FK). The execution plan suggests that a Clustered Index INSERT on the child table and a Clustered Index SEEK on the parent table at the same time. By design, SQL Server takes a shared lock on the Order table and an exclusive lock on the OrderItem table, not exactly the recipe for a deadlock situation. Yet there it was, time after time after time.

THE SOLUTION

The SQL statement that deadlocked was a standard INSERT INTO dbo.OrderItem (OrderID, (columns)) SELECT (@orderID, (columnList)) FROM OpenXML WITH (ColumnList) where @OrderID is the new OrderID from Step 3 and (ColumnList) is the result set of order items from the XML document. To solve the problem, we moved access to the OpenXml document outside of the transaction and stored its result set in a table variable. We did this for both OpenXml accesses (i.e. Step 2 and Step 4). The new step order for the procedure became:

  1. Create two table variables: @OrderTable (OrderColumns) and @OrderItem (OrderItemColumns)
  2. INSERT INTO @OrderTable SELECT ((orderColumns) FROM OpenXML WITH (OrderColumns)
  3. INSERT INTO @OrderItemTable SELECT ((orderItemColumns) FROM OpenXML WITH (OrderItemColumns)
  4. BEGIN TRANSACTION
  5. INSERT INTO dbo.Order(orderColumns) SELECT (orderColumns) FROM @OrderTable
  6. Use SCOPE_IDENTITY to retrieve the new Primary Key from the insert as @OrderID
  7. INSERT INTO dbo.OrderItem(@OrderID, orderItemColumns) SELECT (orderItemColumns) FROM @OrderItemTable
  8. COMMIT TRANSACTION.

The OpenXML calls were moved outside of the Transaction scope. This limited the scope of any shared locks on each OpenXml instance to the queries that accessed them.

So WHY Did This Fix the Problem?

We're not quite sure. A search of Microsoft Technet did not reveal any articles that described the specific problem. However, it feels as if the two concurrent procedure calls, each running in its own process and accessing its own instance of OpenXML twice from within its own transaction, created a thread deadlock inside the OpenXML component.

In closing, we add that while OpenXML is still supported, Microsoft now considers XQuery to be the preferred method to do what has traditionally been done using OpenXML. We'll have more on XQuery as a replacement for OpenXML in a future post. Stay Tuned.

Trend Analysis Charts with SSAS and PerformancePoint Server 2007

Creating mining models and writing prediction queries has never been easier. In fact, use the Trend Analysis Report type in PerformancePoint Server 2007 and much of the work will be done for you. Below is a brief walkthrough of how to create a simple report to forecast future data points based on prior data points, and a look at what PerformancePoint is doing behind the scenes to serve up its predictions.

First of all...

Download the SQL Server Samples from CodePlex. The AdventureWorks sample solution located at: C:\Program Files\Microsoft SQL Server\90\Tools\Samples\AdventureWorks Analysis Services Project\Standard\Adventure Works.sln. Deploy the project and process the database.

Creating the report

We'll create a basic report to predict Q1 2003 Revenue based on 2002 Revenue:

Open Dashboard Designer. From the "Workspace Browser" pane, add a new Data Source, pointing it to the "Adventure Works DW Standard Edition" database. Note that the AS database will not appear in the drop-down list unless you've added the account of the logged-in user to a Role that has access to the database.

The Trend Analysis Report type must be based on a Scorecard, so we'll import an existing one from the AdventureWorks Analysis Services sample database.

Create a new Scorecard from the Workspace Browser. When the wizard appears...

  • Accept the defaults on the first screen of the wizard.
  • On the next screen, name the Scorecard "RevenueCard"
  • Next screen, select the "AdventureWorksDW" data source you created.
  • Next screen, choose "Import SQL Server Analysis Services KPIs" option.
  • Next screen, check the "Revenue" KPI.
  • Next screen, do nothing
  • Next screen, choose "Date.Date.Fiscal" as the dimension. For the members, expand the nodes and check "Q1 FY 2002", "Q2 FY 2002", "Q3 FY 2002", "Q4 FY 2002".

Now create a new report from Workspace Browser. When the wizard appears, choose "Trend Analysis Chart"

  • Name the report "RevenueTrend"
  • Next screen, select the "RevenueCard" Scorecard. If it doesn't show up, cancel the wizard and click "Publish All" from the "Home" ribbon.
  • Complete the wizard, accepting all defaults.

Forecasting

Open the "RevenueTrend" report and type a number into the "Forecast Period" textbox. Note that the predicted data point for "Q1 FY 2003" is highlighted in red.

Making it pretty

The report looks busy, so we need to remove the unwanted series. Right-click the chart and make sure "Toolbar" is checked. Now from the toolbar, click "Field List". Delete the "Goal and Status" and "Trend" totals from the field list.

The result should look something like this:

image

Under the covers

Now let's examine a SQL Profiler trace to see how PerformancePoint is querying Analysis Services to serve up the forecasted data. To get good trace data, start the trace, then switch over to PerformancePoint and change the value in the "Forecast Period" textbox to 2. The below is distilled from the trace results:

First, a session (temporary) mining model is created. Note that the Microsoft Time Series Data Mining Algorithm is used, because the forecast is based on a date dimension.

CREATE SESSION MINING MODEL MiningModel_0( [KeyTime] LONG KEY TIME ,[Key] LONG KEY ,[Column_2] DOUBLE CONTINUOUS  PREDICT_ONLY) USING Microsoft_Time_Series

Then the mining model is trained with data.  Note the @InputRowset parameter (will explain below):

INSERT INTO MiningModel_0 (  [KeyTime] ,[Key] ,[Column_2]) @InputRowset

Finally, a prediction query returns the results.  Note the "number of next steps to predict" parameter is set to 1, because we chose to forecast 1 period (Q1 FY 2003):

SELECT FLATTENED (SELECT $TIME, [Column_2] from PredictTimeSeries([Column_2], 1) ) FROM [MiningModel_0]

The @InputRowset parameter above is an in-memory table created by PerformancePoint to train the mining model.  For a good explanation of this technique, see this post by Microsoft's SQL Server Data Mining development team. From the trace, I suspect the below query is used to help build @InputRowset.



WITH 
MEMBER [Measures].[CUSTOM_eca3a6bc-0bc9-446a-9e2b-da5f05a31e53_836cc47f-a089-4156-8347-81ee360f6acf] as '(KPIGoal("Revenue"))'
MEMBER [Measures].[eca3a6bc-0bc9-446a-9e2b-da5f05a31e53_836cc47f-a089-4156-8347-81ee360f6acf] as '([Measures].[CUSTOM_eca3a6bc-0bc9-446a-9e2b-da5f05a31e53_836cc47f-a089-4156-8347-81ee360f6acf])'
MEMBER [Measures].[CUSTOM_eca3a6bc-0bc9-446a-9e2b-da5f05a31e53_5047faae-8899-41d1-946e-3ce4abc4274d] as '(KPIStatus("Revenue"))'
MEMBER [Measures].[eca3a6bc-0bc9-446a-9e2b-da5f05a31e53_5047faae-8899-41d1-946e-3ce4abc4274d] as '([Measures].[CUSTOM_eca3a6bc-0bc9-446a-9e2b-da5f05a31e53_5047faae-8899-41d1-946e-3ce4abc4274d])'
MEMBER [Measures].[CUSTOM_eca3a6bc-0bc9-446a-9e2b-da5f05a31e53_46a8b718-4377-4414-a7a3-0b85c838c17c] as '(KPITrend("Revenue"))'
MEMBER [Measures].[eca3a6bc-0bc9-446a-9e2b-da5f05a31e53_46a8b718-4377-4414-a7a3-0b85c838c17c] as '([Measures].[CUSTOM_eca3a6bc-0bc9-446a-9e2b-da5f05a31e53_46a8b718-4377-4414-a7a3-0b85c838c17c])'
MEMBER [Measures].[CUSTOM_eca3a6bc-0bc9-446a-9e2b-da5f05a31e53_48a18387-c463-4e52-b324-4df046dbb8ff] as '(KPITrend("Revenue"))'
MEMBER [Measures].[eca3a6bc-0bc9-446a-9e2b-da5f05a31e53_48a18387-c463-4e52-b324-4df046dbb8ff] as '([Measures].[CUSTOM_eca3a6bc-0bc9-446a-9e2b-da5f05a31e53_48a18387-c463-4e52-b324-4df046dbb8ff])'
MEMBER [Measures].[CUSTOM_eca3a6bc-0bc9-446a-9e2b-da5f05a31e53_1c950f07-5d39-4d21-8339-440227113f7a] as '(KPIValue("Revenue"))'
MEMBER [Measures].[eca3a6bc-0bc9-446a-9e2b-da5f05a31e53_1c950f07-5d39-4d21-8339-440227113f7a] as '([Measures].[CUSTOM_eca3a6bc-0bc9-446a-9e2b-da5f05a31e53_1c950f07-5d39-4d21-8339-440227113f7a])'
SELECT { [Measures].[eca3a6bc-0bc9-446a-9e2b-da5f05a31e53_836cc47f-a089-4156-8347-81ee360f6acf],
[Measures].[eca3a6bc-0bc9-446a-9e2b-da5f05a31e53_5047faae-8899-41d1-946e-3ce4abc4274d],
[Measures].[eca3a6bc-0bc9-446a-9e2b-da5f05a31e53_46a8b718-4377-4414-a7a3-0b85c838c17c],
[Measures].[eca3a6bc-0bc9-446a-9e2b-da5f05a31e53_48a18387-c463-4e52-b324-4df046dbb8ff],
[Measures].[eca3a6bc-0bc9-446a-9e2b-da5f05a31e53_1c950f07-5d39-4d21-8339-440227113f7a]} ON AXIS(0),
{([Date].[Fiscal].[Fiscal Quarter].&[2002]&[1]),
([Date].[Fiscal].[Fiscal Quarter].&[2002]&[2]),
([Date].[Fiscal].[Fiscal Quarter].&[2002]&[3]),
([Date].[Fiscal].[Fiscal Quarter].&[2002]&[4]),
([Date].[Fiscal].[Fiscal Quarter].&[2003]&[1])} ON AXIS(1)
FROM [Adventure Works]

Conclusions

PerformancePoint allows you to easily create trend reports from existing KPIs. Using Dashboard Designer, even business users can generate forecast reports with no knowledge of DMX.

There are several features I didn't cover, such as the "Advanced Settings" screen that allows you to specify values for a few mining model parameters (PERIODICITY_HINT, COMPLEXITY_PENALTY, MINIMUM_SUPPORT). Aside from these settings, Dashboard Designer doesn't give you much control of the DMX that drives the report. As a last resort, the entire query could be coded by hand, but hopefully you won't have to go down that road.

Subscription is Disabled When User!UserID Global Variable is Used in SSRS Report

I recently had a problem where I was unable to create a data-driven subscription for a SQL Server Reporting Services report.  I opened SQL Server Management Studio, connected to Reporting Services, navigated to a report, then right clicked on it's Subscriptions node and both the New Subscription and New Data-Driven Subscription menu options were grayed out.

NoSubscriptions

The report in question had five parameters that the users could enter at runtime.  I wanted to implement the data-driven subscription to execute the report with all possible combinations of the parameters in order to get every possible version of the report to be cached.  Since the data for the report was only refreshed once a week, this seemed like a good idea.

I eventually found the answer in Books On Line.  Subscriptions are disabled when you use the User!UserID global variable.  The User!UserID global variable contains the domain\username of the user running the report.  In my case I used the User!UserID global variable to filter the various dropdown lists for parameters so users could only run the report for specific regions and/or offices that they are allowed to see.

Using data label expressions to create SSRS 2005 Pie Charts with intuitive and helpful pie slice

by Joe Toscano, Senior Software Engineer, RDA Corporation

If you’ve created PIE charts in SQL Server Reporting Services you’ve noticed that you are able to drag and drop category, series and data fields. The setup is pretty straightforward, but by working a bit with the data label values you are able to display more than a single column value in each slice of the pie. This may be helpful in allowing your reports more contain more useful information.


In our example we are looking at the total sales for bike products from the AdventureWorks database. We are looking at 3 categories: Touring Bikes, Road Bikes and Mountain Bikes. Each bike category has both a Total Sales and a Percentage of Total Sales that is part of the reports data set. Most pie chart reports display a single value present in each slice of the pie; however, in our report we are displaying both values. Below is a snapshot of this sample report:

















How did we improve our pie slice labels?

This blog entry focuses on how we are able to display both the total sales and the percentage of total in each pie slice. The answer lies in modifying the data label for the Total Sales column that we dropped to the Data Field area while in Report Layout / Design Mode.

The screen snapshot below was taken from the Report Design / Report Layout mode. Notice that we drug the LineItemTotal column to the Data Field area. Once there, we right-clicked on the data label and chose properties.









Let’s focus on the Point Labels tab. Notice the Data Label dropdown and the little “Fx” to its right. This tells us that this value can be much more than a simple column in your dataset. The Data Label value can actually be the results of an expression! In our case, we used the several built-in functions to convert data types of more than a single column, strip out trailing spaces and concatenate formatted strings. Below is the expression that was used in our Data label:

cstr(left(Fields!LineItemTotal.Value / 1000000, 5)) & vbcrlf & " (" & cstr(left(Fields!PercentageOfTotal.Value,5)) & "%)"

In summary, by working with the Data Label expressions, your pie slices can contain much more information that a single column value. The only limit is really the size of each pie slice, but as was done with our report you are able to specify that the slice values be placed on the outside of the pie.

Tips to get optimum performance from large dataset manipulation

Here is a non exhaustive list of ways to get the best performance from large dataset DML.
Depending on system configuration and design, the effect of these recommendations may vary.

1-Do not create more than 16 exact match joins.

2- make sure indexes are created on exact match columns only if necessary for reporting

3- Avoid join on large views use temporary tables instead; Proceed by loading data from the view into a temp table then use the temp table in the join instead.

4- Avoid as much as possible creating indexes on temporary tables as it will impact insert performance

5- Avoid grouping as much as possible, if necessary make sure that the non- aggregated columns actually regroup data.

Most of the times as developement advances more columns are added onto the query. At some point the columns in the 'group by' clause may become candidate key to the dataset causing grouping to be ineffective; on a large query the effect on performance can be very important.

--The classics

6- Avoid table scans

7- Avoid nested Sql queries; join are more effective.

8- Avoid cursors. An effective use of Case statement can replace the use of cursors and deliver much better performance.
This sample query is for illustration only:

SELECT VENDORID, PRODUCTID,

SUM(CASE WHEN MONTH(SALESDATE)=1
THEN SUM(ISNULL(SALESAMOUNT,0))
ELSE 0 END) AS JAN_SALES,

SUM(CASE WHEN MONTH(SALESDATE)=2
THEN SUM(ISNULL(SALESAMOUNT,0))
ELSE 0 END) AS FEB_SALES,

SUM(CASE WHEN MONTH(SALESDATE)=3
THEN SUM(ISNULL(SALESAMOUNT,0))
ELSE 0 END) AS MAR_SALES

--...

FROM SALES S (NOLOCK)
JOIN PRODUCT P (NOLOCK) ON P.PRODUCTID=S.PRODUCTID
JOIN VENDOR V (NOLOCK) ON V.VENDORID=P.VENDORID
WHERE SALESDATE BETWEEN @PERIODSTART AND @PERIODEND
GROUP BY V.VENDORID,P.PRODUCTID

How to Find the Worst-Performing Queries in a Database Server

A common complaint among database users is the sporadic occurrence of errors like this:

Timeout expired. The timeout period elapsed
prior to completion of the operation
or the server is not responding.
Exception type: SqlException
Source: .Net SqlClient Data Provider

The errors may come from a variety of procedures or queries. While occasionally the query itself may need to be optimized, the problem may stem from another unrelated operation that is bogging the system down. At that point the problem becomes a general tuning issue that may involve queries that have nothing to do with the “offending” query.

There is a simple way to gain insight into which query or queries deserve looking at. Using SQL Profiler, you can develop a list of queries, sorted by average reads descending. This result set will become your “hit list” of query optimizations that will yield the greatest return on time invested.

WHY QUERIES DO EXCESSIVE READS

(1) There is no index available to support a seek-based lookup. This may occur when a query has no WHERE CLAUSE. Your best defense here is to avoid this situation altogether. If you must query without a WHERE CLAUSE, keep the result set very narrow and select it as infrequently as possible.

(2) There is no index that supports the WHERE CLAUSE. A WHERE CLAUSE across joined data (i.e. multiple filter expressions on different source tables) may be particularly problematic, as it is usually not possible for the query optimizer to figure out which index to use on the clause. You may improve performance in this case by joining two subqueries, each on its own table with its own WHERE CLAUSE.

(3) The WHERE CLAUSE is applied to a view. Avoid using views in queries that will be subsequently filtered. Selecting even a single row, based on its primary key, from a view will cause a scan of the underlying table.

Note that INSERT and UPDATE queries are not immune to these problems, as many of these use filters and embedded joins to do their work.

HOW TO USE SQL PROFILER TO FIND THE WORST OFFENDERS

All queries in a database can be optimized, and optimizing most queries is a waste of time and effort. This simple procedure will enable you to identify the worst offenders in your server and target your efforts.

(1) Start SQL Profiler. Open a new trace on the server that you wish to observe.

(2) In the Trace Properties screen (Events Selection tab), check “Show All Events.” Find the “Stored Procedures” events and choose the “SP:StmtCompleted” event. This will enable queries within a stored procedure to be separately reported.

(3) The default set of Column Filters will work well, so no changes are needed there. You can, however, keep the trace volume down by filtering reads for value >= 100. Queries with reads < 100 are not of interest to us in this context.

(4) In the Trace Properties screen (General tab), check “Save To Table.” Connect to the database where the trace will be saved (probably on your own computer so that you can keep it as evidence), specify a database, an owner, and a table. The database and owner should exist. The table should NOT exist. Press RUN to activate the trace. This will cause the table to be created (or overwritten if it already exists).

(5) As the trace runs, put the server through its paces. The idea is to create the environment where the problem manifests itself. You can concurrently use Management Studio to query the status of the trace table. Assuming you called the table Trace1, this query will get you started.

SELECT TOP 10 CAST(TextData AS Varchar(255))AS Query
,Count(*) AS TimesCalled
, AVG(READS) AS MeanReads
FROM [dbo].[Trace1] WITH (NOLOCK)
GROUP BY CAST(TextData AS Varchar(255))
ORDER BY MEANREADS DESC

The result set will contain three columns. The Query column will produce recognizable SQL statements from within stored procedures. The TimesCalled column will reveal the prevalence of the call. And the MeanReads column will reveal the average reads done by the query each time it is called. This result set will produce powerful evidence of where the problems are in a database server, enabling you to clean up your own mess(es) before you escalate the issue within the department.

(6) As you repeat the test, use a different table name each time. This will allow you to chart progress as you fix stored procedures.

Cannot Access Web Services/Report Manager in SQL Server 2008

Problem
After installing SQL Server 2008, when I went to access the Web Service URL @ http://localserver/ReportServer, it said “Page cannot be found”. After going through several rounds of check to see if the installation went correctly, I did not find anything abnormal. So what was wrong?
Solution:
Change the port on which Report Server listens from 80 (which is the default) to something different (say 8080). You do this by going to the Reporting Services Configuration Manager -->Web Service URL. Change the TCP Port from 80 to 8080 and click Apply. Go to Report Manager URL --> Advanced --> Edit button (of Multiple Identities for Report Manager) and then change the TCP port to 8080. (Make sure you change this to the same as above for consistency sake). Click OK. Click OK again.
Now you can access Web Service by typing http://localserver:8080/ReportServer and you can access ReportManager URL by typing http://localserver:8080/Reports