Monday, April 19, 2010

Right Answer Wrong Question Scenarios

As I mentioned in an earlier post, one way that we provide intelligence to the data, resulting in information, is to prevent Right Answer Wrong Question (RAWQ) scenarios.  A RAWQ scenario occurs when data is returned which seems sane on first glance but upon inspection it is found that the numbers don't match reality.  This happens in SQL with improper joins but is typically caught quickly by the developer.  However, when you expose your data for the public to consume in cubes, RAWQ happens altogether too frequently.

This is best shown via an example.  Consider a Public Libary cube which has three date dimensions, Date Checked In, Date Checked Out, and Date Held.  Similarly, it has three people dimensions, one for Held By, Checked Out By, and Checked In By.  (Ok, so the library doesn't know who checks the book in.  Go with it. ;-)  A typical MDX query might look like this:

SELECT 
{ 
   [Measures].[Count Books] 
} ON COLUMNS, 
NON EMPTY { 
   [Checked Out By].[Checked Out By].Members 
} ON ROWS 
FROM 
   [Books] 
WHERE 
( 
   [Date Held].[Date].&[2010-04-19T00:00:00] 
) 

Do you see the wrong question?  You will get an answer and it won't be the wrong one but you asked the cube to show you a count of books checked out by all patrons which was held on April 19th, 2010.  It's a good question, even a potentially useful one, but it would be rare that the library needs to know this information.  More likely they'd want to know either who held how many books on a date or who checked out how many books on a date.

We implemented a solution to this by creating Perspectives for cubes where such RAWQ scenarios exist.  Doing so where each perspective has matching dimensions, i.e. a [Checked Out Books] perspective with the  [Checked Out By] dimension matching the [Date Checked Out] dimension, while hiding the [Books] cube from public consuption.  Only developers have access to the "raw" [Books] cube.

One method to get a combined dataset of OLAP data

When I started with this team, about 3 years ago, they had just completed a business intelligence solution that allowed any (internal) user with a web browser to pull data from several SQL Server Analysis Services (SSAS) cubes.  As I mentioned in my introductory post, it seems that most BI is done in Excel one cube at a time.  Our STAR Award-winning solution, while still representing only one cube at a time, did not have this limitation.  Further, it has allowed us to provide views into the data which help to prevent Right Answer Wrong Question scenarios.

Once our users got used to having such easy access to their data, they suddenly wanted to mix up the data in the different cubes to see new, potential relationships.  They were already receiving some of this data in various reports and some rolled up in a summary report, all created using SQL.  However, if the SQL developer didn't use the appropriate joins and where clauses to match the filters and slicers that the user selected on our BI solution, the numbers wouldn't match up.  Further, the old SQL reports ran much slower than the BI solution.  Thus, it was decided that we could get speedier, more consistent results if we started using the OLAP cubes for our reports.  (Doing so sealed the deal on the STAR Award.)  However, doing so is not so easy in Reporting Services.  You have to get each cube's data separately and representing them on the report can get messy and is very design intensive due to the usage of several stacked tables.  It would be really nice if we could use one matrix (we were on SQL Server 2005 at the time) to dump all of the different cube's data, with the only commonality being that they all represented the same date range.

My coworker investigated and we decided to implement a linked server in SQL to SSAS.  With this, we could create a stored procedure that collated MDX query results against multiple cubes, delivering a table to the SSRS report and speeding up the resulting report.

This has functioned well for us for 2+ years.  However, recently we have been creating web applications, where query time is quickly noticed by the end-user, and have found that this has added overhead to the queries which we want to eliminate.  More on that in a future post.

Introduction of this blog and its author

If you've found this post, then you've arrived early to the action. My goal with this blog is to introduce myself and share what I've learned most recently in my growth as a Business Intelligence developer.

A bit about myself: I have worked in the IT industry, mainly in the support arena, for over 20 years. My first PC was a brand new Timex/Sinclair 1000 which my father and I modified to have an external keyboard. I fell in love with programming then when I found a listing which showed how to poke values into memory spaces. Doing so would cause interference patterns on a TV with it's stationed tuned slightly out of phase (now that ages me) resulting in it playing the Star Spangled Banner, quite a feat for a system that had no sound functionality.

I began my professional career in application and hardware support implementing CADD systems. I moved into desktop support, then network administration, and back to application support on Windows Servers. My education introduced me to several programming languages, including Basic, Fortran, Pascal, C++, and Java. I was also exposed to dBase IV, Microsoft Access, Microsoft SQL Server, and Oracle in the professional realm. I decided early on that, though I didn't have the formal education to be a developer, that's where I wanted to be. Thankfully, I found an opportunity to pursue that about 3 years ago and haven't looked back since.

Today, I work in a business analysis group for one of the largest security software companies in the world. Our focus in this group is to provide business intelligence to all levels of management in the software support organization. The tools we use include Microsoft SQL Servers 2005 & 2008, Reporting Services (SSRS), Analysis Services (SSAS), and ASP.Net 3.5. My focus is on utilizing SSAS as a platform for consistent reporting of metrics utilized to meet performance goals.

I plan to use this blog as a way to share with the community the things I've learned along the way. I've been finding it difficult to find information on exactly what I've been doing. I may be far off the mark but it seems that the BI community isn't doing quite what my organization is in implementing custom applications presenting information from data held in multiple cubes. That has been a particular challenge in SSRS, which seems designed to best handle one dataset per table/matrix/tablix. I've also found that Microsoft doesn't seem to treat OLAP access in ASP.Net development with the same level of ease as SQL access. Thus, I've had to search out solutions for that, as well.

It's as if the only way Microsoft really wants OLAP data to be available is via Excel, and there is a big difference between data and information. Giving someone a pivot table in Excel, while easy, does not help them make decisions. One must have some specific intelligence to turn that data into information, which is what I and my team strive to do. Perhaps something I post here will help you do the same.