Sunday, June 29, 2008

Ignoring your gut


Plug image by pulpolux.
Sometimes, ignoring your gut is the thing that takes the most experience and intuition. I find that this rule often applies when trying to solve a software performance problem.

You see, my current project is a business intelligence and reporting product powered by Microsoft Analysis Services. I have spent a while on this project and I am getting pretty good at understanding what happens under the hood of the engine when the application performs queries. I have also spent a lot of time tuning the data cube to allow the application to achieve a good level of performance.

Recently, my customer started using the application with real-world data and was puzzled by the response times he was getting from one of the datasets. It was a rather small dataset with nothing out of the ordinary. One view, no calculations. Something really basic. But every time you performed an operation, you had to wait an unusual 5-7 seconds before getting a response from the database.

I spent a day working with that dataset, massaging it, creating indexes and aggregations, pushing and prodding every which way. Applying the same recipes that had brought me some success in the past. But nothing happened. I knew that what I was doing was improving things because I could measure the improvement on the other datasets, but this one remained very sluggish.

So I resigned myself to writing to my customer with a hypothesis about why this dataset was slow (lots of "zero" values instead of nulls... that was my hypothesis... really lame in retrospect). As I was writing, I took the time to explain the things that I had tried and, since my customer is smart but not necessarily an MDX query expert, I "took it down a notch" on the technical side, taking care to go through all the details and all the steps. And as I was explaining, it struck me that I hadn't really done everything I was writing about. I hadn't examined the data to see if it was different; I had just glanced at the text files containing the raw data and gauged the complexity of the dataset by their number and their sizes.

So I took a step back and went back to the basics. I carefully examined all the dimensions of my data cube and realized that I had one dimension with a really large number of members. A 100:1 ratio compared to the other datasets that I had imported in the application before. It was so simple. The total "size" of the imported data files was similar, but the "shape" and size of the resulting data cube were radically different. The way the queries were constructed relied on assumptions that did not hold for this dataset. Once that was identified, it was easy to come up with a plan and test a few queries to validate my findings. All of this would have taken an hour if I had spent the time to go through all the steps.
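For what it's worth, the kind of check I skipped can be sketched in a few lines. This is only a minimal Python illustration, assuming the raw data arrives as delimited text with a header row; the function name, the column names and the sample rows are all made up for the example and are not taken from the actual project.

```python
import csv
import io


def dimension_cardinalities(lines, dimension_columns):
    """Count the distinct values seen in each dimension column."""
    reader = csv.DictReader(lines)
    members = {name: set() for name in dimension_columns}
    for row in reader:
        for name in dimension_columns:
            members[name].add(row[name])
    return {name: len(values) for name, values in members.items()}


# Hypothetical sample data: few rows, but one column has many more
# distinct members than the other.
sample = io.StringIO(
    "region,product,amount\n"
    "East,P1,10\n"
    "East,P2,0\n"
    "West,P3,7\n"
    "West,P4,0\n"
)
counts = dimension_cardinalities(sample, ["region", "product"])
print(counts)
```

Comparing these counts across datasets says something about the "shape" of the cube that the number and size of the files alone cannot.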

Today, my experience and my knowledge were my worst enemies and they led me on a wild goose chase. When I started questioning my assumptions, the answer was staring me right in the face. And yes... sometimes, you have to check if the computer is still plugged in.
