Monday, October 8, 2012

Efficient Dynamic Query - Part 1

I believe every developer who builds a CRUD screen runs into the same requirement: the data grid or grid view needs a filter. The filter usually consists of multiple text boxes, one per column, and when a particular text box is left empty, that column should not be filtered at all. Normally we handle this with a dynamic query.

I am not sure how you write your dynamic queries, but I want to show you how I did it, how bad it turned out to be, and see if you agree.

Here is my table. Imagine I have a requirement to create data filters for the FirstName, LastName, and Hobbies columns only.

[Screenshot: the Profiles table]
Normally what I did is to create the following SQL query:

  SELECT *
  FROM [QueryPerfTest].[dbo].[Profiles]
  WHERE
  (@firstName IS NULL OR FirstName LIKE '%' + @firstName + '%') AND
  (@lastName IS NULL OR LastName LIKE '%' + @lastName + '%') AND
  (@hobbies IS NULL OR Hobbies LIKE '%' + @hobbies + '%')


The query above lets me pass NULL for any input parameter whose filter I want to skip. It looks clean and readable, but I later realized it is a time bomb! Thanks to my mentor for pointing it out; now I will show you why.

Look at the execution plan of my query:
In case you do not know what an execution plan is: it is the result of the query optimizer's attempt to calculate the most efficient way to execute the T-SQL query you submitted.
There are two ways of getting an execution plan: the estimated plan and the actual plan. Usually both return the same result, but not always. For more accurate information, always display the actual execution plan, but be aware that it is more expensive in terms of system resources than the estimated plan, because the query actually runs.
You can get the estimated execution plan by right-clicking in the query window and choosing "Display Estimated Execution Plan". Or, if you want the actual results and execution plan together, right-click in the query window, choose "Include Actual Execution Plan", then hit F5 to execute your query.
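Besides the graphical plan, a lightweight way to compare before/after numbers is to turn on the session statistics (standard T-SQL session settings; the sample query is just an illustration):

```sql
-- Report logical reads and CPU/elapsed time for each statement in this session
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT *
FROM [QueryPerfTest].[dbo].[Profiles]
WHERE FirstName LIKE 'John%';

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

The output appears in the Messages tab; the logical-reads count is a good proxy for how much data the query touched.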

Here is the actual execution plan of my query:



I have 1 million rows in my table, and the query above returns only 1 row. Based on the execution plan, it performs an index scan (which is normal when you query a non-indexed column with the LIKE operator), and when you hover over an operator in the plan, you get a yellow tooltip.

The tooltip shows the details of the operation during query execution. Most of the values are self-explanatory, except a few such as:

  • Estimated Subtree Cost – the total cost to the query optimizer of executing this operation and all operations preceding it in the same subtree (the plan is read from right to left). The number has no real-world unit; it is the optimizer's relative estimate of how expensive this operator and its subtree are.
  • Estimated Number of Rows – the number of rows the optimizer expects this operator to produce.

In short, for every number shown in the tooltip, the lower the better.
More detail about the tooltips - MSDN.

Normally, when we encounter such a performance issue, we have a few options to improve it.
Option 1: Create indexes for columns.
Option 2: Change the query.

Let's try option 1 first.

In case you do not know what an index is: imagine the database is a dictionary. When you look up a word, you normally find the word's first character in the index page, jump to the page where that section starts, and then flip page by page until you find the exact word. A database works the same way: the query optimizer scans the index and then locates your data. By the way, the execution plan explained above is really just this: it shows you the actions the optimizer will take to "find a word in the dictionary".

There are a few index types available. Most of the time we create a nonclustered index for non-primary-key columns, because SQL Server organizes its indexes as B-trees. With this structure, the clustered index stores the table rows themselves in the order of the index key, which is why the primary key column is usually the clustered index and why a table cannot have more than one clustered index. A nonclustered index is an additional structure, organized on top of the clustered index, that points back to the rows.

Theoretically, after creating nonclustered indexes on the filtered columns, query performance should improve, because the optimizer can use the index to locate matching rows quickly instead of scanning the whole table.
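For reference, the nonclustered indexes for the three filtered columns can be created like this (the index names are my own choice):

```sql
-- One nonclustered index per filtered column
CREATE NONCLUSTERED INDEX IX_Profiles_FirstName ON [dbo].[Profiles] (FirstName);
CREATE NONCLUSTERED INDEX IX_Profiles_LastName  ON [dbo].[Profiles] (LastName);
CREATE NONCLUSTERED INDEX IX_Profiles_Hobbies   ON [dbo].[Profiles] (Hobbies);
```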



Let's see whether adding the nonclustered indexes to the columns actually helps.


As the screenshot above shows, there is no performance improvement at all; in fact, it got worse. Why?

Notice that Estimated Number of Rows and Estimated Row Size have increased, which means more data was scanned. As long as my query uses the LIKE operator with a wildcard at the front, it will always perform an index scan. With a leading wildcard, the optimizer has no idea what the value starts with. It is just like being asked to find all the words that contain a few characters somewhere in the middle: you would be clueless and end up scanning the whole dictionary. Furthermore, since I created extra nonclustered indexes on my columns, there are now more index pages to scan.

Alright, now if I remove the leading wildcard, it should be faster: without the front wildcard, the optimizer can seek into the index to find every value that starts with the given prefix, and the trailing wildcard no longer matters for locating the rows.
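For clarity, here is the same query with only the leading wildcards removed (the trailing `%` stays):

```sql
SELECT *
FROM [QueryPerfTest].[dbo].[Profiles]
WHERE
(@firstName IS NULL OR FirstName LIKE @firstName + '%') AND
(@lastName IS NULL OR LastName LIKE @lastName + '%') AND
(@hobbies IS NULL OR Hobbies LIKE @hobbies + '%')
```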



If you look at the result above, you will ask what the hell just happened: the performance still has no improvement at all?!

Well, this is the thing I want to share today: the clean, readable query comes with a price. Take note of the IS NULL checks on the input parameters. Even though I created the indexes and removed the leading wildcard, the optimizer still performs an index scan! You may ask, as I did: I am checking whether the input parameter is NULL, not the table column, so why would it affect the index? The catch is that SQL Server compiles a single plan that has to be valid for every combination of NULL and non-NULL parameters, and a plan that must also handle the "no filter" case cannot rely on an index seek. So I removed all the IS NULL checks from the query to see what happens next.
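Here is the query with the IS NULL checks stripped out (meaning every parameter must now be supplied):

```sql
SELECT *
FROM [QueryPerfTest].[dbo].[Profiles]
WHERE
FirstName LIKE @firstName + '%' AND
LastName LIKE @lastName + '%' AND
Hobbies LIKE @hobbies + '%'
```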


You will see that the optimizer now performs an index seek! Look at the tooltip: it shows a very significant improvement.


Summary:

- Never create unnecessary indexes on table columns; they can degrade performance.
- Try to avoid the LIKE operator with a leading wildcard. Use full-text search if possible.
- Don't be a lazy @$$ like me; just write a few extra lines of code, it pays.
- If you build a dynamic query in a stored procedure, you can use IF ELSE statements to build a query string that leaves out the filters whose input parameters are NULL. Or, if you use an ADO.NET SqlCommand, you can build the dynamic query even more easily in code.
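To illustrate the stored procedure approach, here is a sketch of what I mean (the procedure name and column sizes are my own assumptions). It appends a predicate only for the parameters that were actually supplied, and still passes the values as parameters through sp_executesql, which avoids SQL injection and lets plans be reused:

```sql
CREATE PROCEDURE dbo.SearchProfiles
    @firstName NVARCHAR(50) = NULL,
    @lastName  NVARCHAR(50) = NULL,
    @hobbies   NVARCHAR(50) = NULL
AS
BEGIN
    DECLARE @sql NVARCHAR(MAX) = N'SELECT * FROM dbo.Profiles WHERE 1 = 1';

    -- Append a predicate only when the caller actually supplied a value
    IF @firstName IS NOT NULL SET @sql += N' AND FirstName LIKE @firstName + ''%''';
    IF @lastName  IS NOT NULL SET @sql += N' AND LastName LIKE @lastName + ''%''';
    IF @hobbies   IS NOT NULL SET @sql += N' AND Hobbies LIKE @hobbies + ''%''';

    -- Values are passed as parameters, never concatenated into the string
    EXEC sp_executesql @sql,
        N'@firstName NVARCHAR(50), @lastName NVARCHAR(50), @hobbies NVARCHAR(50)',
        @firstName, @lastName, @hobbies;
END
```

Each distinct combination of supplied filters compiles its own plan, so every query the optimizer sees contains only real, seekable predicates.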

 
