Sunday, October 28, 2012

Workflow Tracking and Tracing

When your workflow service is running, how do you tell whether an error has occurred in a workflow instance? Workflow already has a tracking mechanism built in; we just need some configuration to enable it.

For workflow tracking and tracing, we need to know 3 things:
1. Tracking Record - the tracking information emitted from the workflow runtime
2. Tracking Participant - the subscriber that consumes the tracking records
3. Tracking Profile - the filter applied to the subscription

I will skip the theory and concept details; feel free to read them on MSDN.
In this post, I will only share how to configure it.

Normally, we only want to track unusual incidents in the workflow. So, I want to track a workflow instance only when it is faulted, suspended, terminated, or an unhandled exception has occurred. How?

Configure the Tracking Profile
Open up the config file. Insert the following XML node into the <system.serviceModel> node.

    <tracking>
      <profiles>
        <trackingProfile name="TrackEverything" implementationVisibility="All">
          <workflow activityDefinitionId="*">
            <workflowInstanceQueries>
              <workflowInstanceQuery>
                <states>
                  <state name="Faulted"/>
                  <state name="Suspended"/>
                  <state name="Terminated"/>
                  <state name="UnhandledException"/>
                </states>
              </workflowInstanceQuery>
            </workflowInstanceQueries>
          </workflow>
        </trackingProfile>
      </profiles>
    </tracking>

Note the child elements in the <states> node; these are the states that I wish to track. I named my tracking profile "TrackEverything". For any other state that you wish to track, you can refer to: http://msdn.microsoft.com/en-us/library/ee818716.aspx

Configure the Tracking Participant
Create a new service behavior, or reuse your existing service behavior for the workflow service host. Add the ETW (Event Tracing for Windows) tracking participant to it, then point it at the tracking profile defined above.

<behavior name="WorkflowServiceBehavior">
  <serviceMetadata httpGetEnabled="true"/>
  <serviceDebug includeExceptionDetailInFaults="true"/>
  <sqlWorkflowInstanceStore connectionStringName="WorkflowInstanceStore"
                            hostLockRenewalPeriod="00:00:30"
                            runnableInstancesDetectionPeriod="00:00:05"
                            instanceCompletionAction="DeleteAll"
                            instanceLockedExceptionAction="AggressiveRetry"
                            instanceEncodingOption="GZip"/>
  <dataContractSerializer maxItemsInObjectGraph="2147483647"/>
  <etwTracking profileName="TrackEverything"/>
</behavior>
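
If you host the workflow in code rather than purely through configuration, the same participant can be attached programmatically. Here is a minimal sketch, assuming you already create a WorkflowServiceHost somewhere; the profile mirrors the <trackingProfile name="TrackEverything"> element above:

using System.Activities.Tracking;
using System.ServiceModel.Activities;

public static class TrackingSetup
{
    // Attach the ETW tracking participant to an already-created WorkflowServiceHost.
    // The profile mirrors the "TrackEverything" tracking profile defined in config.
    public static void EnableEtwTracking(WorkflowServiceHost host)
    {
        TrackingProfile profile = new TrackingProfile { Name = "TrackEverything" };
        profile.Queries.Add(new WorkflowInstanceQuery
        {
            States = { "Faulted", "Suspended", "Terminated", "UnhandledException" }
        });

        host.WorkflowExtensions.Add(new EtwTrackingParticipant { TrackingProfile = profile });
    }
}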


Which Log to Monitor in Event Viewer?
Open up Event Viewer. Go to Applications and Services Logs \ Microsoft \ Application Server-Applications. You need to enable "Show Analytic and Debug Logs" from the View menu. This is where ETW puts the tracking records.



Now, let's purposely make a failing call to the workflow service. Here is how the log looks.



If you do not want the logs in this default location, there is a way to configure ETW to write them to a specific folder; please refer to MSDN.

If you do not like reading those default logs because they are too messy and the information provided is not useful to you, we can perform a similar but custom tracking; in other words, we can apply another layer of filtering on top of the subscribed tracking records. I will share the details in the next post, but a rough idea is sketched below.
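
As a rough preview (a sketch only, not the exact implementation from the next post), a custom tracking participant is just a class derived from TrackingParticipant; the runtime hands every subscribed tracking record to its Track method, where you can apply your own filtering and logging:

using System;
using System.Activities.Tracking;

// Minimal custom tracking participant sketch: it receives every tracking record
// that matches the tracking profile and decides what to do with it.
public class ConsoleTrackingParticipant : TrackingParticipant
{
    protected override void Track(TrackingRecord record, TimeSpan timeout)
    {
        // Extra layer of filtering: only handle workflow instance level records here.
        WorkflowInstanceRecord instanceRecord = record as WorkflowInstanceRecord;
        if (instanceRecord != null)
        {
            Console.WriteLine("Instance {0} moved to state {1}",
                instanceRecord.InstanceId, instanceRecord.State);
        }
    }
}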





Workflow Correlation

How do you make use of the workflow correlation token to continue the process flow of an active instance?

Continuing from the previous post about SQL persistence, assume your workflow application now has SQL persistence enabled. The following are the steps to create a sample project that works with the workflow instance store.

Create a new database for the sample project. I am going to create a simple leave application; there are only 2 operations in the application, which are apply leave and then approve leave. So, I have created a table for it. Note that I have a column called "WorkflowCorrelationID"; it will be used later to show you how workflow persistence works.
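
As a rough sketch of its shape, the Leave entity used by the unit tests later in this post looks something like this (the property names are taken from the test code; the types and everything else are assumptions):

using System;

// Hypothetical shape of the Leave entity generated for the sample.
// Only the properties referenced later in this post are certain; the rest is assumed.
public class Leave
{
    public int ID { get; set; }                      // assumed primary key
    public int ApplyBy { get; set; }
    public DateTime ApplicationDate { get; set; }
    public int? ApproveBy { get; set; }
    public DateTime? ApprovalDate { get; set; }

    // Unique token used to correlate a leave record with its workflow instance.
    public Guid WorkflowCorrelationID { get; set; }
}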



I will skip the details of the common repetitive work such as creating the business entities, business layer and data access layer; I use LASG to generate them all. Again, if you do not know what LASG is, refer to my previous post. This tool generates all that code for you; I generated the sample application code in just a few minutes. Thanks to Serena. This is how my solution looks after using LASG to generate all the code, from the data access layer all the way up to the business layer, then the custom workflow activities and the workflow activity designers. The custom workflow activities are actually Code Activities, and they appear in the toolbox after the project is built, to be used in the next step to construct the workflow.



Design the workflow; here is how it looks.



Configure the first Receive operation so that it can create a new instance when it gets hit. Provide and standardize the Service Contract Name.



Configure the workflow correlation handle first. In my case it is __handle1, which was already created automatically in the variables section.



Select the object property that you wish the workflow correlation handle to use as the key to correlate the workflow instance with the application instance. Note: the value must be unique, hence I use a GUID. Also, recall that when I created my table earlier, I added a column called "WorkflowCorrelationID"; here is the purpose of it. Workflow relies on this unique token to identify the workflow instance in the persistence store.



Now you want the instance data to be persisted after the Apply Leave operation completes. So, set PersistBeforeSend to true in the SendReplyToReceive activity.



After applying leave, we need to approve leave. Imagine that many different people apply for leave and all of their applications are pending approval; a separate workflow instance is created for each individual leave application. This is where correlation comes in: based on the correlation ID, Workflow knows which workflow instance to pick up and which operation to resume, which in this case is approve leave. So, in the approve leave's Receive operation, we should set the same correlation handle and token as in the apply leave's Receive operation. But this time, I am not going to create a new instance when this approve leave operation is hit, because an instance was already created when the leave was applied; we should reuse that same instance.



How to expose the workflow as a WCF service?

The workflow part is completed. Now, I want to expose my workflow as a WCF service. Here is how my Windows Process Activation Service configuration looks in my web.config.

Note the key difference in configuration between a workflow-hosted WCF service and a normal WCF service: the service host factory.

    <serviceHostingEnvironment multipleSiteBindingsEnabled="true">
      <serviceActivations>

        <add factory="System.ServiceModel.Activities.Activation.WorkflowServiceHostFactory"
             relativeAddress="./LeaveWorkflowService.svc"
             service="PersistenceStoreSample.Workflows.LeaveWorkflowService"/>

        <add factory="System.ServiceModel.Activation.ServiceHostFactory"
             relativeAddress="./LeaveService.svc"
             service="PersistenceStoreSample.Services.LeaveService"/>

      </serviceActivations>
    </serviceHostingEnvironment>

    <services>

      <service name="LeaveWorkflowService"
               behaviorConfiguration="WorkflowServiceBehavior">
        <endpoint name="basicHttpLeaveWorkflowService"
                  address=""
                  binding="basicHttpBinding"
                  contract="ILeaveWorkflowService" />
        <endpoint address="mex" binding="mexHttpBinding" contract="IMetadataExchange" />
      </service>

      <service name="PersistenceStoreSample.Services.LeaveService"
               behaviorConfiguration="DefaultServiceBehavior">
        <endpoint name="basicHttpLeaveService"
                  address=""
                  binding="basicHttpBinding"
                  contract="PersistenceStoreSample.Services.Contracts.ILeaveService" />
        <endpoint address="mex" binding="mexHttpBinding" contract="IMetadataExchange" />
      </service>

    </services>


  <behaviors>
    <serviceBehaviors>

      <behavior name="WorkflowServiceBehavior">
        <serviceMetadata httpGetEnabled="true"/>
        <serviceDebug includeExceptionDetailInFaults="true"/>
        <sqlWorkflowInstanceStore connectionStringName="WorkflowInstanceStore"
                                  hostLockRenewalPeriod="00:00:30"
                                  runnableInstancesDetectionPeriod="00:00:05"
                                  instanceCompletionAction="DeleteAll"
                                  instanceLockedExceptionAction="AggressiveRetry"
                                  instanceEncodingOption="GZip" />
      </behavior>


      <behavior name="DefaultServiceBehavior">
        <serviceMetadata httpGetEnabled="true"/>
        <serviceDebug includeExceptionDetailInFaults="true" />
      </behavior>


    </serviceBehaviors>
  </behaviors>


Note: The service host factory for a workflow service is different from the one for a plain WCF service (thanks to Serena for pointing it out).
For WCF, System.ServiceModel.Activation.ServiceHostFactory
For Workflow, System.ServiceModel.Activities.Activation.WorkflowServiceHostFactory

How to call the Workflow service?

I have created a unit test project to test my workflow.

        [TestMethod]
        public void ApplyLeaveTest()
        {
            //Create new leave application
            Leave leave = new Leave();
            leave.ApplyBy = 1;
            leave.ApplicationDate = DateTime.Now;

            //Generate the correlation token before invoking the workflow service
            leave.WorkflowCorrelationID = Guid.NewGuid();

            LeaveWorkflowServiceClient wfLeave = new LeaveWorkflowServiceClient();
            wfLeave.ApplyLeave(leave);
        }

        [TestMethod]
        public void ApproveLeaveTest()
        {
            LeaveServiceClient wcfLeave = new LeaveServiceClient();
            
            //Get existing leave application from DB
            Leave leave = wcfLeave.GetLeave(1);
            
            //Set the approval details, as if approving the leave
            leave.ApproveBy = 100;
            leave.ApprovalDate = DateTime.Now;

            LeaveWorkflowServiceClient wfLeave = new LeaveWorkflowServiceClient();
            wfLeave.ApproveLeave(leave);
        }


First step: run the apply leave unit test. A new workflow instance will be created, and a new leave application record will be created in my database. You can track your workflow instance and its status by looking at the [System.Activities.DurableInstancing].[InstancesTable] table in the WorkflowInstanceStore database.



Note the value in the IsInitialized column; it means this instance has been initialized and persisted. Also, note the BlockingBookmarks column; it tells us that this instance is waiting for the ApproveLeave operation. If you perform any other operation against this instance, you will get an error. If you happen to run into an error with your workflow application, your instance will be suspended (you can tell an instance is suspended from the value in the IsSuspended column), and then you can either terminate it or resume it. A quick way to inspect these columns is sketched below.
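
For convenience, here is a minimal sketch that reads those columns with plain ADO.NET; the column names follow the standard WF4 instance store schema discussed above, and the connection string name is assumed to match the one in the config:

using System;
using System.Configuration;
using System.Data.SqlClient;

public static class InstanceStoreInspector
{
    // Quick diagnostic dump of the persisted workflow instances (sketch only).
    public static void DumpInstances()
    {
        string connectionString =
            ConfigurationManager.ConnectionStrings["WorkflowInstanceStore"].ConnectionString;

        using (SqlConnection connection = new SqlConnection(connectionString))
        using (SqlCommand command = new SqlCommand(
            @"SELECT InstanceId, IsInitialized, IsSuspended, BlockingBookmarks
              FROM [System.Activities.DurableInstancing].[InstancesTable]", connection))
        {
            connection.Open();
            using (SqlDataReader reader = command.ExecuteReader())
            {
                while (reader.Read())
                {
                    Console.WriteLine("{0}  initialized={1}  suspended={2}  bookmarks={3}",
                        reader["InstanceId"], reader["IsInitialized"],
                        reader["IsSuspended"], reader["BlockingBookmarks"]);
                }
            }
        }
    }
}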

Second step: run the approve leave unit test. The previously created workflow instance will be continued; Workflow knows which instance to pick up by using the correlation token that you passed in earlier. Now, if you look at the InstancesTable, you will find that the instance record has disappeared. The reason is that the workflow process has completed, so the instance record was deleted. If you want the instance record to remain, you can set the instance completion action in the web.config.


<behavior name="WorkflowServiceBehavior">
  <serviceMetadata httpGetEnabled="true"/>
  <serviceDebug includeExceptionDetailInFaults="true"/>
  <sqlWorkflowInstanceStore 
      connectionStringName="WorkflowInstanceStore"
      hostLockRenewalPeriod="00:00:30" 
      runnableInstancesDetectionPeriod="00:00:05" 
      instanceCompletionAction="DeleteNothing" 
      instanceLockedExceptionAction="AggressiveRetry" 
      instanceEncodingOption="GZip"/>
  <dataContractSerializer maxItemsInObjectGraph="2147483647"/>
</behavior>



That is all about workflow SQL persistence and correlation. In the next topic, I will share how to track workflow activity status and instance status, and how to log errors to the Event Viewer or SQL Server.
If you wish to have my complete sample project source code, feel free to download it from HERE.


Friday, October 26, 2012

How to Configure SQL Persistence for Workflow 4?

Today's topic is how to enable a workflow to persist its data into SQL Server. If you do not know what Windows Workflow Foundation is, you can refer back to my previous post here.

The reason we need to persist workflow state into SQL Server is that we cannot guarantee our web server is online 24x7 and problem free. So, we have the workflow instance and its data persisted into a database, so that we do not need to worry about data loss when the server goes offline. The workflow instance can be automatically restored after the server comes back online.

How to enable SQL persistence?

1. Create a new empty database called WorkflowInstanceStore.

2. Locate the following SQL scripts:
SqlWorkflowInstanceStoreSchema.sql and
SqlWorkflowInstanceStoreLogic.sql
from the folder C:\Windows\Microsoft.NET\Framework\v4.0.30319\SQL\en

3. Execute the SQL scripts in the correct order on the new empty database.
First, SqlWorkflowInstanceStoreSchema.sql
Second, SqlWorkflowInstanceStoreLogic.sql

Here is how the schema looks:



 
There are a few ways to host the workflow, but in today's topic I only cover WAS-hosted workflow services, where the WorkflowServiceHost is set up through the configuration file. For a self-hosted workflow service, you can refer to this MSDN article; a minimal code sketch is also included after the steps below.

4. Open up your web host project's web.config. Create a new service behavior for the workflow instance store.

<behavior name="WorkflowServiceBehavior">
  <serviceMetadata httpGetEnabled="true"/>
  <serviceDebug includeExceptionDetailInFaults="true"/>
  <sqlWorkflowInstanceStore 
      connectionStringName="WorkflowInstanceStore"
      hostLockRenewalPeriod="00:00:30"
      runnableInstancesDetectionPeriod="00:00:05"
      instanceCompletionAction="DeleteAll"
      instanceLockedExceptionAction="AggressiveRetry"
      instanceEncodingOption="GZip"/>
  <dataContractSerializer maxItemsInObjectGraph="2147483647"/>
</behavior>

5. Create a new connection string for the workflow instance store.

<add name="WorkflowInstanceStore" connectionString="server=.;database=WorkflowInstanceStore;Trusted_Connection=true;multipleactiveresultsets=false" providerName="System.Data.SqlClient" />
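
As mentioned above, the instance store can also be attached in code when you self-host. Here is a minimal sketch using the same settings as the behavior above; the workflow definition (MyWorkflow) and the base address are placeholders, not part of the sample project:

using System;
using System.ServiceModel.Activities;
using System.ServiceModel.Activities.Description;

public static class SelfHostedPersistenceSample
{
    // Attach a SQL workflow instance store to a self-hosted WorkflowServiceHost,
    // using the same settings as the config-based behavior above.
    public static WorkflowServiceHost CreateHost()
    {
        WorkflowServiceHost host = new WorkflowServiceHost(
            new MyWorkflow(), new Uri("http://localhost:8080/MyWorkflowService"));

        SqlWorkflowInstanceStoreBehavior storeBehavior = new SqlWorkflowInstanceStoreBehavior(
            "server=.;database=WorkflowInstanceStore;Trusted_Connection=true")
        {
            HostLockRenewalPeriod = TimeSpan.FromSeconds(30),
            RunnableInstancesDetectionPeriod = TimeSpan.FromSeconds(5),
            InstanceCompletionAction = InstanceCompletionAction.DeleteAll,
            InstanceLockedExceptionAction = InstanceLockedExceptionAction.AggressiveRetry,
            InstanceEncodingOption = InstanceEncodingOption.GZip
        };

        host.Description.Behaviors.Add(storeBehavior);
        host.Open();
        return host;
    }
}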

Now your workflow application is enabled with SQL persistence.

Next: how do you make use of the workflow correlation token to continue the process flow of an active instance? Please read the next post.



Monday, October 22, 2012

Error 0x80070005 occurred during full-text index population for table or indexed view

Suppose you read and followed my previous post about setting up Full-Text Search, and then you execute a Full Population for the full-text index (as in the following screenshot), but nothing happens after that.



And then, when you try to execute any query with the CONTAINS or FREETEXT syntax, you find no records returned.

Then, you open the Full-Text Index properties window, and you find there are some numbers in the Table Full-Text Fail Count.





In order to see the reason for the failed full-text population, go to your database log folder. Normally it is located at <MSSQL Installed Path>\MSSQL\Log. Then, look for the log file with the prefix "SQLFT" and the latest modified date.





If you find one of the log entries is something like this:

Error '0x80070005' occurred during full-text index population for table or indexed view '[QueryPerfTest].[dbo].[Profiles]' (table or indexed view ID '12345678', database ID '1'), full-text key value '8F8C29D7-EFB0-49E9-827F-FFCBC4FCE85C'. Attempt will be made to reindex it.

Informational: Full-text retry pass of Full population completed for table or indexed view '[QueryPerfTest].[dbo].[Profiles]' (table or indexed view ID '12345678', database ID '1'). Number of retry documents processed: 213. Number of documents failed: 213.


Also, in the same log folder, open up the ERRORLOG file, and you find log entries like these:

Error: 7691, Severity: 16, State: 1.
Access is denied to full-text log path. Full-text logging is disabled for database 'QueryPerfTest', catalog 'ftCatalog' (database ID '1', catalog ID '10').

A new instance of the full-text filter daemon host process has been successfully started.

Error: 30089, Severity: 17, State: 1.
The fulltext filter daemon host (FDHost) process has stopped abnormally. This can occur if an incorrectly configured or malfunctioning linguistic component, such as a wordbreaker, stemmer or filter has caused an irrecoverable error during full-text indexing or query processing. The process will be restarted automatically.


Solution:

Ensure the service accounts that you use to run the SQL Server, SQL Server Agent, and SQL Full-text Filter Daemon Launcher Windows services have the proper rights to perform the full-text index population, and also have access to write files into the MSSQL log folder. As a developer, I normally use the default Local System account for all the services, so I never encounter such an issue. DBAs, on the other hand, want everything under control, including the security measures, and sometimes the service account privileges are not set up properly. I helped someone troubleshoot this issue and am sharing the solution that she found.





Wednesday, October 17, 2012

How to terminate suspended workflow instance?

While you are developing your workflow service, I believe you will definitely do a test run and then hit an error. At that point, you go and fix your error, then do another test run. But you keep getting an exception message along the lines of "unable to proceed because the workflow instance is in a suspended state". What you can do is terminate the suspended workflow instance, and then retry with your fix.

How to terminate suspended workflow instance?

Execute the following code, be it in a console application, a unit test or whatever.
Note: The following code is for Workflow 4.0 or later. You need to add the System.Activities and System.Activities.DurableInstancing assembly references (and System.Configuration for ConfigurationManager).

// Define workflow instance store.
SqlWorkflowInstanceStore instanceStore =
    new SqlWorkflowInstanceStore(
        ConfigurationManager.ConnectionStrings["WorkflowInstanceStore"].ConnectionString);


// Setup the workflow with a definition.
WorkflowApplication app = new WorkflowApplication(new MyWorkflow());


// Assign the instance store.
app.InstanceStore = instanceStore;


// Load the workflow instance.
app.Load(new Guid("[Put Your Instance's GUID Here]"));


// Restart the workflow execution.
app.Run();


// Terminate the workflow instance.
app.Terminate("Terminated due to unrecoverable errors.");


EDIT:
Credit to Serena for the source code. :P
 

Tuesday, October 16, 2012

Efficient Dynamic Query - Part 3

If you missed Parts 1 & 2, click the following links.
Efficient Dynamic Query - Part 1
Efficient Dynamic Query - Part 2

In my previous post, I mentioned that I would share how to set up Full-Text Search in MS SQL Server. So, here you go. I am using MSSQL 2008 R2 for this post; it should be similar for MSSQL 2012.

1. Add the feature to your existing MSSQL server:

Insert your MSSQL 2008 R2 installer ISO or disc, then install a new instance or add features to your existing MSSQL server. In the feature selection window, make sure to check "Full-Text Search". Then, proceed to complete the installation.



2. Define a full-text index on the table:


3. Select the unique index; it can be the primary key (clustered index) or a unique non-clustered index.

 

4. Select the columns that need the full-text index. The language selection is optional; the default is English.



5. Set how changes should be tracked.



6. Give the catalog a name, or select an existing one if you have any and wish to add this table to an existing catalog.



7. Define an index population schedule (optional). The schedule can be set either for a particular table only or for a particular catalog, which may contain multiple tables.


8. If you wish to schedule index population: for a production environment, it is recommended that you set the schedule to run the population during non-peak hours; for a development environment, you can set it to run while the CPU is idle. The reason is that index population requires quite some CPU power and may cause table locks.


9. Verify the actions, and then you are done with defining the full-text index.



10. If you did not schedule any index population, you can also run it manually.




After you are done with the above configuration, you can use the following syntax in your queries:

1. CONTAINS

Similar to the equals operator (=). You can use the AND, OR and NOT operators in the expression.

SELECT *
FROM [dbo].[Profiles]
WHERE CONTAINS (Hobbies, 'cooking')
--WHERE CONTAINS (Hobbies, 'cooking OR fishing')
--WHERE Hobbies = 'cooking'

2. FREETEXT

Similar to the LIKE operator.

SELECT *
FROM [dbo].[Profiles]
WHERE FREETEXT (Hobbies, 'fish')
--WHERE Hobbies LIKE '%fish%'

3. CONTAINSTABLE

Gets a full-text search result table using CONTAINS.
The returned table contains only 2 columns: KEY and RANK.
KEY is the unique ID of the returned row.
RANK is a score of how well the row matched the criteria; the higher the score, the better the match.

SELECT *
FROM [Profiles] AS T1 INNER JOIN
   CONTAINSTABLE ([Profiles], Hobbies, 
      'cooking'
   ) AS T2
   ON T1.ID = T2.[KEY]

4. FREETEXTTABLE

Similar to CONTAINSTABLE, but the predicate is a free-text search.

SELECT *
FROM [Profiles] AS T1 INNER JOIN
   FREETEXTTABLE ([Profiles], Hobbies, 
      'fish'
   ) AS T2
   ON T1.ID = T2.[KEY]
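
From application code, these predicates work with ordinary parameterized ADO.NET commands. Here is a minimal sketch against the same Profiles table and Hobbies column used in the examples above (the connection string name is an assumption):

using System;
using System.Configuration;
using System.Data.SqlClient;

public static class FullTextSearchSample
{
    // Run a parameterized CONTAINS query against the full-text indexed column.
    public static void PrintCookingOrFishingProfiles()
    {
        string connectionString =
            ConfigurationManager.ConnectionStrings["QueryPerfTest"].ConnectionString;

        using (SqlConnection connection = new SqlConnection(connectionString))
        using (SqlCommand command = new SqlCommand(
            "SELECT ID, Hobbies FROM dbo.Profiles WHERE CONTAINS(Hobbies, @searchTerm)",
            connection))
        {
            // The whole search condition is passed as a parameter, e.g. "cooking OR fishing".
            command.Parameters.AddWithValue("@searchTerm", "cooking OR fishing");

            connection.Open();
            using (SqlDataReader reader = command.ExecuteReader())
            {
                while (reader.Read())
                {
                    Console.WriteLine("{0}: {1}", reader["ID"], reader["Hobbies"]);
                }
            }
        }
    }
}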


More detail from MSDN.



 

Wednesday, October 10, 2012

Efficient Dynamic Query - Part 2

Continuing from the previous post, "Efficient Dynamic Query - Part 1", someone actually asked me to write in more detail about the solution to the problem.

We know the root cause: putting input parameter NULL-value checks in the SQL WHERE condition will, no matter what, cause an index scan. Therefore, I would like to share how we should write a stored procedure, or C# code with ADO.NET, to prevent this from happening.

First, we should only include the necessary data filters in the query. No more "@inputParameter IS NULL" checks inside the WHERE condition!

Secondly, try to avoid using the LIKE operator; I know that is not always possible. If we use CONTAINS or FREETEXT instead (provided the SQL Server has the Full-Text Search feature enabled), there are pros and cons to take note of. The pro is that it is good and fast for text search on large VARCHAR columns; the con is that you need to run a population job to build the full-text index of the table, and during that time the SQL Server will be heavily loaded and may cause table locks. Normally the full-text index population job runs during non-peak hours.

The concept is to form the SELECT query string first, then form the WHERE condition by checking each input parameter's value; if a parameter value is NULL, do not include it in the WHERE condition.

Yes, it sounds tedious but it is worth your time.

Stored Procedure:

CREATE PROCEDURE [dbo].[Profiles_Search]
  @firstName VARCHAR(50) = NULL,
  @lastName VARCHAR(50) = NULL,
  @hobbies VARCHAR(50) = NULL
AS
BEGIN

 DECLARE @query NVARCHAR(300)
 DECLARE @filter NVARCHAR(300)
 DECLARE @paramDefinition NVARCHAR(300);

 SET @query = 'SELECT * FROM dbo.Profiles '
 SET @filter = ''
 
 IF @firstName IS NOT NULL
  SET @filter = @filter + 'AND FREETEXT (FirstName, @firstName) ';

 IF @lastName IS NOT NULL
  SET @filter = @filter + 'AND FREETEXT (LastName, @lastName) ';

 IF @hobbies IS NOT NULL
  SET @filter = @filter + 'AND FREETEXT (Hobbies, @hobbies) ';
  
 IF @firstName IS NOT NULL OR @lastName IS NOT NULL OR @hobbies IS NOT NULL
 BEGIN
  --Remove the leading AND before concatenating the filter with WHERE
  SET @filter = SUBSTRING(@filter, 4, LEN(@filter)) 
  SET @query = @query + 'WHERE' + @filter
 END

 SET @paramDefinition = '@firstName VARCHAR(50), @lastName VARCHAR(50), @hobbies VARCHAR(50)'
 PRINT 'Query: ' + @query
 PRINT 'Parameters: ' + @paramDefinition
 EXEC sp_executesql @query, @paramDefinition, @firstName, @lastName, @hobbies
END

Or, you can use LIKE by replacing FREETEXT with LIKE as in the following example, but be warned that a wildcard at the front of the expression will still cause an index scan.

IF @firstName IS NOT NULL
 SET @filter = @filter + 'AND FirstName LIKE ''%'' + @firstName + ''%'' ';

IF @lastName IS NOT NULL
 SET @filter = @filter + 'AND LastName LIKE ''%'' + @lastName + ''%'' ';

IF @hobbies IS NOT NULL
 SET @filter = @filter + 'AND Hobbies LIKE ''%'' + @hobbies + ''%'' ';
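
For completeness, here is a minimal sketch of calling the stored procedure from C# with ADO.NET; only the filters that actually have a value are passed in, and the omitted parameters fall back to the procedure's NULL defaults (the connection string name is an assumption):

using System;
using System.Configuration;
using System.Data;
using System.Data.SqlClient;

public static class ProfileSearchCaller
{
    // Call dbo.Profiles_Search, passing only the filters that have a value.
    public static void SearchByFirstName(string firstName)
    {
        string connectionString =
            ConfigurationManager.ConnectionStrings["QueryPerfTest"].ConnectionString;

        using (SqlConnection connection = new SqlConnection(connectionString))
        using (SqlCommand command = new SqlCommand("dbo.Profiles_Search", connection))
        {
            command.CommandType = CommandType.StoredProcedure;

            // @lastName and @hobbies are omitted, so they stay NULL inside the
            // procedure and no filter is generated for them.
            if (!string.IsNullOrWhiteSpace(firstName))
                command.Parameters.AddWithValue("@firstName", firstName);

            connection.Open();
            using (SqlDataReader reader = command.ExecuteReader())
            {
                while (reader.Read())
                {
                    Console.WriteLine("{0} {1}", reader["FirstName"], reader["LastName"]);
                }
            }
        }
    }
}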

WARNING: Putting logic into a stored procedure like this is arguably bad practice; use it at your own risk, and if you have a better idea, feel free to share it with me.

Or, if you do not like or do not use stored procedures, you can code the logic in C#. The following is example code using Enterprise Library 5.0, but the code is not shown in full here; I only include a snippet to show the idea of how to form a query that includes only the necessary filters in the WHERE condition.

C#:

public List<Profile> Select(string firstName, string lastName, string hobbies)
{
    string SQL_STATEMENT =
        "SELECT * FROM Profiles {0}";

    List<Profile> result = new List<Profile>();

    // Connect to the database. DatabaseFactory comes from Enterprise Library.
    Database db = DatabaseFactory.CreateDatabase(base.ConnectionName);
    using (DbCommand cmd = db.GetSqlStringCommand(SQL_STATEMENT))
    {
        // Filter construction.
        string filter = string.Empty;

        filter += BuildFilter(db, cmd, "FirstName", firstName);
        filter += BuildFilter(db, cmd, "LastName", lastName);
        filter += BuildFilter(db, cmd, "Hobbies", hobbies);

        //Remove the leading AND
        if (!string.IsNullOrEmpty(filter))
            filter = "WHERE " + filter.Remove(0, 5);

        // Statement construction.
        string sqlCmd = string.Format(SQL_STATEMENT, filter);

        cmd.CommandText = sqlCmd;

        using (IDataReader dr = db.ExecuteReader(cmd))
        {
            while (dr.Read())
            {
                Profile profile = LoadProfile(dr);
                result.Add(profile);
            }
        }
    }

    return result;
}

private string BuildFilter(Database db, DbCommand cmd, string columnName, 
    string columnValue)
{
    string filter = string.Empty;
    string paramName = string.Format("@{0}", columnName);

    if (!string.IsNullOrWhiteSpace(columnValue))
    {
        db.AddInParameter(cmd, paramName, DbType.String, columnValue);
        filter = string.Format(" AND {0} LIKE '%' + {1} + '%'", columnName, paramName);
        //Change LIKE to CONTAINS or FREETEXT here
    }

    return filter;
}

private Profile LoadProfile(IDataReader dr)
{
    Profile profile = new Profile();
    profile.ID = base.GetDataValue(dr, "ID");
    profile.FirstName = base.GetDataValue(dr, "FirstName");
    profile.LastName = base.GetDataValue(dr, "LastName");
    profile.ContactNo = base.GetDataValue(dr, "ContactNo");
    profile.Email = base.GetDataValue(dr, "Email");
    profile.AboutMe = base.GetDataValue(dr, "AboutMe");
    profile.Interests = base.GetDataValue(dr, "Interests");
    profile.Hobbies = base.GetDataValue(dr, "Hobbies");
    return profile;
}


Thanks to Serena for the BuildFilter method idea.

The code above is repetitive everywhere a developer creates CRUD operations for an application. There is actually a tool that generates this piece of C# code automatically. For more detail about the code generation, please refer to my previous post "Layered Architecture Solution Guidance (LASG)". With the tool, you can save the time spent copying and pasting code and changing parameters, etc.; the LASG tool does it all for you with just a few clicks.

In the next topic, I will cover full-text index setup and the performance difference between FREETEXT and LIKE.


Monday, October 8, 2012

Efficient Dynamic Query - Part 1

I believe every developer who creates a CRUD screen faces, every time, a requirement for a filter function on the data grid or grid view. The filter normally has multiple text boxes supporting multiple data filters, and when a particular text box is left empty, that column should not be filtered in the record set. Normally we create a dynamic query to handle this situation.

I am not sure how you write your dynamic queries, but I want to show you how I did it, how bad it is, and see whether you agree or not.

Here is my table. Imagine that I have a requirement to create data filters for the FirstName, LastName and Hobbies columns only.


Normally, what I did was create the following SQL query:

  SELECT *
  FROM [QueryPerfTest].[dbo].[Profiles]
  WHERE
  (@firstName IS NULL OR FirstName LIKE '%' + @firstName + '%') AND
  (@lastName IS NULL OR LastName LIKE '%' + @lastName + '%') AND
  (@hobbies IS NULL OR Hobbies LIKE '%' + @hobbies + '%')


The above query allows me to pass a NULL value to an input parameter when I want to exclude the filter for that particular column. The query looks clean and readable, but later I realized it is a time bomb! Thanks to my mentor; now let me show you why.

Look at the execution plan of my query.
In case you do not know what an execution plan is, it is the result of the query optimizer's attempt to calculate the most efficient way to implement the T-SQL query that you executed.
There are 2 ways of getting an execution plan: one is the estimated execution plan, the other is the actual execution plan. They often give the same result, but sometimes not. To get more accurate information, always display the actual execution plan, but beware that it is more expensive in terms of system resources compared to the estimated one.
You can get the estimated execution plan by right-clicking in the query window and clicking "Display Estimated Execution Plan". Or, if you want to display the actual results and execution plan together, right-click in the query window, click "Include Actual Execution Plan", then hit F5 to execute your query.

The following is the actual execution plan of my query:



I have 1 million rows in my table, and the above query returns only 1 row. Based on the execution plan, it actually performs an index scan (which is normal when querying a non-indexed column and using the LIKE operator), and when you hover over the execution plan result, you get the yellow tooltip.

The tooltip shows the details of the operations during query execution. Most of the details are self-explanatory, except a few such as:

  • Estimated Subtree Cost – The total cost to the query optimizer of executing this operation and all operations preceding it in the same subtree (read from right to left). The number is in arbitrary cost units, not an absolute time; it represents how expensive the optimizer thinks this operator and its subtree will be.
  • Estimated Number of Rows – The number of rows produced by the operator.

In short, for every number shown in the tooltip, the lower the better.
More detail about the tooltips - MSDN.

Normally, when we encounter such a performance issue, we have a few options to improve it.
Option 1: Create indexes on the columns.
Option 2: Change the query.

Let's try option 1 first.

In case you do not know what an index is: imagine the database is like a dictionary. When you want to look up a word, you normally look for its first character in the index pages, jump to the page number for that character, then flip page by page to find the exact word. The database works the same way: the query optimizer attempts to scan the index and then locate your data. By the way, the execution plan explained above is essentially showing the actions the optimizer takes to perform this kind of lookup.

There are a few index types available. Most of the time we create a non-clustered index for a non-primary-key column. SQL Server organizes indexes in a B-tree structure; a clustered index defines the sort order of the table data itself, which is why the primary key column is usually the clustered index and why you cannot have more than one clustered index on the same table. A non-clustered index is an additional index organized on top of the clustered index.

Theoretically, after creating non-clustered indexes on the relevant table columns, query performance should improve because the optimizer has extra indexes it can use to locate the data quickly and easily.



Let's see whether adding a non-clustered index to the columns helps or not.


As the above screenshot shows, there is no performance improvement at all; instead, it got worse. Why?

Notice that the Estimated Number of Rows and Row Size have increased, which means more data rows were scanned. As long as my query contains the LIKE operator with a wildcard at the front, it will always perform an index scan. The reason is that with a leading wildcard, the optimizer can never know what the value starts with. It is just like giving you the task of finding all words that contain a few characters somewhere in the middle; you would be clueless and end up scanning the whole dictionary. Furthermore, since I created extra non-clustered indexes on my columns, there are now more index pages to scan.

Alright, now if I remove the wildcard at the front, it should perform faster, because without the leading wildcard the optimizer can easily locate any value that starts with a fixed prefix through the index, and the trailing wildcard does not prevent that.



If you look at the result above, you will wonder what on earth has happened: the performance still shows no improvement at all?!

Well, this is the thing I want to share today. The easy readability and cleaner look of the query come with a price. Take note of the input parameter IS NULL checks. Even though you have created indexes and removed the leading wildcard from the query, the optimizer still performs an index scan! You may have the same question I had before realizing this: I am checking whether my input parameter IS NULL, not the table column, so it should not be related to the table index scan. So, I removed all the IS NULL checks from the query; see what happens next.


You will see that the optimizer now performs an index seek! Look at the tooltip result; it shows a very significant improvement.


Summary:

- Never create unnecessary indexes on table columns. They may degrade performance.
- Try to avoid using the LIKE operator with a leading wildcard. Use full-text search if possible.
- Don't be a lazy @$$ like me; just write the extra few lines of code, it pays off.
- If you want to create a dynamic query and you use a stored procedure, you can write IF ELSE statements to form a query string that excludes the filter for any NULL-valued input parameter. Or, if you use an ADO.NET SqlCommand, you can form the dynamic query even more easily.

 
