
Sunday, January 12, 2014

SSDT : External Database Reference Error

Today I faced a challenge with SSDT (SQL Server Data Tools). I encountered some errors when I had multiple database references in my SQL Server Database Project in Visual Studio 2013.

The errors that I was facing:

SQL71561: View: [dbo].[View_1] has an unresolved reference to object [DatabaseB].[dbo].[Table_1].
SQL71501: View: [dbo].[View_1] contains an unresolved reference to an object. Either the object does not exist or the reference is ambiguous because it could refer to any of the following objects: [DatabaseB].[dbo].[Table_1].[Column1] or [dbo].[Table_1].[t2]::[Column1].

After googling around, the suggested cause of the errors is that I am using a three-part name for the current database, and referring to a table from a different database in my query without having a database reference in my project. The actual root cause is that the database project performs database object validation in the background, and it cannot resolve the other database.

SELECT t1.*
FROM Database1.dbo.Table_1 AS t1
LEFT JOIN Database2.dbo.Table_1 AS t2
ON t1.Column1 = t2.Column1

Therefore, I added the other database projects that I need into my solution, as in the following screenshot.


Then, add the required database reference. Note that I do not need a database variable, so I left the field empty. The example usage correctly shows how I should use the database reference.


After that, I rebuilt my project but was still getting the same errors. I had no choice but to remove the three-part name for the current database and keep the two-part name, as highlighted in red in the query above.

Finally, no more errors occurred and my project built successfully. Happy! But I later discovered that more challenges awaited me.

As you can see in my Solution Explorer screenshot above, I have more than two databases, and I have much more complicated queries that deal with multiple databases. For example:

In Database1:

SELECT t1.*
FROM dbo.Table_1 AS t1
LEFT JOIN Database2.dbo.Table_1 AS t2
ON t1.Column1 = t2.Column1

In Database2:

SELECT bla bla bla
FROM dbo.Table_1
UNION ALL
SELECT bla bla bla
FROM Database1.dbo.Table_2

So, as you can see, Database1 needs to add Database2 as a database reference, and Database2 needs to add Database1 as a database reference too. If you try to do that in Visual Studio, you will get the following error:


A reference to library 'Database' cannot be added. 
Adding this project as a reference would cause a circular dependency.

It seems Visual Studio treats the database reference like an assembly reference. Having both database projects refer to each other is considered a circular reference. So, what am I supposed to do now?

I found a workaround, although not everyone may accept it. I realized that by adding a Data-tier Application Package (*.dacpac) as a database reference, Visual Studio will not complain about circular dependency. Therefore, I tried to extract a dacpac for each of the related databases by using SQL Server Management Studio (SSMS).



Just click the Next button all the way to Finish. By default, the Data-tier Application is extracted and stored at C:\Users\[UserName]\Documents\SQL Server Management Studio\DAC Packages\[DBName].dacpac. I recommend extracting the DAC package files to a centralized location so that it is easier to retrieve, track, and manage the packages later.

While extracting the DAC package file, you may encounter the same errors, SQL71561 and SQL71501, again.


If that happens, you have to extract the package manually by using SqlPackage.exe.

The location of the SQLPackage.exe is C:\Program Files (x86)\Microsoft SQL Server\[version]\DAC\bin

Below is the command line that I use to extract dacpac:

sqlpackage.exe /Action:Extract /ssn:. /sdn:Database1 /tf:"E:\DAC Packages\Database1.dacpac"

/ssn = Source server name
/sdn = Source database name
/tf = Target file

More parameter info can be found HERE.



After you have all the DAC packages ready in one centralized location, go back to Visual Studio and add them as database references.



My current setup is:
Database1 has Database2 as reference.
Database2 has Database1 as reference.

No more circular dependency complaints and no more unresolved database reference errors. Another advantage of using a DAC package as a database reference is that you do not need to create, or add into your solution, database projects that are not developed by you.




Thursday, October 17, 2013

Dynamic Table Partition Switching (MSSQL)

Problem

When you have a lot of data that needs to be inserted into the database, a so-called data import, table locking is inevitable because the database ensures data integrity with the read committed isolation level. When one transaction performs a lot of inserts or updates, it is going to lock the table and therefore cause more wait time for other transactions that are queuing up for table access. It is recommended to split a huge insert or update into multiple transactions that can each complete quickly, by writing the queries carefully.

Today's topic is about making use of table partition switching to reduce the number of table locks while still being able to insert a huge number of records. The idea comes from Serena Yeoh; I am here to share the concept and how it was implemented.

Concept

Imagine that you need to insert a huge number of records into a table. You worry that your action will cause table locking so that others cannot access the table and eventually get query timeout or deadlock errors in their applications. The concept is to have a staging (temporary) table that is created based on the main table into which you are going to insert the records. You can insert all the records into the staging table first, without affecting others who are using the main table. When your data is ready in the staging table, split the last partition of the main table into a new partition sized to match the staging table, switch the staging table into the newly split partition of the main table, and then merge the split partition back with the last partition.
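
Expressed in T-SQL, the core of the idea is only a few statements. The sketch below is schematic only, with placeholder object names and hard-coded boundary values; the full script later in this post wraps the same steps with the bookkeeping needed to handle the challenges described next.

-- Schematic sketch (RANGE LEFT, as in the example below). Assume the main
-- table currently ends at id 1000 and a partition boundary already exists
-- there, while the staging table holds the next 500 rows (ids 1001..1500)
-- and carries a matching CHECK constraint (see Step 7 in the full script).

-- 1. Split: carve out a new, empty partition covering ids 1001..1500.
ALTER PARTITION SCHEME [MyPartitionScheme] NEXT USED [PRIMARY];
ALTER PARTITION FUNCTION [MyPartitionFunction]() SPLIT RANGE (1500);

-- 2. Switch: move the staging table into that empty partition (a metadata-only operation).
--    3 is the partition number of the (1000, 1500] range here; the full script computes it with $PARTITION.
ALTER TABLE [dbo].[Staging_MyTable] SWITCH TO [dbo].[MyTable] PARTITION 3;

-- 3. Merge: remove the boundary so the switched-in rows join the last partition again.
ALTER PARTITION FUNCTION [MyPartitionFunction]() MERGE RANGE (1500);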


Challenges

There are a few challenges in implementing the above concept. One of them is that an identity value you are going to insert into the staging table may already exist in the main table, since it could have been used by a different session that is inserting records; the identity seed of the main table keeps incrementing over time. The main table's primary key constraint check would then fail and prevent you from switching the staging table into the split partition.

Another challenge is that while another session is inserting records, you may have already split the last partition into a new partition; that session's records will then leak into the new partition. When there is data in the new partition, you cannot switch the staging table into it.

When you have the above challenges solved, you will face another problem with partition fragmentation. Imagine that multiple sessions are doing the same partition splitting and merging; if one fails, it leaves the split partition behind with no data. Although you can protect the split with a transaction and roll it back, the table identity cannot be rolled back, because other sessions are still inserting records and the identity keeps incrementing; rolling back the identity could cause key duplication errors later. Anyway, it is very rare that the staging table cannot be switched in, unless it contains dirty data. Even then it is still manageable, because we can fix the data, then switch and merge the partition manually later.

The following is the implementation of the concept and the solution to the challenges mentioned above:

Implementation

First, we have to make sure the main table is a partitioned table. All the indexes of the main table must also be partitioned. In my case, I have a simple table with only one partition. I created a RANGE LEFT partition function and a simple partition scheme that stores all partitions in the PRIMARY filegroup.

CREATE PARTITION FUNCTION [ContactListPartitionFunction] (BIGINT)
    AS RANGE LEFT
    FOR VALUES (0);

CREATE PARTITION SCHEME [ContactListPartitionScheme]
    AS PARTITION [ContactListPartitionFunction]
    ALL TO ([PRIMARY]);

Create a partitioned table.

CREATE TABLE [dbo].[ContactList] (
    [ContactId]     BIGINT        IDENTITY (1, 1) NOT NULL,
    [FullName]      VARCHAR (100) NOT NULL,
    [ContactNumber] VARCHAR (50)  NOT NULL,
    [Address]       VARCHAR (500) NOT NULL,
    [CreationDate]  DATETIME      NOT NULL,
    CONSTRAINT [PK_ContactList] PRIMARY KEY CLUSTERED ([ContactId] ASC) ON [ContactListPartitionScheme] ([ContactId])
);

Create partitioned indexes for my table.

CREATE NONCLUSTERED INDEX [IX_CreationDate_Sort]
    ON [dbo].[ContactList]([CreationDate] ASC)
    ON [ContactListPartitionScheme] ([ContactId]);

CREATE NONCLUSTERED INDEX [IX_Contact_Search]
    ON [dbo].[ContactList]([FullName] ASC, [ContactNumber] ASC)
    INCLUDE([Address])
    ON [ContactListPartitionScheme] ([ContactId]);

Next, we prepare a temp table (staging table) for the main table. You can do it by using SQL Server Management Studio or by writing the T-SQL script manually.
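
If you prefer to script the staging table yourself, a minimal sketch might look like the following. This is an assumption of what such a table needs, not the exact wizard output: the columns, data types and indexes must mirror the main table, and the table must sit on the same filegroup as the target partition (PRIMARY here).

-- Staging table mirroring dbo.ContactList; structure and indexes must match
-- the main table for the partition switch to be allowed.
CREATE TABLE [dbo].[Staging_ContactList] (
    [ContactId]     BIGINT        IDENTITY (1, 1) NOT NULL,
    [FullName]      VARCHAR (100) NOT NULL,
    [ContactNumber] VARCHAR (50)  NOT NULL,
    [Address]       VARCHAR (500) NOT NULL,
    [CreationDate]  DATETIME      NOT NULL,
    CONSTRAINT [PK_Staging_ContactList] PRIMARY KEY CLUSTERED ([ContactId] ASC) ON [PRIMARY]
);

CREATE NONCLUSTERED INDEX [IX_Staging_CreationDate_Sort]
    ON [dbo].[Staging_ContactList]([CreationDate] ASC) ON [PRIMARY];

CREATE NONCLUSTERED INDEX [IX_Staging_Contact_Search]
    ON [dbo].[Staging_ContactList]([FullName] ASC, [ContactNumber] ASC)
    INCLUDE([Address]) ON [PRIMARY];

To generate the equivalent script with the SSMS wizard instead, follow the steps below.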


Right-click the main table, look for the Storage menu, then click the Manage Partition menu.


Select "Create a staging table for partition switching".



Enter the staging table name; this is the temp table into which you are going to insert the records. Set any new boundary value; this is the value that we will later change dynamically, in a programmatic way, based on the number of records in this table.



Run the script immediately.



Now, delete the check constraint of the temp table, because we are going to add it back later.
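
For example, assuming the wizard generated a check constraint named chk_Staging_ContactList_partition_2 (a placeholder; use whatever name appears in your generated script):

-- Drop the wizard-generated check constraint; the dynamic script re-creates
-- one with the correct boundaries in Step 7 below.
ALTER TABLE [dbo].[Staging_ContactList] DROP CONSTRAINT [chk_Staging_ContactList_partition_2];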


Then, proceed to insert the records into the temp table. Take note that the identity of the records must be a numeric data type, because we make use of the identity to solve the potential duplicate-key constraint check challenge when switching the table partition. The first inserted row must always start with identity value 1, incrementing for the rest of the rows.
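
For example, with a few hypothetical sample rows (the IDENTITY(1,1) column gives the first row ContactId = 1 and increments by one for each subsequent row, which satisfies the requirement):

INSERT INTO [dbo].[Staging_ContactList] ([FullName], [ContactNumber], [Address], [CreationDate])
VALUES ('Alice Tan', '012-3456789', '1, Example Street', GETDATE()),
       ('Bob Lee',   '019-8765432', '2, Example Street', GETDATE()),
       ('Carol Lim', '016-1112223', '3, Example Street', GETDATE());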


Once the data in the temp table is ready, we can proceed to split the partition.
The following is the information that we need before performing the data transfer:

  1. Temp table row count
  2. Main table last identity
  3. Partition number
  4. Previous partition boundary (left)
  5. Split partition boundary (right)
The following script is the step-by-step execution of the concept. See the comments for more detail.

DECLARE @rowCount BIGINT
DECLARE @currentIdentity BIGINT
DECLARE @newIdentity BIGINT
DECLARE @partitionNumber INT
DECLARE @leftBoundary BIGINT
DECLARE @rightBoundary BIGINT
DECLARE @sql NVARCHAR(1000)

-- Use transaction to lock the table for table switching
-- Try to minimize code and reduce as much wait time as possible
BEGIN TRY
BEGIN TRANSACTION

-- Step 1 : Get the row count of the temp table
SELECT @rowCount = COUNT(0) FROM [dbo].[Staging_ContactList]

-- Step 2 : Get the last identity of the main table
SET @currentIdentity = CONVERT(BIGINT, IDENT_CURRENT('[dbo].[ContactList]'))

-- Step 3 : Get the boundaries based on the number of records to be inserted
-- The identity = 1 check is required because when the table is empty, the left boundary must be 0
IF @currentIdentity = 1
       SET @leftBoundary = @currentIdentity - 1
ELSE
       SET @leftBoundary = @currentIdentity
SET @rightBoundary = @leftBoundary + @rowCount

-- Step 4 : Reseed the table with the new identity so that other sessions insert new records with the latest identity, avoiding duplicate key constraint failures during the table switch
-- This also reserves the identity range for the temp table
DBCC CHECKIDENT('ContactList', RESEED, @rightBoundary)

PRINT 'Row Count : ' + CONVERT(VARCHAR, @rowCount)
PRINT 'Current Identity : ' + CONVERT(VARCHAR, @currentIdentity)
PRINT 'Left Boundary : ' + CONVERT(VARCHAR, @leftBoundary)
PRINT 'Right Boundary : ' + CONVERT(VARCHAR, @rightBoundary)

-- Step 5 : Update the temp table identity based on the main table's current identity to solve the identity constraint check challenge
-- When you have the identity value in proper order start from 1, 2, 3, 4... in your temp table
-- And, the last identity before the partition to be switched is 8592
-- Update all the identity in the temp table to 8592+1, 8592+2, 8592+3, 8592+4...
-- End result you will not have duplicate key error occur when switching table
UPDATE [dbo].[Staging_ContactList]
SET ContactId = ContactId + @leftBoundary
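
-- Note (an assumption, not part of the original script): the concept above also
-- calls for splitting off an empty partition covering (@leftBoundary, @rightBoundary]
-- before the switch, and the merges in Step 9 expect both boundaries to exist.
-- If your wizard-generated staging script has not already created them, the split
-- would go roughly here; verify against your own partition layout before using it:
--ALTER PARTITION SCHEME [ContactListPartitionScheme] NEXT USED [PRIMARY];
--ALTER PARTITION FUNCTION ContactListPartitionFunction() SPLIT RANGE (@leftBoundary);
--ALTER PARTITION SCHEME [ContactListPartitionScheme] NEXT USED [PRIMARY];
--ALTER PARTITION FUNCTION ContactListPartitionFunction() SPLIT RANGE (@rightBoundary);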

-- Step 6 : Get the partition number based on the new boundary value
SELECT @partitionNumber = $PARTITION.ContactListPartitionFunction(@rightBoundary)
PRINT 'Partition Number : ' + CONVERT(VARCHAR, @partitionNumber)

-- Step 7 : Add check constraint to temp table to fulfill the criteria for table partition switching
SET @sql = '
ALTER TABLE [dbo].[Staging_ContactList] WITH CHECK ADD CONSTRAINT [chk_Staging_ContactList_partition_' + CONVERT(VARCHAR, @partitionNumber) + '] CHECK ([contactID]>N''' + CONVERT(VARCHAR, @leftBoundary) + ''' AND [contactID]<=N''' + CONVERT(VARCHAR, @rightBoundary) + ''')
ALTER TABLE [dbo].[Staging_ContactList] CHECK CONSTRAINT [chk_Staging_ContactList_partition_' + CONVERT(VARCHAR, @partitionNumber) + ']
'
PRINT @sql
EXEC sp_executesql @sql

-- Step 8 : Switch the partition
ALTER TABLE [dbo].[Staging_ContactList]
SWITCH TO [dbo].[ContactList]
PARTITION @partitionNumber;

-- Step 9 : Merge the boundaries away so the switched-in data becomes part of the last partition again
ALTER PARTITION FUNCTION ContactListPartitionFunction ()
MERGE RANGE (@rightBoundary);

ALTER PARTITION FUNCTION ContactListPartitionFunction ()
MERGE RANGE (@leftBoundary);

-- Optional 1 : Drop the temp table
--DROP TABLE [dbo].[Staging_ContactList]

-- Optional 2 : Truncate table then drop the check constraint
SET @sql = '
ALTER TABLE [dbo].[Staging_ContactList] DROP CONSTRAINT [chk_Staging_ContactList_partition_' + CONVERT(VARCHAR, @partitionNumber) + ']
'
PRINT @sql
EXEC sp_executesql @sql

COMMIT TRANSACTION
END TRY
BEGIN CATCH
       SELECT
        ERROR_NUMBER() AS ErrorNumber
        ,ERROR_SEVERITY() AS ErrorSeverity
        ,ERROR_STATE() AS ErrorState
        ,ERROR_PROCEDURE() AS ErrorProcedure
        ,ERROR_LINE() AS ErrorLine
        ,ERROR_MESSAGE() AS ErrorMessage;
       ROLLBACK TRANSACTION
END CATCH

Testing

When you are playing around with the script above, it is useful to query the partitions from time to time to visualize what is happening with the table, by executing the following query:

SELECT p.partition_number AS [Partition No] , g.name AS [FileGroup], p.[rows] AS Rows, r.[value] AS [Boundary] FROM sys.partitions p
INNER JOIN sys.indexes i ON p.object_id = i.object_id AND p.index_id = i.index_id
INNER JOIN sys.partition_schemes s ON s.data_space_id = i.data_space_id
INNER JOIN sys.destination_data_spaces d ON d.partition_scheme_id = s.data_space_id AND d.destination_id = p.partition_number
INNER JOIN sys.filegroups g ON g.data_space_id = d.data_space_id
LEFT JOIN sys.partition_range_values r ON r.function_id = s.function_id AND r.boundary_id = p.partition_number
WHERE p.[object_id] = OBJECT_ID('ContactList') AND i.index_id = 1

Now, test whether the concept works. I created a simple T-SQL script that keeps inserting records into my main table; at the same time, I prepared my temp table data and then executed the script above. I saw the records in the staging table successfully switch into the main table.
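
A sketch of such a load-generator script (run it in a separate session while executing the switch script, and cancel it when you are done):

-- Keep inserting rows into the main table to simulate concurrent traffic.
WHILE 1 = 1
BEGIN
    INSERT INTO [dbo].[ContactList] ([FullName], [ContactNumber], [Address], [CreationDate])
    VALUES ('Concurrent User', '000-0000000', 'Somewhere', GETDATE());

    WAITFOR DELAY '00:00:00.100'; -- small pause between inserts
END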

Performance

The table partition split is going to take some time to complete, and during that time it causes table locking. When the partition is big, it takes even more time to split. It is advisable to have multiple small-to-medium-sized partitions in one table instead of just one big partition; it is always faster to split a smaller partition.

Security

This concept requires you to create and alter tables and deal with partitions. If you wonder what security rights are required to use this implementation, the answer is db_ddladmin or db_owner.
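
For example, to grant the lower of the two to the account that runs the import (ImportUser is a placeholder; on SQL Server versions older than 2012, sp_addrolemember does the same job):

-- Add the importing database user to db_ddladmin so it can manage the staging
-- table, check constraints and the partition function/scheme.
ALTER ROLE db_ddladmin ADD MEMBER [ImportUser];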


If you like this concept or think this is a crazy idea, feel free to drop me a comment. Thanks.


Sunday, May 19, 2013

Query Notification & Service Broker

Have you ever encountered a system requirement that requires you to periodically monitor a database table for data changes and then perform some work in code?

For example, the requirement is to call a web service when there is any data change in the database. What would you do in this scenario?

Option 1:

Are you going to use a trigger? When a record is inserted or updated, you write some logic in the trigger to perform the work and then call a web service? However, the code is restricted to the database: you can only write T-SQL to perform the work, and calling a web service from plain T-SQL is not possible.

Option 2:

There is an alternative way to do it: use SQL CLR to perform the work and call the web service with C# code. However, I realized that the web service client requires a reference to System.ServiceModel.dll, and this assembly is not available inside SQL Server. To make this work, I would need to configure SQL Server to allow my SQL CLR trigger to load unsafe assemblies and to mark the database as trustworthy. In my case, company rules do not allow us to change any database settings, so we have to skip this idea.

Option 3:

With triggers out of the picture, we have to do database table polling, which means that every second, or at some specific interval, I query my database to see whether any data has changed in my table. I can write a console application or a Windows service to perform the polling job. Once my application detects a change, I perform the required work and call a web service. This option is commonly used but not efficient, because constantly polling the database puts stress on it.

Option 4:

Instead of using a trigger or polling, we ask the database to notify us when there is a change, by using Service Broker + Query Notification. This post is about how to set it up and use it.

But before that, we need to know the difference between Query Notification and polling. Can your problem really be solved by using Query Notification?

Source from MSDN:

Query Notification: Query notification is initiated by SQL Server. The notification statement issued by the adapter just instructs the database to initiate notification in case there is a change in the result set of the statement.
Polling: Polling is initiated by the adapter. The adapter executes a statement to validate whether data is available for polling, and then initiates polling by executing the polling statement if some data is available for polling.

Query Notification: You can use the query notification statement to only read data in a SQL Server database table.
Polling: You can use the polling statement to read or update data in a SQL Server database table.

Query Notification: Query notification informs only about the type of change in the data such as Insert, Update, and Delete.
Polling: Polling informs you about the actual data that has changed.

Query Notification: The data-change notification is instantaneous.
Polling: The data-change notification depends on the polling interval, and the adapter clients are informed about the data changes at the end of every polling interval.

Polling can give you better throughput in scenarios where the data changes are happening continuously, and you do not want to be notified of each change as and when it happens. Instead, you specify a polling interval after which you want to be notified of all the changes that have happened since the last change notification.


The concept is that we provide a query for SQL Server to monitor. We enable Service Broker in the database and subscribe a Query Notification request, so that if there is any change to the result of the provided query, the Service Broker sends a notification to a queue.

Assume I have a database called NotificationDB. In the database, I have a table called dbo.Leaves, and I need to subscribe to query notifications on that table.

Note: The query notification subscription needs to meet the requirements mentioned here.

So, the following are the steps to make the mentioned concept work:

Database side:

  1. Enable Service Broker in the database by running the following script (see the note after step 4):

    USE NotificationDB
    ALTER DATABASE NotificationDB SET ENABLE_BROKER

  2. Create a queue for Service Broker to drop notification message into it:

    CREATE QUEUE MyNotificationQueue

  3. Create a service that drops messages into the above queue:

    CREATE SERVICE MyNotificationService
      ON QUEUE MyNotificationQueue
      ([http://schemas.microsoft.com/SQL/Notifications/PostQueryNotification])

  4. Create a route for the service that points back to my local database:

    CREATE ROUTE
      NotificationMessagesRoute
      WITH SERVICE_NAME = 'MyNotificationService',
           ADDRESS = 'LOCAL'
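
A note on step 1: ALTER DATABASE ... SET ENABLE_BROKER waits for exclusive access to the database, so it may appear to hang while other connections are open. A common workaround (use it with care, as it rolls back any open transactions) is:

    ALTER DATABASE NotificationDB SET ENABLE_BROKER WITH ROLLBACK IMMEDIATE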


Application side:
  1. Start a listener on the queue by using SqlDependency:

    SqlDependency.Start(
                    ConfigurationManager.ConnectionStrings["default"].ConnectionString,
                    "MyNotificationQueue");


  2. Create another method that uses SqlDependency to subscribe to notifications from the Service Broker:


    public void RegisterDependency()
    {
        using (SqlConnection connection = new SqlConnection(ConfigurationManager.ConnectionStrings["default"].ConnectionString))
        {
            try
            {
                SqlCommand command = new SqlCommand("Select [ID],[ApplicantName], [Status] from dbo.Leaves WHERE [Status] = 'Approved' AND HasScheduled = 0", connection);

                //Monitor the Service Broker, and get notified if there is change in the query result
                SqlDependency dependency = new SqlDependency(command, "Service=MyNotificationService;local database=NotificationDB", int.MaxValue);

                //Fire event when message is arrived
                dependency.OnChange += this.dependency_OnChange;

                connection.Open();
                    
                SqlDataReader reader = command.ExecuteReader();

                while (reader.Read())
                {
                    Leave leave = new Leave();
                    leave.ID = Convert.ToInt32(reader["ID"]);
                    leave.ApplicantName = reader["ApplicantName"].ToString();
                    leave.Status = reader["Status"].ToString();

                    Console.WriteLine(string.Format("{0}\t{1}\t{2}", leave.ID, leave.ApplicantName, leave.Status));
                }

                connection.Close();
            }
            catch (SqlException ex)
            {
                Console.WriteLine(string.Format("Error: {0}", ex.Message));
            }
            catch (Exception ex)
            {
                Console.WriteLine(string.Format("Error: {0}", ex.Message));
            }
        }
    }

  3. Create an event handler that decides what to do when a message arrives. It is required to re-register the dependency every time a notification message is received:

    void dependency_OnChange(object sender, SqlNotificationEventArgs e)
    {
        var info = e.Info;
        var source = e.Source;
        var type = e.Type;
               
        SqlDependency dependency = sender as SqlDependency;

        dependency.OnChange -= new OnChangeEventHandler(dependency_OnChange);
        RegisterDependency(); //Re-registering the dependency is required every time a notification is received

        //Do whatever you like here after message arrive
        //Can be calling WCF service or anything supported in C#
    }

FYI: The notification messages are in XML format. The detail message format can be found here: http://schemas.microsoft.com/SQL/Notifications/QueryNotification/QueryNotification.xsd

The SqlNotificationEventArgs event argument above contains the content of the notification message. I was not able to sniff the message content; however, based on the above XML schema, the notification message should look like this:


<QueryNotification Type="change" Source="data" Info="insert">
    <Message>bla bla bla…</Message>
</QueryNotification>


So, whenever a new record is inserted into the table and the monitored query result is affected by the new row, the notification message will look like the above, and SqlNotificationEventArgs.Info will be Insert. For a row that has been updated, the Info will be Update.



Also, you can see how many active query notification subscriptions the database currently has with the following statement:


SELECT * FROM sys.dm_qn_subscriptions


Then, you will see something like this:



In summary, instead of polling a table by frequently sending queries to the database at a timed interval to look for data changes, we can ask the database to notify us. Theoretically it sounds like better performance; however, I am not too sure about that. How SqlDependency handles the database connection may be a concern, because in order to get message notifications it needs to keep a connection open. I have not had a chance to dig deep enough; I will probably share more in the future if I encounter any performance problems with SqlDependency.


If you are interested with my complete source code, feel free to download from HERE.

