Sunday, December 30, 2012

How to Export Large Data into File?

I have a very interesting topic to share, at least to me. I believe all developers would encounter a system requirement which the user wants to export data from the application into a file and then download it by just clicking a "Export" button. Assume you do for web base application, what would you normally do? The main challenge is the data size to export is huge and contain millions of records.

Well, commonly people would do something like this:


public List<Profile> ExportData()
{
    List<Profile> profileList = new List<Profile>();

    using (SqlConnection sqlConn = new SqlConnection(connectionString))
    using (SqlCommand sqlCmd = new SqlCommand("SELECT * FROM Profiles", sqlConn))
    using (SqlDataReader reader = sqlCmd.ExecuteReader())
    {
        while (reader.Read())
        {
            profileList.Add(new Profile()
            {
                ID = long.Parse(reader["ID"].ToString()),
                FirstName = reader["FirstName"].ToString(),
                LastName = reader["LastName"].ToString(),
                Email = reader["Email"].ToString(),
                ContactNo = reader["ContactNo"].ToString(),
                AboutMe = reader["AboutMe"].ToString(),
                Interest = reader["Interest"].ToString(),
                Hobbies = reader["Hobbies"].ToString()
            });
        }
        reader.Close();
    }
    return profileList;
}

We got to wait for quite some time for all the data being fetched. Most likely you would encounter session timeout error. If you go and extend the session timeout and manage to pull all the records, imagine million rows of record are stored in the server memory. And, you never know how many user is going to click the "Export" button at the same time.

And then, you would need to have the following code to write data into a file. And then, that file will be downloaded by the user.

public static void WriteToFile(string value)
{
    if (File.Exists(path))
    {
        File.Delete(path);
    }
           
    FileStream fileStream = File.Create(path);

    byte[] info = new UTF8Encoding(true).GetBytes(value);
    fileStream.Write(info, 0, info.Length);
}


Finally, you iterate the data and write them into a file.

public void TestMethod1()
{
    DataExportDAC dac = new DataExportDAC();
    List<Profile> profileList = dac.ExportData();

    foreach (Profile profile in profileList)
    {
        StringBuilder builder = new StringBuilder();
        builder.Append(
            profile.ID + "," +
            profile.FirstName + "," +
            profile.LastName + "," +
            profile.Email + "," +
            profile.ContactNo + "," +
            profile.AboutMe + "," +
            profile.Interest + "," +
            profile.Hobbies +
            Environment.NewLine);

        FileUtility.WriteToFile(builder.ToString());
    }
}


The above code looks fine and it actually works, but not efficient. When come to heavy load, it will be extremely slow. There are 2 bottlenecks here. The first one is the while loop for the SqlDataReader to get data row by row. The second one is the foreach loop to append data one by one into a file.

We can do a little enhancement by combining 2 loops into 1.

public void ExportDataIntoFile()
{
    using (SqlConnection sqlConn = new SqlConnection(connectionString))
    using (SqlCommand sqlCmd = new SqlCommand("SELECT * FROM Profiles", sqlConn))
    using (SqlDataReader reader = sqlCmd.ExecuteReader())
    {
        while (reader.Read())
        {
            StringBuilder builder = new StringBuilder();
            builder.Append(
                reader["ID"].ToString() + "," +
                reader["FirstName"].ToString() +"," +
                reader["LastName"].ToString() + "," +
                reader["Email"].ToString() + "," +
                reader["ContactNo"].ToString() + "," +
                reader["AboutMe"].ToString() + "," +
                reader["Interest"].ToString() + "," +
                reader["Hobbies"].ToString() +
                Environment.NewLine);

            FileUtility.WriteToFile(builder.ToString());
        }
        reader.Close();
    }
}


However, millions of hit at the foreach loop is not preventable. Even though you try with different approach such as using OLEDB, Excel COM, OpenXML or any other kind of file export component, you still need to load the data from DB first then insert them into the file export component record by record.

Sometime you would think that when we talk about data export, shouldn't it be done in the database server instead of web server? Since the data are sitting in the database server, why do we need to bring them forward to web server to do the processing then only save them into a file?

Therefore, you would start to create stored procedure or writing T-SQL which to be called by your application to do the data export in your database server.

INSERT INTO OPENROWSET('Microsoft.ACE.OLEDB.12.0',
    'Excel 12.0;Database=E:\testing.xlsx;',
    'SELECT * FROM [Sheet1$]')
SELECT * FROM Profiles


Alright, now you see a big difference. 

But, there is a better way to do this. There is a tool available in SQL Server 2005 or later call Bulk Copy Program (bcp.exe). This tool functionality is to import/export large data in/out from SQL Server.

The usage is simple.

bcp "SELECT * FROM QueryPerfTest.dbo.Profiles" queryout "E:\testing.csv" -S ".\MSSQL2008" -U sa -P p@ssw0rd -c -t ","



For the detail description of all the input parameters, refer to the MSDN site.

How long do you need to export a million rows of record and total data size is about 810MB?


I got it in 25 seconds.

Alright, credits to Serena. She is the one mentioned this awesome tool to me.




Wednesday, November 28, 2012

How to Upgrade Team Foundation Server 2010 to 2012?

I have been given a task to migrate existing Team Foundation Server (TFS) from version 2010 to 2012. The migration include hardware migration and database migration. The OS for both hardware are the same: Windows Server 2008 R2. The database will be upgraded from SQL Server 2008 R2  to SQL Server 2012. The following are the steps that I had done in a virtual environment and it is recommended to do so before start doing the actual work.

I have TFS2010 installed in one of my Hyper-V instances. I try to mimic one of the production TFS2010 Team Project Collections by replicating the projects source code to a different local folder and then check them into the virtual TFS2010. Before mapping the local folder with the TFS2010 and checking in all the source code, you need to remove all the existing source control binding first by deleting all the *.vssscc and *.vspscc files in the project folder and subfolders.

Now, perform a full database backup on TFS2010. If you have TFS2010 power tools installed, you can use it to perform full backup. You can get the power tools from here. Otherwise, use SQL Server Management Studio to do the job.



The backup files are actually *.bak and *.trn. Normally you would see tfs_Configuration.bak, it is the TFS core database. The rest of the *.bak files are your Team Project Collection database. Copy all the *.bak and *.trn files to a shared folder in the new server. The tfs_Configuration contains all your Team Project Collections information. The Team Project Collection database contain all your project source code, work item, etc.

Next, I want to setup TFS2012 in the new server. TFS2012 require minimum of 4GB RAM, you need to ensure you have enough memory for your virtual instance. But before that, I need to ensure all prerequisite software are installed first. So, I have to install Web Server (IIS) role to my new server then I have my SQL Server 2012 installed then follow by Sharepoint Foundation 2010. By the way, we need to include Full-Text Search, Reporting Services and Analysis Services features during the SQL Server installation.



Important note: Please make sure the database collation is Latin1_General_CI_AS. If you miss it, you can change the collation later.

After SQL Server 2012 setup is complete, proceed to restore all the databases which you had backup earlier to this new SQL Server 2012 by using SQL Server Management Studio.



Once all the databases are restored, proceed to run TFS setup. Start the Upgrade wizard.



Then, click Next and come to Database configuration. Then, click the "List Available Databases" link.



The TFS2010 database which you had restored just now will appear in the list. Make sure you check the checkbox at the window below to confirm you have a current backup. Click the Next button to proceed.

Then, enter the service account credential if you do not want to use the default service account. Click the Test link button and get a green check mark before proceed.


The reporting configuration will be copied from TFS2010 configuration automatically base on the tfs_Configuration. Change if you wish.

Configure to use Sharepoint with TFS2012 is optional.

Then, proceed to click Verify button.

Unfortunately I got the following error during the verification process. If you happen to encounter the same error, remain the window open.


Error:
TF255374: The following SQL Server instance is not compatible with Team Foundation Server.

Resolution:
TFS require default collation Latin1_General_CI_AS. I forgot to ensure the correct collation during the SQL Server 2012 installation just now. By default, SQL Server will pick Latin_General_CI_AI collation for Malaysia region. Suppose US folks won't encounter this issue.

Now, insert back the SQL Server 2012 ISO, and then open up command prompt, and then change the directory drive to where your ISO is mounted, and then execute the following command to change the collation:

Setup /QUIET /ACTION=REBUILDDATABASE /INSTANCENAME=MSSQLSERVER 
/SQLSYSADMINACCOUNTS=Administrator /SAPWD=p@ssw0rd 
/SQLCOLLATION=Latin1_General_CI_AS



After changing the SQL Server default collation, all the existing databases will be detached. Therefore, open up SQL Server Management Studio, reattach all the databases which you had restored just now and all the missing database.



Now, go back to the TFS Upgrade Wizard window, click the link button to rerun the readiness check. You should see all the green sign now. My SharePoint Integration is a warning sign is because I have not install and configure SharePoint yet. You still can proceed to click the "Configure" button then install the Sharepoint Foundation later.



Now, wait.....



Click the Next button, then wait again.....



Job done! It is the sign for you to open up Visual Studio 2012, use the Team Explorer to connect to your new TFS2012. View the code, and view the change history, ensure everything still in tact and correct.

But, I still miss the SharePoint integration. It seem like it is optional to have it. If you want it, you still can install it now by going back to the TFS setup wizard. Select the "Configure Extensions for SharePoint Products". If you already had installed SharePoint on your server, you still start the same wizard to configure it. Otherwise, you still can use TFS without SharePoint.



Then, install SharePoint Foundation 2010.




Monday, November 26, 2012

Aync and Await in .net 4.5

Today I want to talk about a very interesting topic base on the an article from MSDN about new the syntax async and await in .net 4.5.

What is asynchronous?
I have a funny example in mind to explain this. Imagine how people communicate. When you talk to a person, you speak and then normally would wait for a response. After getting a response, you continue to speak. This is call synchronous. But for asynchronous, it is suppose to mean not synchronous. Therefore, imagine you speak to a person, you do not care for a response but instead keep talking and talking. The situation is just like your wife is nagging at you and no matter what you talk or respond, she ignore.

The asynchronous programming is a way to perform one operation to another without the need to wait for one operation to complete and respond first. While the asynchronous operation has been fired, you can either forget the respond which also known as "fire and forget", or wait for the respond later. In .net 4.5, you can easily code it with the new syntax async and await.

Async:
The async syntax is used on the method declaration only. It is used to flag that this method support asynchronous operation.
For example:
public async void Method()
{

}

When a method is declared as async, you have to use the await syntax within the method, otherwise it will become synchronous.

Another few more method declaration example:
public async Task Method() - Indicating this async method return nothing
public async Task<TMethod() - Indicating this async method return the specified type

Note:
public async void Method() VS public async Task Method()
Both method declaration above are not the same. Void return method is normally used in event handler. You use it only when you want to create a "fire and forget" method. Void return method is not awaitable.
Task return method is an async method that return nothing. You should use it to create awaitable void method.


Await:
The await syntax is used on the method which has the return type is Task<T> or awaitable.
For example:
int value = await Increment(random.Next(100));

Here is how my awaitable method declaration:
private async Task<int> Increment(int value)
{

}

You should expect to see the (awaitable) word in the tooltip after mouse over to your method.



Every async method must use await syntax at least once within it. Otherwise, you will get the following warning:

"This async method lacks 'await' operators and will run synchronously."

A usual async method with Task return type implementation should look like this:

public async Task Method()
{
    await Task.Run(() =>
    {
        Thread.Sleep(10000);
        Debug.WriteLine("This is an async method with Task return type.");
    });

}

A normal async method with Task specified type return should look like this:

public async Task<string> Method()
{
    string value = "This is an async method with Task<string> return type.";

    return await Task.FromResult<string>(value);

}

I have created a sample WPF window application to test with the new syntaxes.
The sample idea is to perform 4 value increment tasks with random value first before proceed to wait for result and display to the UI.

I have put comments into my code below to show how asynchronous work.

Random random = new Random();
for (int i = 0; i < 1000; i++)
{
    //Task 1 has begin
    Task<int> getIncrementTask1 = Increment(random.Next(100));

    //Task 2 has begin while Task 1 is still working
    Task<int> getIncrementTask2 = Increment(random.Next(100));

    //Task 3 has begin while Task 2 is still working but Task 1 is completed
    Task<int> getIncrementTask3 = Increment(random.Next(100));

    //Task 4 has begin while Task 3 is still working but Task 1 and Task 2 are completed
    Task<int> getIncrementTask4 = Increment(random.Next(100));

    //Await for Task to complete, then get its value
    int incrementValue1 = await getIncrementTask1;
    int incrementValue2 = await getIncrementTask2;
    int incrementValue3 = await getIncrementTask3;
    int incrementValue4 = await getIncrementTask4;

    //Use dispatcher to update the UI to make it responsive
    Dispatcher.Invoke(() =>
    {
        MessageText.Text += "\n Increment value: " + incrementValue1.ToString();
        MessageText.Text += "\n Increment value: " + incrementValue2.ToString();
        MessageText.Text += "\n Increment value: " + incrementValue3.ToString();
        MessageText.Text += "\n Increment value: " + incrementValue4.ToString();
    });

    Task<long> getExponentialTask1 = Exponential(incrementValue1);
    Task<long> getExponentialTask2 = Exponential(incrementValue2);
    Task<long> getExponentialTask3 = Exponential(incrementValue3);
    Task<long> getExponentialTask4 = Exponential(incrementValue4);

    //Dispatcher invoke the UI update asynchrounously
    //Update the UI first if get exponential task 1 is completed first
    DispatcherOperation op1 = Dispatcher.InvokeAsync(async () =>
    {
        MessageText.Text += "\n Exponential value: " + (await getExponentialTask1).ToString);
    });

    DispatcherOperation op2 = Dispatcher.InvokeAsync(async () =>
    {
        MessageText.Text += "\n Exponential value: " + (await getExponentialTask2).ToString);
    });

    DispatcherOperation op3 = Dispatcher.InvokeAsync(async () =>
    {
        MessageText.Text += "\n Exponential value: " + (await getExponentialTask3).ToString);
   });

    DispatcherOperation op4 = Dispatcher.InvokeAsync(async () =>
    {
        MessageText.Text += "\n Exponential value: " + (await getExponentialTask4).ToString);
    });

    await op1;
    await op2;
    await op3;
    await op4;

    //Scroll the text box to the end after all operations are completed
    Dispatcher.Invoke(() =>
    {
        MessageText.ScrollToEnd();
    });
}

Here are my awaitable methods:

private async Task<int> Increment(int value)
{
    //Slow the time down to see the async work
    //Thread.Sleep(1000);
    return await Task.FromResult<int>(value + 1);
}

private async Task<long> Exponential(int value)
{
    //Slow the time down to see the async work
    //Thread.Sleep(1000);
    return await Task.FromResult<long>(value * value);
}


Error handling:
What if there is an error occurred during the async call?
No worry, the Task<T> object has IsFaulted property and Exception property. If there is any unhandled exception was thrown, the IsFaulted property value will be flagged as true, and the exception detail will be available in the Exception property.

Here are the sample code to check if a Task is faulted or not.
When a Task is in faulted state, you cannot await it to get its result. It will throw you an exception.

if (getIncrementTask1.IsFaulted || getIncrementTask2.IsFaulted || getIncrementTask3.IsFaulted || getIncrementTask4.IsFaulted)
{
    Dispatcher.Invoke(() =>
    {
        MessageText.Text += "\n Error has occurred!";
    });
}
else
{
    //Await for Task 1 to complete if it not, then get its value
    incrementValue1 = await getIncrementTask1;
    incrementValue2 = await getIncrementTask2;
    incrementValue3 = await getIncrementTask3;
    incrementValue4 = await getIncrementTask4;
}


There are many APIs support async methods in .net 4.5. Below are the API list referred from MSDN.
As long as you see any API return type is Task<T>, they are awaitable and support async call.

Web accessHttpClient , SyndicationClient
Working with filesStorageFile, StreamWriter, StreamReader, XmlReader
Working with imagesMediaCapture, BitmapEncoder, BitmapDecoder
WCF programmingSynchronous and Asynchronous Operations
Working with socketsSocket


I commonly work on WCF, and I notice I can now generate task-based WCF operation with .net 4.5 and use it with the new await syntax.



Feel free to download my sample project if you like. DOWNLOAD




Send Transactional SMS with API

This post cover how to send transactional SMS using the Alibaba Cloud Short Message Service API. Transactional SMS usually come with One Tim...