Sunday, December 30, 2012

How to Export Large Data into File?

I have a very interesting topic to share, at least to me. I believe every developer will eventually encounter a requirement where the user wants to export data from the application into a file and then download it by just clicking an "Export" button. Assuming you are building a web-based application, what would you normally do? The main challenge is that the data to export is huge, containing millions of records.

Well, commonly people would do something like this:


public List<Profile> ExportData()
{
    List<Profile> profileList = new List<Profile>();

    using (SqlConnection sqlConn = new SqlConnection(connectionString))
    using (SqlCommand sqlCmd = new SqlCommand("SELECT * FROM Profiles", sqlConn))
    {
        sqlConn.Open();    // the connection must be open before ExecuteReader
        using (SqlDataReader reader = sqlCmd.ExecuteReader())
        {
            while (reader.Read())
            {
                profileList.Add(new Profile()
                {
                    ID = long.Parse(reader["ID"].ToString()),
                    FirstName = reader["FirstName"].ToString(),
                    LastName = reader["LastName"].ToString(),
                    Email = reader["Email"].ToString(),
                    ContactNo = reader["ContactNo"].ToString(),
                    AboutMe = reader["AboutMe"].ToString(),
                    Interest = reader["Interest"].ToString(),
                    Hobbies = reader["Hobbies"].ToString()
                });
            }
        }
    }
    return profileList;
}

We have to wait quite some time for all the data to be fetched, and most likely you will hit a session timeout error. Even if you extend the session timeout and manage to pull all the records, imagine millions of rows sitting in the server's memory. And you never know how many users are going to click the "Export" button at the same time.

Then you would need the following code to write the data into a file, which the user can download.

public static void WriteToFile(string value)
{
    // Open in append mode and dispose the stream when done. Deleting and
    // recreating the file on every call would leave only the last record.
    using (FileStream fileStream = new FileStream(path, FileMode.Append, FileAccess.Write))
    {
        byte[] info = new UTF8Encoding(true).GetBytes(value);
        fileStream.Write(info, 0, info.Length);
    }
}


Finally, you iterate over the data and write each record into the file.

public void TestMethod1()
{
    DataExportDAC dac = new DataExportDAC();
    List<Profile> profileList = dac.ExportData();

    foreach (Profile profile in profileList)
    {
        StringBuilder builder = new StringBuilder();
        builder.Append(profile.ID).Append(',')
               .Append(profile.FirstName).Append(',')
               .Append(profile.LastName).Append(',')
               .Append(profile.Email).Append(',')
               .Append(profile.ContactNo).Append(',')
               .Append(profile.AboutMe).Append(',')
               .Append(profile.Interest).Append(',')
               .Append(profile.Hobbies)
               .AppendLine();

        FileUtility.WriteToFile(builder.ToString());
    }
}


The above code looks fine and it actually works, but it is not efficient. Under heavy load it will be extremely slow. There are two bottlenecks here. The first is the while loop where the SqlDataReader fetches the data row by row. The second is the foreach loop that appends the data to the file record by record.

We can make a small enhancement by combining the two loops into one.

public void ExportDataIntoFile()
{
    using (SqlConnection sqlConn = new SqlConnection(connectionString))
    using (SqlCommand sqlCmd = new SqlCommand("SELECT * FROM Profiles", sqlConn))
    {
        sqlConn.Open();    // the connection must be open before ExecuteReader
        using (SqlDataReader reader = sqlCmd.ExecuteReader())
        {
            while (reader.Read())
            {
                StringBuilder builder = new StringBuilder();
                builder.Append(reader["ID"]).Append(',')
                       .Append(reader["FirstName"]).Append(',')
                       .Append(reader["LastName"]).Append(',')
                       .Append(reader["Email"]).Append(',')
                       .Append(reader["ContactNo"]).Append(',')
                       .Append(reader["AboutMe"]).Append(',')
                       .Append(reader["Interest"]).Append(',')
                       .Append(reader["Hobbies"])
                       .AppendLine();

                FileUtility.WriteToFile(builder.ToString());
            }
        }
    }
}
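
Even with the combined loop, there is one more easy win on the web-server side: open the output file once and stream every row through a single writer, instead of reopening the file for each record. Here is a minimal sketch, assuming the same connectionString and path fields as above, plus the usual using directives (System.Data.SqlClient, System.IO, System.Text); the method name ExportDataIntoFileBuffered is mine.

public void ExportDataIntoFileBuffered()
{
    using (SqlConnection sqlConn = new SqlConnection(connectionString))
    using (SqlCommand sqlCmd = new SqlCommand("SELECT * FROM Profiles", sqlConn))
    {
        sqlConn.Open();
        using (SqlDataReader reader = sqlCmd.ExecuteReader())
        // one StreamWriter for the whole export; it buffers writes internally
        using (StreamWriter writer = new StreamWriter(path, false, Encoding.UTF8))
        {
            while (reader.Read())
            {
                writer.WriteLine("{0},{1},{2},{3},{4},{5},{6},{7}",
                    reader["ID"], reader["FirstName"], reader["LastName"],
                    reader["Email"], reader["ContactNo"], reader["AboutMe"],
                    reader["Interest"], reader["Hobbies"]);
            }
        }
    }
}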


However, the millions of iterations through that loop are unavoidable. Even if you try a different approach such as OLE DB, Excel COM, OpenXML, or any other file-export component, you still need to load the data from the database first and then feed it into the component record by record.

At some point you might think: when we talk about data export, shouldn't it be done on the database server instead of the web server? The data is already sitting in the database server, so why bring it over to the web server for processing, only to then save it into a file?

Therefore, you might create a stored procedure, or write T-SQL for your application to call, that performs the data export right on the database server.

INSERT INTO OPENROWSET('Microsoft.ACE.OLEDB.12.0',
    'Excel 12.0;Database=E:\testing.xlsx;',
    'SELECT * FROM [Sheet1$]')
SELECT * FROM Profiles
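
Two caveats worth knowing with this approach: OPENROWSET requires the 'Ad Hoc Distributed Queries' server option to be enabled, and as far as I know the target workbook (E:\testing.xlsx with a Sheet1 header row matching the Profiles columns) must already exist on the database server. From the application side, kicking this off is just an ordinary SqlCommand. A minimal sketch, reusing the connectionString from earlier (the method name is mine):

public void ExportViaOpenRowset()
{
    const string sql =
        @"INSERT INTO OPENROWSET('Microsoft.ACE.OLEDB.12.0',
            'Excel 12.0;Database=E:\testing.xlsx;',
            'SELECT * FROM [Sheet1$]')
        SELECT * FROM Profiles";

    using (SqlConnection sqlConn = new SqlConnection(connectionString))
    using (SqlCommand sqlCmd = new SqlCommand(sql, sqlConn))
    {
        sqlCmd.CommandTimeout = 0;   // large exports can outlive the 30-second default
        sqlConn.Open();
        sqlCmd.ExecuteNonQuery();    // all the work happens on the database server
    }
}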


Alright, now you see a big difference. 

But there is a better way to do this. SQL Server 2005 and later come with a tool called the Bulk Copy Program (bcp.exe), whose job is to import and export large amounts of data into and out of SQL Server.

The usage is simple.

bcp "SELECT * FROM QueryPerfTest.dbo.Profiles" queryout "E:\testing.csv" -S ".\MSSQL2008" -U sa -P p@ssw0rd -c -t ","



For a detailed description of all the input parameters (-S server, -U/-P credentials, -c for character data, -t for the field terminator, and so on), refer to the MSDN site.

How long does it take to export a million rows of records, with a total data size of about 810MB?


I got it in 25 seconds.

Alright, credits to Serena. She is the one who mentioned this awesome tool to me.



