Want to show your appreciation? Please to my charity.

Monday, April 13, 2009

ODP.Net Batch Update With Spring.Net

One of the many important missing pieces of ADO.Net is the inability to do the batch SQL. ADO.Net 2.0 patched in a cheesy batch update support for DataSet only so that when the number of changed rows exceeded certain limit, you get a database exception, not sure about SQL Server but this is at least true for both build in Oracle client and ODP.Net. Bear in mind that this is a pure Microsoft's design flaw by missing out this important feature in their API. It is NOT Oracle's problem. Oracle batch SQL has been working perfectly fine in JDBC world for many many years. In this post, I'm going to share with you a way to make it work with ODP.Net in one use case scenario.

Project Background

Our project uses Spring.Net framework. In the data access layer, we make mixed use of NHibernate and Spring ADO support. Thus all my code here are based on Spring.Net ADO, if you are not familiar with Spring.Net, you can at least read next section so that 1) there is information for about ODP.Net batch in general; and 2) you get to know some of many benefits that Spring.Net brings.

Inception

I have been watching code like below in our project again and again. It loops through a list of value objects and calls AdoTemplate one at a time. This yielded extremely slow performance. In one scenario, it inserted 717 rows in 7.9s.

Slow loop insert example
        public int SaveXyzDetail(IList<Xyz> xyzList)
        {
            string sql = RetrieveSQL("SaveXyzDetail");
            foreach (Xyz xyz in xyzList)
            {
                IDbParameters param = CreateDbParameters();

                param.Add("pxyzId", DbType.Int32).Value = xyz.XyzId;
                ......
                return AdoTemplate.ExecuteNonQuery(CommandType.Text, sql, param);
            }
        }

 

Obviously, we need to batch those SQL to database for optimal performance. But after looking around the Spring.Net's API as well as ADO.Net's API, to my surprise, there is pretty much nothing out there to help in this situation.

Eventually, I started to look into the extensions that ODP.Net provides. Thanks to Oracle, ODP.Net does provide two ways to execute batch SQL with limitations. One way is to bind the .Net array to PL/SQL associative arrays in stored procedure or PL/SQL block. You execute the stored procedure once to have all the values in the array send to database. Another way is to bind .Net arrays to regular SQL command parameters, you still execute the command once, but ODP.Net automatically create a prepared statement, loop through the arrays, execute the prepared statement multiple times but sending them all in a batch. Both has limitation that you cannot mix different SQL commands in one batch. But it is still a perfect solution to my problem. In the example above, we do execute the exactly same SQL many times in the for loop.

Extension Method to AdoTemplate

While the solution is promising, certainly I don't want to see the ODP.Net proprietary extension and type casting code all over the data access layer. The best is to have Spring.Net API to support this but unfortunately I cannot wait for it to come true and there doesn't seem to be any plan for that. NHibernate has batch support to SQL Server database by hacking the ADO.Net API, but support for Oracle database is still missing. May be they can consider to use of ODP.Net feature I mentioned above.

This is where the extension method come to handy. Hey, we can extend the AdoTemplate or may be better IAdoOperations. Let's extends it with the method below.

Extension Method Signature

1:   public static int ExecuteNonQuery<T>(
2:     this IAdoOperations operation,
3:     CommandType cmdType,
4:     string cmdText,
5:     ICollection<T> data,
6:     Converter<T, IDbParameters> dataToParameters)

I'm not going to explain the first two parameters which are common to all the ExecuteNonQuery methods. The data is a collection of value object that we are going to us as batch SQL parameters. The Converter delegate takes one value object and translate it to Spring's IDbParameters object that can be used to execute the SQL command.

The implementation takes a generic approach by using a factory pattern, so that it can be used with any other database that can provide batch support. When the extension method is called with an implementation of IAdoOperations that also implements the IBatchExecutorFactory interface, it makes use of the factory to execute the SQL in batch, otherwise it falls back to non-batch by looping through the collection and execute the command one by one. The full class can be found here.

And here is the interfaces for IBatchExecutorFactory and IBatchExecutor.

IBatchExecutorFactory

1:   public interface IBatchExecutorFactory
2: 
{
3:  
/// <summary>
4: 
/// Get an instance of <see cref="IBatchExecutor"/>.
5: 
/// </summary>
6: 
/// <returns>An instance of <see cref="IBatchExecutor"/>.</returns>
7: 
IBatchExecutor GetExecutor();
8:   }

 

IBatchExecutor

1:   public interface IBatchExecutor
2: 
{
3:  
/// <summary>
4: 
/// Executes batch of non queries with common command and different
5: 
/// parameters.
6: 
/// </summary>
7: 
/// <typeparam name="T">
8: 
/// The type of the data object.
9: 
/// </typeparam>
10: 
/// <param name="operation">
11: 
/// An <see cref="Spring.Data.IAdoOperations"/> object to perform
12: 
/// database updates.
13: 
/// </param>
14: 
/// <param name="cmdType">
15: 
/// The type of command.
16: 
/// </param>
17: 
/// <param name="cmdText">
18: 
/// The text of command.
19: 
/// </param>
20: 
/// <param name="data">
21: 
/// A collection of data object to be updated in batch.
22: 
/// </param>
23: 
/// <param name="dataToParameters">
24: 
/// Delegate that converts data object to parameters.
25: 
/// </param>
26: 
/// <returns>
27: 
/// The total updated count if 0 or positive. When -1 is returned,
28: 
/// it indicates that the update count cannot be obtained due the
29: 
/// the limitation of the batch implementation.
30: 
/// </returns>
31: 
int ExecuteNonQuery<T>(
32:  
IAdoOperations operation,
33:   System.Data.
CommandType cmdType,
34:  
string cmdText,
35:   System.Collections.Generic.
ICollection<T> data,
36:   System.
Converter<T, Common.IDbParameters> dataToParameters);

 

By now, we have extended the Spring.Net framework to support the use of batch SQL. What's left to do is to provide an implementation of IAdoOperations that uses ODP.Net batch SQL feature.

The Batch Executor for ODP.Net

To simplify the task, I decided to let the OracleOdpTemplate inherit from AdoTemplate. The class itself is straightforward, it provides a property for user to set the batch size and defaulted to 100. And it implements the IBatchExecutorFactory interface so that the extension method can detect it and obtain the batch executor from it.

The real worker is the inner class OracleOdpTemplate.BatchExecutor. Basically, it loop through all the data in the collection, calls the converter delegate to get the binding parameters for each value object, accumulate the parameter values in corresponding arrays. When accumulated rows reached the batch size, it flushes them to database using ODP.Net array binding. See the code snippet below.

Code snippet of OracleOdpTemplate.BatchExecutor
            #region IBatchExecutor Members

            public int ExecuteNonQuery<T>(
                IAdoOperations operation,
                CommandType cmdType,
                string cmdText,
                ICollection<T> data,
                Converter<T, IDbParameters> dataToParamters)
            {
                int totalRows = data.Count;
                int batchSize = _odpTemplate.BatchSize;
                if (totalRows < batchSize) batchSize = totalRows;

                int count = 0, bindCount = 0, result = 0;
                object[][] valueBuffer = null;
                
                foreach (T row in data)
                {
                    IDbParameters parameters = dataToParamters(row);
                    if (parameters != null)
                    {
                        if (valueBuffer == null)
                        {
                            valueBuffer = InitBatchParameters(parameters, batchSize);
                        }

                        string error = ValidateAndCopyParams(parameters, valueBuffer, bindCount++);
                        if (error != null)
                        {
                            throw new InvalidDataAccessApiUsageException(error + " for row: " + row);

                        }
                    }
                    ++count;
                    if (bindCount == batchSize || (count == totalRows) && bindCount > 0)
                    {
                        _bindCount = bindCount;
                        result += operation.ExecuteNonQuery(cmdType, cmdText, this);
                        bindCount = 0;
                    }

                }
                return result;
            }

            #endregion

 

Putting Things Together

Everything is ready so let's rewrite the example code we had in the beginning of the this post by using the extension method. We create a converter that sets the parameter values for each data object we need to insert then call ExecuteNonQuery extension method once with the list and converter.

Fast batch insert example
        public int SaveXyzDetail(IList<Xyz> xyzList)
        {
            string sql = RetrieveSQL("SaveXyzDetail");
            IDbParameters param = CreateDbParameters();

            var paramXyzId = param.Add("pxyzId", DbType.Int32);
            ......

            Converter<Xyz, IDbParameters> converter = delegate(Xyz xyz)
            {
                paramXyzId.Value = xyz.XyzId;
                ......
                return param;
            };
            return AdoTemplate.ExecuteNonQuery(CommandType.Text, sql, xyzList, converter);
        }

 

In addition to this, we also need to tell Spring.Net to inject the batch capable version of AdoTemplate. So we replace the AdoTemplate in the Spring.Net's configuration file with OracleOdpTemplate.

Modified Spring.Net configuration file
  <db:provider id="DbProvider" provider="OracleODP-2.0" connectionString="${DB.ConnectionString}"/>

  <object id="AdoTemplate" type="Spring.Data.Generic.OracleOdpTemplate, Spring.Extension">
    <property name="DbProvider" ref="DbProvider"/>
    <property name="BatchSize" value="${DB.MaxBatchSize}"/> <!-- number of rows -->
  </object>

  <object id="AdoXyzDao" type="Example.Dao.AdoXyzDao, Example.Dao">
        <property name="AdoTemplate" ref="AdoTemplate"/>
  </object>

 

Now run the test again, the same amount of rows are inserted in 0.7s with batch size of 100. That's ten times faster!

Source Code and Binary

You can find both the specific revision of source code I used in this post and binary in Google Code project. You can also get the latest source code of the SpringExtension project that this piece of function belongs to.

Updates

(4/17/2009) I have been struggling to write the unit test for this. It turned out the Spring.Net's AdoTemplate implementation doesn't use virtual. This makes extending the framework a little difficult, especially for writing unit tests.

No comments:

Post a Comment