Want to show your appreciation?
Please a cup of tea.

Tuesday, January 19, 2010

Always Override Equals of Your Value Type

ReSharper had been reminding me to override the Equals method and operator for value types that I have created. Every time I took the advice and let ReSharper do the work for me.

But I kept wondering how bad is the default implementation? The default implementation of Equals method is in the ValueType abstract class that every .Net value type inherits from. Initially I though it must use some sort of native memory comparison so it got to be fast. But after I looked at the code in Reflector, I’m no longer sure how true is this. It calls a native method CanCompareBits to check if current instance eligible for FastEqualsCheck. If yes great, otherwise it falls back to reflection. I don’t know what qualifies for fast equals check, but I start to realized that members with Equals method overridden cannot not use bit comparison. The reflection code told me that it uses Equals to compare members.

Enough talk, lets test the performance with two simplest value types, one has Equals overridden and another relays on the default implementation.

I used NUnit test case for simplicity but the code below can be easily changed to a console application.

   1:  private const int _sampleSize = 100;
   2:   
   3:  [Test] public static void Main()
   4:  {
   5:      var defaultEqualsStruct = new DefaultEqualsStruct[_sampleSize];
   6:      var overriddenEqualsStruct = new OverriddenEqualsStruct[_sampleSize];
   7:      for (int i = 0; i < _sampleSize; i++)
   8:      {
   9:          defaultEqualsStruct[i] = new DefaultEqualsStruct { Foo = i };
  10:          overriddenEqualsStruct[i] = new OverriddenEqualsStruct { Foo = i };
  11:      }
  12:      RunBenchMark(defaultEqualsStruct, new DefaultEqualsStruct { Foo = _sampleSize });
  13:      RunBenchMark(overriddenEqualsStruct, new OverriddenEqualsStruct { Foo = _sampleSize });
  14:  }
  15:   
  16:  public static void RunBenchMark<T>(T[] items, T e)
  17:  {
  18:      Console.WriteLine("==========" + typeof(T).Name + "=========");
  19:      // Warn up
  20:      Assert.IsTrue(ArrayIndexOf(items, items[5]));
  21:      Assert.IsTrue(LoopEquals(items, items[5]));
  22:      Clock(null, 1, () => Assert.IsFalse(ArrayIndexOf(items, e)));
  23:      Clock(null, 1, () => Assert.IsFalse(LoopEquals(items, e)));
  24:      // Run benchmark.
  25:      const int repeat = 100000;
  26:      Clock("Array.IndexOf", repeat, () => ArrayIndexOf(items, e));
  27:      Clock("Loop + Equals", repeat, () => LoopEquals(items, e));
  28:  }
  29:   
  30:  private static void Clock(string message, int repeat, Action a)
  31:  {
  32:      Stopwatch stopwatch = new Stopwatch();
  33:      stopwatch.Start();
  34:      for (int i = 0; i < repeat; i++) a();
  35:      stopwatch.Stop();
  36:      if (message != null)
  37:          Console.WriteLine(message + ": " + stopwatch.ElapsedTicks);
  38:  }
  39:   
  40:  private static bool ArrayIndexOf<T>(T[] items, T element)
  41:  {
  42:      return (Array.IndexOf(items, element) >= 0);
  43:  }
  44:   
  45:  private static bool LoopEquals<T>(T[] items, T element)
  46:  {
  47:      int size = items.Length;
  48:      for (int i = 0; i < size; i++)
  49:      {
  50:          if (Equals(element, items[i])) return true; // found
  51:      }
  52:      return false;
  53:  }
  54:   
  55:  public struct DefaultEqualsStruct
  56:  {
  57:      public int Foo { get; set; }
  58:  }
  59:   
  60:  public struct OverriddenEqualsStruct
  61:  {
  62:      public int Foo { get; set; }
  63:   
  64:      public bool Equals(OverriddenEqualsStruct other)
  65:      {
  66:          return other.Foo == Foo;
  67:      }
  68:   
  69:      public override bool Equals(object obj)
  70:      {
  71:          if (ReferenceEquals(null, obj)) return false;
  72:          if (obj.GetType() != typeof(OverriddenEqualsStruct)) return false;
  73:          return Equals((OverriddenEqualsStruct)obj);
  74:      }
  75:   
  76:      public override int GetHashCode()
  77:      {
  78:          return Foo;
  79:      }
  80:  }

And here goes the test result in release build:

   1:  ==========DefaultEqualsStruct=========
   2:  Array.IndexOf: 2863736
   3:  Loop + Equals: 1424624
   4:  ==========OverriddenEqualsStruct=========
   5:  Array.IndexOf: 289861
   6:  Loop + Equals: 509123

That’s really a lot of difference! Thanks ReSharper!

Monday, January 18, 2010

How Fast Are Methods In .Net Array Class

It’s very fast.

When I was porting the CopyOnWriteArrayList class from Java to .Net, I came across the implementation of the AddIfAbsent method. In Java it was written as:

   1:  public boolean addIfAbsent(E e) {
   2:      final ReentrantLock lock = this.lock;
   3:      lock.lock();
   4:      try {
   5:          // Copy while checking if already present.
   6:          // This wins in the most common case where it is not present
   7:          Object[] elements = getArray();
   8:          int len = elements.length;
   9:          Object[] newElements = new Object[len + 1];
  10:          for (int i = 0; i < len; ++i) {
  11:          if (eq(e, elements[i]))
  12:              return false; // exit, throwing away copy
  13:          else
  14:              newElements[i] = elements[i];
  15:          }
  16:          newElements[len] = e;
  17:          setArray(newElements);
  18:          return true;
  19:      } finally {
  20:          lock.unlock();
  21:      }
  22:  }

If you read the comment in the code above, the implementation creates a new array regardless of whether the element exists or not. If it finds out that the element exists during the copying of existing elements between arrays, it simply discards the newly created array.

Comparing this to the typical approach that search first then create new array if not found, it wins in the case when the element doesn’t exist by eliminating the search.

If I directly translate the same code into .Net, I won’t be able use any of utility methods provided by Array class in .Net framework. So I have to choose between a) sticking with the Java’s implementation, or  b) using Array.IndexOf then Array.Copy if found.

Let’s set out and run a benchmark (I’m using NUnit test case but the code can be easily called by a Program.cs file).

Let’s only test the use case where the element is not found, to see if the Array.IndexOf+Array.Copy implementation (let’s call it “Find Then Copy”) can compete with find when copying implementation (“Copy With Find”) of Java.

   1:      [TestFixture(typeof(int))]
   2:      [TestFixture(typeof(string))]
   3:      public class CopyOnWriteListTest<T>
   4:      {
   5:          private static readonly T[] _items = TestData<T>.MakeTestArray(100);
   6:   
   7:          [Test] public void RunBenchMark()
   8:          {
   9:              Console.WriteLine("==========" + typeof (T).Name + "=========");
  10:   
  11:              // Confirm that Array.IndexOf use value equals, not reference equals.
  12:              T value = TestData<T>.MakeData(5);
  13:              int index = Array.IndexOf(_items, value);
  14:              Assert.That(index, Is.EqualTo(5));
  15:              Assert.That(value, Is.Not.SameAs(_items[index]));
  16:              Assert.That(value, Is.EqualTo(_items[index]));
  17:   
  18:              T e = TestData<T>.MakeData(_items.Length);
  19:              // Warn up
  20:              Clock(null, 1, () => FindThenCopy(e));
  21:              Clock(null, 1, () => CopyWithFind(e));
  22:              // Run benchmark.
  23:              const int repeat = 100000;
  24:              Clock("Find Then Copy", repeat, () => FindThenCopy(e));
  25:              Clock("Copy With Find", repeat, () => CopyWithFind(e));
  26:          }
  27:   
  28:          private static void Clock(string message, int repeat, Action a)
  29:          {
  30:              Stopwatch stopwatch = new Stopwatch();
  31:              stopwatch.Start();
  32:              for (int i = 0; i < repeat; i++) a();
  33:              stopwatch.Stop();
  34:              if (message != null)
  35:                  Console.WriteLine(message + ": " + stopwatch.ElapsedTicks);
  36:          }
  37:   
  38:          private static void FindThenCopy(T element)
  39:          {
  40:              T[] items = _items;
  41:              var index = Array.IndexOf(items, element);
  42:              if (index >= 0) return; // found
  43:              int size = items.Length;
  44:              var newItems = new T[size + 1];
  45:              Array.Copy(items, 0, newItems, 0, size);
  46:              newItems[size] = element;
  47:          }
  48:   
  49:          private static void CopyWithFind(T element)
  50:          {
  51:              T[] items = _items;
  52:              int size = items.Length;
  53:              var newItems = new T[size + 1];
  54:              for (int i = 0; i < size; i++)
  55:              {
  56:                  if (Equals(element, items[i])) return; // found
  57:                  newItems[i] = items[i];
  58:              }
  59:              newItems[size] = element;
  60:          }
  61:      }

And here is the result:

   1:  ==========Int32=========
   2:  Find Then Copy: 129702
   3:  Copy With Find: 547511
   4:  ==========String=========
   5:  Find Then Copy: 274798
   6:  Copy With Find: 375962

Although it is two pass approach using Array class, native code is still significantly faster than looping in managed code. Value type showing larger difference is because the boxing when calling the Equals method.

Result changes a little when the array size is getting smaller. When we run it with array size of 20:

   1:  ==========Int32=========
   2:  Find Then Copy: 82526
   3:  Copy With Find: 112597
   4:  ==========String=========
   5:  Find Then Copy: 107412
   6:  Copy With Find: 108321

And 10:

   1:  ==========Int32=========
   2:  Find Then Copy: 36767
   3:  Copy With Find: 84236
   4:  ==========String=========
   5:  Find Then Copy: 68216
   6:  Copy With Find: 57618

This is because the overhead of calling the method into Array class starts to play the game. The “Copy With Find” implementation gets faster for reference type when array size is below 20. But both implementations are fast when array is small anyway so I don’t see much benefit of using “Copy With Find” approach in .Net.

I have also tested with custom reference type and value types, the result is comparable to int and string test.

Conclusion: Array class won the race! Yes, array operations using methods in Array class are indeed very fast.