
Monday, May 25, 2009

Random Thoughts About Java vs. C#/.Net

This is not meant to be an extensive comparison. I just want to record the things I found noteworthy during my two tests.

Monitor performance - winner: .Net

Please read the two posts on this topic, AtomicInteger vs. Synchronized Monitor and Interlocked vs. Monitor Performance. The winner is clearly .Net: Java was close to .Net in the single-thread test, but it got worse and worse as the number of threads and physical CPUs increased. We need a better JVM!

Closure (or Delegate, as MS named it) - winner: .Net

Suffering from the lack of closures, I had to write an additional method, getDescription, and use two switch statements to achieve what I did with a single delegate field in .Net. I know a lot of Java people don't like closures because they want to keep Java simple. But come on, all junior C# developers are happily using delegates. It is time to improve the language so I can write Java more effectively.
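To make the contrast concrete, here is a hypothetical sketch of the two-switch workaround next to a delegate-like field. The names Kind, NamedTask and the description strings are invented for illustration; this is not the code from my test. The second variant uses a Java 8 method reference, which did not exist when this was written; in Java 6 an anonymous inner class implementing Runnable plays the same role, only more verbosely.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: all names are illustrative only.
class DispatchSketch {
    enum Kind { COMPARE_AND_SET, INCREMENT }

    // Switch-based workaround: one switch selects the behaviour...
    static void run(Kind kind, AtomicInteger atomic) {
        switch (kind) {
            case COMPARE_AND_SET: atomic.compareAndSet(100, 50); break;
            case INCREMENT:       atomic.incrementAndGet();      break;
        }
    }

    // ...and a second switch, in a separate method, must be kept in sync with the first.
    static String getDescription(Kind kind) {
        switch (kind) {
            case COMPARE_AND_SET: return "runCompareAndSet";
            case INCREMENT:       return "runIncrement";
            default: throw new IllegalArgumentException("Unknown kind: " + kind);
        }
    }

    // Delegate-like alternative: the behaviour travels together with its description.
    static final class NamedTask {
        final String description;
        final Runnable body;

        NamedTask(String description, Runnable body) {
            this.description = description;
            this.body = body;
        }
    }

    public static void main(String[] args) {
        AtomicInteger a = new AtomicInteger(100);
        run(Kind.COMPARE_AND_SET, a);
        System.out.println(getDescription(Kind.COMPARE_AND_SET) + " -> " + a.get()); // runCompareAndSet -> 50

        AtomicInteger b = new AtomicInteger(100);
        NamedTask task = new NamedTask("runIncrement", b::incrementAndGet);
        task.body.run();
        System.out.println(task.description + " -> " + b.get()); // runIncrement -> 101
    }
}
```

Adding a third kind of test to the switch version means touching both switches; the NamedTask version only needs one more instance.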

Concurrent library - winner: Java

In Java, I could easily use CyclicBarrier and CountDownLatch for thread synchronization. In .Net, I had to write my own barrier and wait for the threads to end instead of waiting for the tasks to complete. As a result, the Java code can easily be switched to a thread pool, whereas to do the same in .Net I would need to write my own countdown latch.
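For comparison, here is a minimal sketch of waiting for tasks, rather than threads, to complete. It is not my actual test code: the task body, task count and pool size are made up, and the lambda syntax needs Java 8. Because completion is signalled through a CountDownLatch, the waiting side does not care whether the tasks run on dedicated threads or on a pool created by Executors.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

class LatchSketch {
    // Runs taskCount tasks on a fixed-size pool and blocks until every task has finished.
    static int runTasks(int taskCount, int poolSize) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        CountDownLatch done = new CountDownLatch(taskCount);
        AtomicInteger completed = new AtomicInteger();
        try {
            for (int i = 0; i < taskCount; i++) {
                pool.execute(() -> {
                    try {
                        completed.incrementAndGet(); // stand-in for the real work
                    } finally {
                        done.countDown();            // signal completion even if the work throws
                    }
                });
            }
            done.await(); // waits for the tasks, not for the pool threads to exit
            return completed.get();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runTasks(8, 2) + " tasks completed"); // 8 tasks completed
    }
}
```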

BTW, at this moment Java has far better libraries than .Net in every respect, both built-in and 3rd party (including open source). The concurrent library is just one example. Most offerings from MS Patterns & Practices (P&P) are inferior to their counterparts in Java and in the open source .Net world. Almost every MS library is invasive because they want to lock you in.

Synchronized keyword - winner: Java

I know there is a lot of debate on this topic. I agree that using synchronized as a method modifier exposes internal synchronization details: other code can lock the same instance, interfere, and cause a deadlock. But:

  • For my internal/private classes, this is so much easier to write in Java than in .Net. Take a look at the functionally identical class Accumulator in Java and Accumulator in C#, and judge for yourself. (You may notice that I missed the lock in the Average property getter and had to fix it in a later revision. Tedious!)
  • OK, locking on the object itself is bad, but why can't C# automatically define a private lock field, use it for every parameterless lock, and accept synchronized (or "locked", if MS has to give everything a different name) as a valid modifier for members?
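The idiom I am asking C# to automate is what careful Java code already does by hand: synchronize on a private final lock object instead of on this. Below is a hypothetical Accumulator sketch; it is not the class linked above, and its fields and methods are my own guess at what such a class needs.

```java
// Hypothetical Accumulator: guards its state with a private lock that outside code cannot reach.
final class Accumulator {
    private final Object lock = new Object(); // never exposed, so callers cannot lock it and interfere
    private long sum;
    private int count;

    void add(long value) {
        synchronized (lock) {
            sum += value;
            count++;
        }
    }

    // Reads need the lock too; otherwise sum and count could be observed mid-update.
    double getAverage() {
        synchronized (lock) {
            return count == 0 ? 0.0 : (double) sum / count;
        }
    }

    public static void main(String[] args) {
        Accumulator acc = new Accumulator();
        acc.add(10);
        acc.add(20);
        System.out.println(acc.getAverage()); // 15.0
    }
}
```

With synchronized as a method modifier, both blocks shrink to a single keyword each, at the cost of locking on this.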

IDE - winner: Java

Well, I'm talking about Eclipse. I know Visual Studio has improved a lot, but hey: in Eclipse my local variables are black, fields are blue and static fields are italic. In Visual Studio, they are all black, and readability is horrible without a good naming convention. There is much more to mention, but I'll leave it for others to discover.

Generics - winner: None

See: How Good Is .Net Generics

Final thoughts

  • Java is falling further behind in VM performance and language features. As C# continues to improve from version to version, Java really needs to catch up.
  • .Net falls far behind from the library perspective. The design of its built-in libraries is nowhere close to Java's offering. With Java, I can quickly assemble an enterprise application from readily available, high-quality libraries and just focus on business logic. But from project to project in C#/.Net, I have had to write a lot of non-business code and, from time to time, work around bad implementations inside the .Net built-in classes.

Time will tell who will win the race eventually.

AtomicInteger vs. Synchronized Monitor

Update (2/7/2010): I noticed a lot of Google traffic landing on this page looking for a .Net version of AtomicInteger. It is now available as part of the port of Java's concurrent API to .Net; you can download a Beta version here.

When porting Java's AtomicInteger to .Net, I ran a performance test comparing an implementation based on Interlocked with one based on Monitor. The surprising findings inspired me to run a similar test in Java to see how Java performs.

Let's start with the interface below.

public interface AtomicTest {
	public int get();
	public void set(int value);
	public boolean compareAndSet(int expected, int newValue);
	public int incrementAndGet();
	public int decrementAndGet();
}

We will have two implementations. The first one uses AtomicInteger:

import java.util.concurrent.atomic.AtomicInteger;

public class AtomicIntegerTest extends AtomicInteger implements AtomicTest {}

The second one uses synchronized access to a volatile field, except that read access is not synchronized (declaring the field volatile is enough to make the latest written value visible to readers).

public class MonitorAtomicTest implements AtomicTest {
	private volatile int _value;
	public synchronized boolean compareAndSet(int expected, int newValue) {
		if (expected==_value) {
			_value = newValue;
			return true;
		}
		return false;
	}

	public synchronized int decrementAndGet() {
		return --_value;
	}

	public int get() {
		return _value;
	}

	public synchronized int incrementAndGet() {
		return ++_value;
	}

	public synchronized void set(int value) {
		_value = value;
	}
}
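Before timing anything, it is worth confirming that both approaches are actually thread safe. Here is a hedged sketch of such a check: the thread and iteration counts are arbitrary, and MonitorCounter is a trimmed-down stand-in for MonitorAtomicTest rather than the original class. It increments each counter from several threads and verifies that no update was lost.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class CorrectnessCheck {
    // Trimmed-down stand-in for MonitorAtomicTest: synchronized writes, unsynchronized volatile read.
    static final class MonitorCounter {
        private volatile int value;
        synchronized int incrementAndGet() { return ++value; }
        int get() { return value; }
    }

    // Calls increment from 'threads' threads, 'perThread' times each, then waits for all of them.
    static void hammer(Runnable increment, int threads, int perThread) throws InterruptedException {
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread w = new Thread(() -> {
                for (int i = 0; i < perThread; i++) increment.run();
            });
            workers.add(w);
            w.start();
        }
        for (Thread w : workers) w.join();
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger atomic = new AtomicInteger();
        MonitorCounter monitor = new MonitorCounter();
        hammer(atomic::incrementAndGet, 4, 100_000);
        hammer(monitor::incrementAndGet, 4, 100_000);
        System.out.println(atomic.get() + " " + monitor.get()); // 400000 400000
    }
}
```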

As in the .Net test, we run the methods below for each implementation, with loop set to one million.

    private void runCompareAndSet() {
        int result1, result2, result3;
        for (int i = loop - 1; i >= 0; i--) {
            atomic.compareAndSet(100, 50);
            result1 = atomic.get();
            atomic.compareAndSet(50, 100);
            result2 = atomic.get();
            atomic.compareAndSet(100, 50);
            result3 = atomic.get();
        }
    }

    private void runIncrement() {
        int result1, result2, result3;
        for (int i = loop - 1; i >= 0; i--) {
            atomic.incrementAndGet();
            result1 = atomic.get();
            atomic.incrementAndGet();
            result2 = atomic.get();
            atomic.incrementAndGet();
            result3 = atomic.get();
        }
    }

Finally, we run each method above in multiple threads in parallel. The number of threads can be passed as a command-line parameter. Below is the main method; for the details of how things get done, the full source code is available here.

    public static void main(String[] args)
        throws InterruptedException 
    {
        if (args.length > 0) threadCount = Integer.parseInt(args[0]);
        verbose = "true".equalsIgnoreCase(System.getProperty("verbose"));

        TestRunner a = new TestRunner(new AtomicIntegerTest());
        TestRunner b = new TestRunner(new MonitorAtomicTest());

        a.runCompareAndSetInParallel();
        b.runCompareAndSetInParallel();
        a.runIncrementInParallel();
        b.runIncrementInParallel();
    }
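TestRunner itself is not shown here. To give a rough idea of how such a runner might work, here is a hypothetical sketch; the class and method names are mine, not from the linked source. It releases all threads together with a CyclicBarrier, has each thread time its own loop, and reports the per-iteration cost as average, minimum and maximum across threads. That reading of the columns is my assumption, consistent with Average, Minimal and Maximal being identical in the single-thread runs. The per-iteration body here (two compareAndSet calls) is also an assumption.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical runner: measures nanoseconds per loop iteration in each thread.
class ParallelRunnerSketch {
    static long[] runInParallel(Runnable perIteration, int threadCount, int loop)
            throws InterruptedException {
        long[] nsPerIteration = new long[threadCount];
        CyclicBarrier start = new CyclicBarrier(threadCount);
        Thread[] threads = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++) {
            final int id = t;
            threads[t] = new Thread(() -> {
                try {
                    start.await(); // all threads begin the timed loop together
                } catch (InterruptedException | BrokenBarrierException e) {
                    throw new IllegalStateException(e);
                }
                long begin = System.nanoTime();
                for (int i = loop - 1; i >= 0; i--) perIteration.run();
                nsPerIteration[id] = (System.nanoTime() - begin) / loop;
            });
            threads[t].start();
        }
        for (Thread thread : threads) thread.join(); // join also makes each thread's result visible
        return nsPerIteration;
    }

    static String summarize(String name, long[] ns) {
        long min = Long.MAX_VALUE, max = Long.MIN_VALUE, sum = 0;
        for (long v : ns) { min = Math.min(min, v); max = Math.max(max, v); sum += v; }
        return String.format("%s (ns): %5d Average, %5d Minimal, %5d Maximal, %2d Threads",
                name, sum / ns.length, min, max, ns.length);
    }

    public static void main(String[] args) throws InterruptedException {
        int threadCount = args.length > 0 ? Integer.parseInt(args[0]) : 2;
        AtomicInteger atomic = new AtomicInteger(100);
        long[] ns = runInParallel(() -> {
            atomic.compareAndSet(100, 50);
            atomic.compareAndSet(50, 100);
        }, threadCount, 1_000_000);
        System.out.println(summarize("AtomicInteger.compareAndSet", ns));
    }
}
```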

I ran the test on three Windows boxes with the latest JRE, 1.6.0_13. Details of the test hardware and OS can be found in my previous post about the similar test for .Net.

Below are the test results:

Test results on a two (2) CPU box with the client VM

D:\>java -jar AtomicIntegerVsMonitor.jar 1 
  AtomicIntegerTest.runCompareAndSet (ns):   109 Average,   109 Minimal,   109 Maxmial,  1 Threads
  MonitorAtomicTest.runCompareAndSet (ns):   219 Average,   219 Minimal,   219 Maxmial,  1 Threads
  AtomicIntegerTest.runIncrement     (ns):   125 Average,   125 Minimal,   125 Maxmial,  1 Threads
  MonitorAtomicTest.runIncrement     (ns):   250 Average,   250 Minimal,   250 Maxmial,  1 Threads
	
D:\>java -jar AtomicIntegerVsMonitor.jar 1 
  AtomicIntegerTest.runCompareAndSet (ns):   109 Average,   109 Minimal,   109 Maxmial,  1 Threads
  MonitorAtomicTest.runCompareAndSet (ns):   219 Average,   219 Minimal,   219 Maxmial,  1 Threads
  AtomicIntegerTest.runIncrement     (ns):   125 Average,   125 Minimal,   125 Maxmial,  1 Threads
  MonitorAtomicTest.runIncrement     (ns):   265 Average,   265 Minimal,   265 Maxmial,  1 Threads

D:\>java -jar AtomicIntegerVsMonitor.jar 2 
  AtomicIntegerTest.runCompareAndSet (ns):   211 Average,   203 Minimal,   219 Maxmial,  2 Threads
  MonitorAtomicTest.runCompareAndSet (ns):  1359 Average,  1297 Minimal,  1422 Maxmial,  2 Threads
  AtomicIntegerTest.runIncrement     (ns):   312 Average,   312 Minimal,   312 Maxmial,  2 Threads
  MonitorAtomicTest.runIncrement     (ns):  1461 Average,  1453 Minimal,  1469 Maxmial,  2 Threads

D:\>java -jar AtomicIntegerVsMonitor.jar 2 
  AtomicIntegerTest.runCompareAndSet (ns):   203 Average,   203 Minimal,   203 Maxmial,  2 Threads
  MonitorAtomicTest.runCompareAndSet (ns):  1390 Average,  1375 Minimal,  1406 Maxmial,  2 Threads
  AtomicIntegerTest.runIncrement     (ns):   313 Average,   313 Minimal,   313 Maxmial,  2 Threads
  MonitorAtomicTest.runIncrement     (ns):  1359 Average,  1359 Minimal,  1359 Maxmial,  2 Threads

D:\>java -jar AtomicIntegerVsMonitor.jar 4 
  AtomicIntegerTest.runCompareAndSet (ns):   449 Average,   406 Minimal,   485 Maxmial,  4 Threads
  MonitorAtomicTest.runCompareAndSet (ns):  2590 Average,  2469 Minimal,  2641 Maxmial,  4 Threads
  AtomicIntegerTest.runIncrement     (ns):   582 Average,   562 Minimal,   625 Maxmial,  4 Threads
  MonitorAtomicTest.runIncrement     (ns):  2624 Average,  2406 Minimal,  2765 Maxmial,  4 Threads
	
D:\>java -jar AtomicIntegerVsMonitor.jar 4 
  AtomicIntegerTest.runCompareAndSet (ns):   300 Average,   235 Minimal,   406 Maxmial,  4 Threads
  MonitorAtomicTest.runCompareAndSet (ns):  2649 Average,  2532 Minimal,  2797 Maxmial,  4 Threads
  AtomicIntegerTest.runIncrement     (ns):   602 Average,   563 Minimal,   641 Maxmial,  4 Threads
  MonitorAtomicTest.runIncrement     (ns):  2871 Average,  2766 Minimal,  2953 Maxmial,  4 Threads
	
D:\>java -jar AtomicIntegerVsMonitor.jar 16 
  AtomicIntegerTest.runCompareAndSet (ns):  1305 Average,   906 Minimal,  1703 Maxmial, 16 Threads
  MonitorAtomicTest.runCompareAndSet (ns):  9507 Average,  5344 Minimal, 10610 Maxmial, 16 Threads
  AtomicIntegerTest.runIncrement     (ns):  2064 Average,  1516 Minimal,  2516 Maxmial, 16 Threads
  MonitorAtomicTest.runIncrement     (ns):  9944 Average,  8312 Minimal, 10703 Maxmial, 16 Threads
	
D:\>java -jar AtomicIntegerVsMonitor.jar 16 
  AtomicIntegerTest.runCompareAndSet (ns):  1309 Average,   984 Minimal,  1625 Maxmial, 16 Threads
  MonitorAtomicTest.runCompareAndSet (ns): 10958 Average,  8485 Minimal, 12188 Maxmial, 16 Threads
  AtomicIntegerTest.runIncrement     (ns):  2093 Average,  1188 Minimal,  2515 Maxmial, 16 Threads
  MonitorAtomicTest.runIncrement     (ns): 12046 Average, 11188 Minimal, 13110 Maxmial, 16 Threads

Test results on a four (4) CPU box with the client VM

D:\>java -jar AtomicIntegerVsMonitor.jar 1
  AtomicIntegerTest.runCompareAndSet (ns):   203 Average,   203 Minimal,   203 Maxmial,  1 Threads
  MonitorAtomicTest.runCompareAndSet (ns):   516 Average,   516 Minimal,   516 Maxmial,  1 Threads
  AtomicIntegerTest.runIncrement     (ns):   218 Average,   218 Minimal,   218 Maxmial,  1 Threads
  MonitorAtomicTest.runIncrement     (ns):   563 Average,   563 Minimal,   563 Maxmial,  1 Threads

D:\>java -jar AtomicIntegerVsMonitor.jar 1
  AtomicIntegerTest.runCompareAndSet (ns):   219 Average,   219 Minimal,   219 Maxmial,  1 Threads
  MonitorAtomicTest.runCompareAndSet (ns):   531 Average,   531 Minimal,   531 Maxmial,  1 Threads
  AtomicIntegerTest.runIncrement     (ns):   219 Average,   219 Minimal,   219 Maxmial,  1 Threads
  MonitorAtomicTest.runIncrement     (ns):   578 Average,   578 Minimal,   578 Maxmial,  1 Threads

D:\>java -jar AtomicIntegerVsMonitor.jar 2
  AtomicIntegerTest.runCompareAndSet (ns):  1047 Average,  1047 Minimal,  1047 Maxmial,  2 Threads
  MonitorAtomicTest.runCompareAndSet (ns):  4930 Average,  4922 Minimal,  4938 Maxmial,  2 Threads
  AtomicIntegerTest.runIncrement     (ns):   929 Average,   922 Minimal,   937 Maxmial,  2 Threads
  MonitorAtomicTest.runIncrement     (ns):  4891 Average,  4891 Minimal,  4891 Maxmial,  2 Threads

D:\>java -jar AtomicIntegerVsMonitor.jar 2
  AtomicIntegerTest.runCompareAndSet (ns):   890 Average,   890 Minimal,   890 Maxmial,  2 Threads
  MonitorAtomicTest.runCompareAndSet (ns):  3922 Average,  3922 Minimal,  3922 Maxmial,  2 Threads
  AtomicIntegerTest.runIncrement     (ns):   961 Average,   953 Minimal,   969 Maxmial,  2 Threads
  MonitorAtomicTest.runIncrement     (ns):  4570 Average,  4562 Minimal,  4578 Maxmial,  2 Threads

D:\>java -jar AtomicIntegerVsMonitor.jar 4
  AtomicIntegerTest.runCompareAndSet (ns):  2558 Average,  2484 Minimal,  2594 Maxmial,  4 Threads
  MonitorAtomicTest.runCompareAndSet (ns): 15445 Average, 15375 Minimal, 15547 Maxmial,  4 Threads
  AtomicIntegerTest.runIncrement     (ns):  5359 Average,  5250 Minimal,  5406 Maxmial,  4 Threads
  MonitorAtomicTest.runIncrement     (ns): 15269 Average, 15141 Minimal, 15344 Maxmial,  4 Threads

D:\>java -jar AtomicIntegerVsMonitor.jar 4
  AtomicIntegerTest.runCompareAndSet (ns):  2573 Average,  2515 Minimal,  2593 Maxmial,  4 Threads
  MonitorAtomicTest.runCompareAndSet (ns): 15363 Average, 14969 Minimal, 15516 Maxmial,  4 Threads
  AtomicIntegerTest.runIncrement     (ns):  5296 Average,  5250 Minimal,  5328 Maxmial,  4 Threads
  MonitorAtomicTest.runIncrement     (ns): 15823 Average, 15734 Minimal, 15890 Maxmial,  4 Threads

D:\>java -jar AtomicIntegerVsMonitor.jar 16
  AtomicIntegerTest.runCompareAndSet (ns):  8722 Average,  6859 Minimal, 10047 Maxmial, 16 Threads
  MonitorAtomicTest.runCompareAndSet (ns): 60182 Average, 57750 Minimal, 63984 Maxmial, 16 Threads
  AtomicIntegerTest.runIncrement     (ns): 18828 Average, 12032 Minimal, 20672 Maxmial, 16 Threads
  MonitorAtomicTest.runIncrement     (ns): 59987 Average, 58516 Minimal, 60594 Maxmial, 16 Threads

D:\>java -jar AtomicIntegerVsMonitor.jar 16
  AtomicIntegerTest.runCompareAndSet (ns):  8216 Average,  4828 Minimal, 10219 Maxmial, 16 Threads
  MonitorAtomicTest.runCompareAndSet (ns): 68320 Average, 67188 Minimal, 68719 Maxmial, 16 Threads
  AtomicIntegerTest.runIncrement     (ns): 18796 Average, 14719 Minimal, 21469 Maxmial, 16 Threads
  MonitorAtomicTest.runIncrement     (ns): 67786 Average, 66844 Minimal, 68281 Maxmial, 16 Threads

Test results on a sixteen (16) CPU box with the server VM

D:\>java -jar AtomicIntegerVsMonitor.jar 1
  AtomicIntegerTest.runCompareAndSet (ns):    94 Average,    94 Minimal,    94 Maxmial,  1 Threads
  MonitorAtomicTest.runCompareAndSet (ns):   250 Average,   250 Minimal,   250 Maxmial,  1 Threads
  AtomicIntegerTest.runIncrement     (ns):   109 Average,   109 Minimal,   109 Maxmial,  1 Threads
  MonitorAtomicTest.runIncrement     (ns):   250 Average,   250 Minimal,   250 Maxmial,  1 Threads

D:\>java -jar AtomicIntegerVsMonitor.jar 1
  AtomicIntegerTest.runCompareAndSet (ns):    94 Average,    94 Minimal,    94 Maxmial,  1 Threads
  MonitorAtomicTest.runCompareAndSet (ns):   250 Average,   250 Minimal,   250 Maxmial,  1 Threads
  AtomicIntegerTest.runIncrement     (ns):   109 Average,   109 Minimal,   109 Maxmial,  1 Threads
  MonitorAtomicTest.runIncrement     (ns):   234 Average,   234 Minimal,   234 Maxmial,  1 Threads

D:\>java -jar AtomicIntegerVsMonitor.jar 2
  AtomicIntegerTest.runCompareAndSet (ns):   875 Average,   875 Minimal,   875 Maxmial,  2 Threads
  MonitorAtomicTest.runCompareAndSet (ns):  2461 Average,  2453 Minimal,  2469 Maxmial,  2 Threads
  AtomicIntegerTest.runIncrement     (ns):  1063 Average,  1063 Minimal,  1063 Maxmial,  2 Threads
  MonitorAtomicTest.runIncrement     (ns):  2382 Average,  2375 Minimal,  2390 Maxmial,  2 Threads

D:\>java -jar AtomicIntegerVsMonitor.jar 2
  AtomicIntegerTest.runCompareAndSet (ns):  1829 Average,  1829 Minimal,  1829 Maxmial,  2 Threads
  MonitorAtomicTest.runCompareAndSet (ns):  2828 Average,  2828 Minimal,  2828 Maxmial,  2 Threads
  AtomicIntegerTest.runIncrement     (ns):   422 Average,   391 Minimal,   453 Maxmial,  2 Threads
  MonitorAtomicTest.runIncrement     (ns):  3258 Average,  3250 Minimal,  3266 Maxmial,  2 Threads

D:\>java -jar AtomicIntegerVsMonitor.jar 4
  AtomicIntegerTest.runCompareAndSet (ns):  3250 Average,  3250 Minimal,  3250 Maxmial,  4 Threads
  MonitorAtomicTest.runCompareAndSet (ns):  6097 Average,  5953 Minimal,  6156 Maxmial,  4 Threads
  AtomicIntegerTest.runIncrement     (ns):  2531 Average,  2531 Minimal,  2531 Maxmial,  4 Threads
  MonitorAtomicTest.runIncrement     (ns):  5808 Average,  5766 Minimal,  5844 Maxmial,  4 Threads

D:\>java -jar AtomicIntegerVsMonitor.jar 4
  AtomicIntegerTest.runCompareAndSet (ns):  3000 Average,  3000 Minimal,  3000 Maxmial,  4 Threads
  MonitorAtomicTest.runCompareAndSet (ns):  6097 Average,  5922 Minimal,  6187 Maxmial,  4 Threads
  AtomicIntegerTest.runIncrement     (ns):  2547 Average,  2547 Minimal,  2547 Maxmial,  4 Threads
  MonitorAtomicTest.runIncrement     (ns):  6039 Average,  5969 Minimal,  6094 Maxmial,  4 Threads

D:\>java -jar AtomicIntegerVsMonitor.jar 16
  AtomicIntegerTest.runCompareAndSet (ns):  9749 Average,  9749 Minimal,  9749 Maxmial,  16 Threads
  MonitorAtomicTest.runCompareAndSet (ns): 21486 Average, 21078 Minimal, 21641 Maxmial,  16 Threads
  AtomicIntegerTest.runIncrement     (ns): 46216 Average, 45562 Minimal, 46827 Maxmial,  16 Threads
  MonitorAtomicTest.runIncrement     (ns): 24795 Average, 23422 Minimal, 25281 Maxmial,  16 Threads

D:\>java -jar AtomicIntegerVsMonitor.jar 16
  AtomicIntegerTest.runCompareAndSet (ns):  9787 Average,  9781 Minimal,  9797 Maxmial,  16 Threads
  MonitorAtomicTest.runCompareAndSet (ns): 23269 Average, 23109 Minimal, 23390 Maxmial,  16 Threads
  AtomicIntegerTest.runIncrement     (ns): 45608 Average, 45047 Minimal, 46141 Maxmial,  16 Threads
  MonitorAtomicTest.runIncrement     (ns): 19401 Average, 19156 Minimal, 19578 Maxmial,  16 Threads

Unlike the .Net test, this one is quite conclusive:

  • AtomicInteger performs consistently better than Monitor, except in one test case: Increment with 16 threads on the 16 CPU box.
  • Java's AtomicInteger.compareAndSet performs on par with .Net's Interlocked.CompareExchange.
  • Java's Monitor is consistently slower than .Net's, and it gets worse as the number of threads and collisions grows.
  • Like Interlocked, AtomicInteger also suffers under heavy collision. One extreme case is the Increment test with 16 threads on the 16 CPU box. AtomicInteger.incrementAndGet is implemented as a retry loop around AtomicInteger.compareAndSet, so every collision forces another attempt, and the collisions pile up. The result of Interlocked.Increment is much better, so I believe it uses a different strategy.
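The retry loop mentioned above has roughly the shape sketched below, modelled on the JDK 6-era AtomicInteger.incrementAndGet (later JDKs delegate to an intrinsic that HotSpot typically compiles to a single lock xadd instruction on x86). Returning the number of attempts instead of the new value is my own change for illustration: every attempt beyond the first means another thread won the race between get() and compareAndSet(), and with many threads hammering one value those retries add up. The thread and iteration counts are arbitrary.

```java
import java.util.concurrent.atomic.AtomicInteger;

class CasLoopSketch {
    // Same loop shape as JDK 6's AtomicInteger.incrementAndGet (read, compute, CAS, retry),
    // except that it returns the number of compareAndSet attempts instead of the new value.
    static int incrementCountingAttempts(AtomicInteger value) {
        int attempts = 0;
        for (;;) {
            int current = value.get();
            int next = current + 1;
            attempts++;
            if (value.compareAndSet(current, next)) {
                return attempts;
            }
            // Another thread changed the value between get() and compareAndSet(): retry.
        }
    }

    public static void main(String[] args) throws InterruptedException {
        int threadCount = args.length > 0 ? Integer.parseInt(args[0]) : 4;
        int perThread = 100_000;
        AtomicInteger counter = new AtomicInteger();
        long[] retries = new long[threadCount]; // each thread counts locally and writes its slot once
        Thread[] threads = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++) {
            final int id = t;
            threads[t] = new Thread(() -> {
                long local = 0;
                for (int i = 0; i < perThread; i++) {
                    local += incrementCountingAttempts(counter) - 1; // attempts beyond the first
                }
                retries[id] = local;
            });
            threads[t].start();
        }
        long totalRetries = 0;
        for (int t = 0; t < threadCount; t++) {
            threads[t].join();
            totalRetries += retries[t];
        }
        System.out.println("value=" + counter.get() + ", retries=" + totalRetries);
    }
}
```

The final value is always threadCount × perThread; only the retry count varies from run to run, and it tends to grow with the number of threads contending for the same value.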

Sunday, May 24, 2009

Interlocked vs. Monitor Performance

In .Net, when I need thread-safe access to a field, I can use either Monitor or the Interlocked class. I have long believed that Interlocked should be faster; otherwise I don't see the value of using it, except for atomic operations across processes. Today, while porting Java's AtomicInteger to .Net, I wondered how I should implement it, and again faced the choice between Interlocked and Monitor. So I decided to run a test.

The test runs against three different implementations of the interface below:

    internal interface IAtomic
    {
        int Value { get; set; }
        int CompareExchange(int newValue, int expected);
        int Exchange(int newValue);
    }

The first implementation uses the methods of the static Interlocked class on a volatile field to provide thread-safe access to the integer.

    internal class InterlockAtomic : IAtomic
    {
        private volatile int _value = 50;

        public int Value
        {
            get { return _value; }
            set { _value = value; }
        }

        public virtual int CompareExchange(int newValue, int expected)
        {
            return Interlocked.CompareExchange(ref _value, newValue, expected);
        }

        public virtual int Exchange(int newValue)
        {
            return Interlocked.Exchange(ref _value, newValue);
        }
    }

The second implementation uses Monitor, through C#'s lock keyword, for all access.

    internal class MonitorAtomic : IAtomic
    {

        private int _value = 50;

        public int Value
        {
            get { lock (this) return _value; }
            set { lock (this) _value = value; }
        }

        public virtual int CompareExchange(int newValue, int expected)
        {
            lock (this)
            {
                int orig = _value;
                if (expected == orig) _value = newValue;
                return orig;
            }
        }

        public virtual int Exchange(int newValue)
        {
            lock (this)
            {
                int orig = _value;
                _value = newValue;
                return orig;
            }
        }
    }

The third implementation uses Monitor for all access except reads, whose thread safety is provided by the volatile modifier.

    internal class MonitorVolatileAtomic : IAtomic
    {

        private volatile int _value = 50;

        public int Value
        {
            get { return _value; }
            set { lock (this) _value = value; }
        }

        public virtual int CompareExchange(int newValue, int expected)
        {
            lock (this)
            {
                int orig = _value;
                if (expected == orig) _value = newValue;
                return orig;
            }
        }

        public virtual int Exchange(int newValue)
        {
            lock (this)
            {
                int orig = _value;
                _value = newValue;
                return orig;
            }
        }
    }
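For readers coming from the Java side, note that the two APIs have slightly different contracts: Java's compareAndSet(expected, newValue) returns a boolean, while Interlocked.CompareExchange returns the value that was stored before the call. As a hedged sketch (the class name JavaAtomic is hypothetical), here is how the IAtomic contract above could be expressed in Java on top of AtomicInteger, using a short retry loop for CompareExchange.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical Java counterpart of the IAtomic contract shown above.
class JavaAtomic {
    private final AtomicInteger value = new AtomicInteger(50);

    int getValue() { return value.get(); }
    void setValue(int newValue) { value.set(newValue); }

    // Mirrors Interlocked.CompareExchange: store newValue only if the current value equals
    // expected, and always return the value that was there before the call.
    int compareExchange(int newValue, int expected) {
        for (;;) {
            int current = value.get();
            if (current != expected) {
                return current; // no swap; report the value we observed
            }
            if (value.compareAndSet(expected, newValue)) {
                return expected; // swap succeeded, so the original value was 'expected'
            }
            // The value changed between get() and compareAndSet(); look again.
        }
    }

    // Mirrors Interlocked.Exchange: unconditionally store and return the previous value.
    int exchange(int newValue) {
        return value.getAndSet(newValue);
    }

    public static void main(String[] args) {
        JavaAtomic a = new JavaAtomic();
        System.out.println(a.compareExchange(100, 50)); // 50 (swapped)
        System.out.println(a.compareExchange(30, 50));  // 100 (no swap)
        System.out.println(a.exchange(7));               // 100
        System.out.println(a.getValue());                // 7
    }
}
```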

Then we run the methods below for each implementation, with loop set to one million.

        private void RunCompareExchange()
        {
            int result1, result2, result3;
            for (int i = loop - 1; i >= 0; i--)
            {
                _atomic.CompareExchange(50, 100);
                result1 = _atomic.Value;
                _atomic.CompareExchange(100, 50);
                result2 = _atomic.Value;
                _atomic.CompareExchange(50, 100);
                result3 = _atomic.Value;
            }
        }

        private void RunExchange()
        {
            int result1, result2, result3;
            for (int i = loop - 1; i >= 0; i--)
            {
                _atomic.Exchange(30);
                result1 = _atomic.Value;
                _atomic.Exchange(50);
                result2 = _atomic.Value;
                _atomic.Exchange(100);
                result3 = _atomic.Value;
            }
        }

Finally, we run each method above in multiple threads in parallel. The number of threads can be passed as a command-line parameter. Below is the main method; for the details of how things get done, the full source code is available here.

        static void Main(string[] args)
        {
            int threadCount = 1;
            if (args.Length > 0)
            {
                try
                {
                    threadCount = int.Parse(args[0]);
                }
                catch (Exception e)
                {
                    Console.Error.WriteLine(e.Message);
                }
            }
            Console.WriteLine("Using {0} threads:", threadCount);

            var a = new Program { _atomic = new InterlockAtomic() };
            var b = new Program { _atomic = new MonitorAtomic() };
            var c = new Program { _atomic = new MonitorVolatileAtomic() };
            RunAll(a.RunCompareExchange, threadCount);
            RunAll(b.RunCompareExchange, threadCount);
            RunAll(c.RunCompareExchange, threadCount);
            RunAll(a.RunExchange, threadCount);
            RunAll(b.RunExchange, threadCount);
            RunAll(c.RunExchange, threadCount);
        }

I made a Release build and tested it on three different machines:

  • ThinkPad laptop running Windows XP Professional SP2 with a Core 2 Duo T7300 2GHz CPU and 4GB of dual-channel DDR2. Here is the systeminfo for the 2 CPU box.
  • HP server (quite old) running Windows Server 2003 SE SP1 with 4 Xeon 2.8GHz CPUs (not sure, but most likely 2x2) and 3.5GB of memory (type unknown). Here is the systeminfo for the 4 CPU box.
  • HP server (very new) running Windows Server 2003 SE x64 SP2 with 16 Xeon (4x4) 3.4GHz CPUs and 16GB of dual channel DDR2 ECC. Here is the systeminfo for the 16 CPU box.

The test results are very interesting, and I cannot explain them.

Here are the results of the 2 CPU test. In this test, the Interlocked methods are clearly faster in all situations.

D:\>InterlockVsMonitor.exe 1
Using 1 threads:
          InterlockAtomic.RunCompareExchange   (ns):     81 Average,     81 Minimal,     81 Maxmial
            MonitorAtomic.RunCompareExchange   (ns):    428 Average,    428 Minimal,    428 Maxmial
    MonitorVolatileAtomic.RunCompareExchange   (ns):    254 Average,    254 Minimal,    254 Maxmial
          InterlockAtomic.RunExchange          (ns):     81 Average,     81 Minimal,     81 Maxmial
            MonitorAtomic.RunExchange          (ns):    544 Average,    544 Minimal,    544 Maxmial
    MonitorVolatileAtomic.RunExchange          (ns):    277 Average,    277 Minimal,    277 Maxmial

D:\>InterlockVsMonitor.exe 2
Using 2 threads:
          InterlockAtomic.RunCompareExchange   (ns):    180 Average,    176 Minimal,    184 Maxmial
            MonitorAtomic.RunCompareExchange   (ns):    917 Average,    795 Minimal,   1039 Maxmial
    MonitorVolatileAtomic.RunCompareExchange   (ns):    507 Average,    472 Minimal,    543 Maxmial
          InterlockAtomic.RunExchange          (ns):    227 Average,    222 Minimal,    232 Maxmial
            MonitorAtomic.RunExchange          (ns):   1007 Average,    973 Minimal,   1041 Maxmial
    MonitorVolatileAtomic.RunExchange          (ns):    485 Average,    446 Minimal,    524 Maxmial

D:\>InterlockVsMonitor.exe 4
Using 4 threads:
          InterlockAtomic.RunCompareExchange   (ns):    345 Average,    305 Minimal,    370 Maxmial
            MonitorAtomic.RunCompareExchange   (ns):   1901 Average,   1711 Minimal,   2064 Maxmial
    MonitorVolatileAtomic.RunCompareExchange   (ns):   1048 Average,    925 Minimal,   1101 Maxmial
          InterlockAtomic.RunExchange          (ns):    395 Average,    322 Minimal,    456 Maxmial
            MonitorAtomic.RunExchange          (ns):   1797 Average,   1488 Minimal,   2030 Maxmial
    MonitorVolatileAtomic.RunExchange          (ns):    816 Average,    561 Minimal,   1151 Maxmial

D:\>InterlockVsMonitor.exe 16
Using 16 threads:
          InterlockAtomic.RunCompareExchange   (ns):    998 Average,    736 Minimal,   1424 Maxmial
            MonitorAtomic.RunCompareExchange   (ns):   8315 Average,   3623 Minimal,   9941 Maxmial
    MonitorVolatileAtomic.RunCompareExchange   (ns):   4480 Average,   3345 Minimal,   5104 Maxmial
          InterlockAtomic.RunExchange          (ns):   2051 Average,   1522 Minimal,   2448 Maxmial
            MonitorAtomic.RunExchange          (ns):   9353 Average,   5795 Minimal,  11104 Maxmial
    MonitorVolatileAtomic.RunExchange          (ns):   3419 Average,   1509 Minimal,   4582 Maxmial

On the 4 CPU box, Interlocked starts to lose the race as the number of parallel threads increases.

D:\>InterlockVsMonitor.exe 1
Using 1 threads:
          InterlockAtomic.RunCompareExchange   (ns):    181 Average,    181 Minimal,    181 Maxmial
            MonitorAtomic.RunCompareExchange   (ns):    853 Average,    853 Minimal,    853 Maxmial
    MonitorVolatileAtomic.RunCompareExchange   (ns):    461 Average,    461 Minimal,    461 Maxmial
          InterlockAtomic.RunExchange          (ns):    179 Average,    179 Minimal,    179 Maxmial
            MonitorAtomic.RunExchange          (ns):    796 Average,    796 Minimal,    796 Maxmial
    MonitorVolatileAtomic.RunExchange          (ns):    441 Average,    441 Minimal,    441 Maxmial

D:\>InterlockVsMonitor.exe 2
Using 2 threads:
          InterlockAtomic.RunCompareExchange   (ns):    864 Average,    863 Minimal,    865 Maxmial
            MonitorAtomic.RunCompareExchange   (ns):   1984 Average,   1965 Minimal,   2003 Maxmial
    MonitorVolatileAtomic.RunCompareExchange   (ns):    948 Average,    870 Minimal,   1026 Maxmial
          InterlockAtomic.RunExchange          (ns):   1155 Average,   1155 Minimal,   1156 Maxmial
            MonitorAtomic.RunExchange          (ns):   1852 Average,   1768 Minimal,   1936 Maxmial
    MonitorVolatileAtomic.RunExchange          (ns):    925 Average,    852 Minimal,    999 Maxmial

D:\>InterlockVsMonitor.exe 4
Using 4 threads:
          InterlockAtomic.RunCompareExchange   (ns):   2558 Average,   2539 Minimal,   2575 Maxmial
            MonitorAtomic.RunCompareExchange   (ns):   4553 Average,   4241 Minimal,   5198 Maxmial
    MonitorVolatileAtomic.RunCompareExchange   (ns):   2502 Average,   2438 Minimal,   2543 Maxmial
          InterlockAtomic.RunExchange          (ns):   4809 Average,   4748 Minimal,   4870 Maxmial
            MonitorAtomic.RunExchange          (ns):   4659 Average,   4504 Minimal,   4780 Maxmial
    MonitorVolatileAtomic.RunExchange          (ns):   2455 Average,   2378 Minimal,   2509 Maxmial

D:\>InterlockVsMonitor.exe 16
Using 16 threads:
          InterlockAtomic.RunCompareExchange   (ns):   9238 Average,   7494 Minimal,  10111 Maxmial
            MonitorAtomic.RunCompareExchange   (ns):  17039 Average,  12189 Minimal,  18937 Maxmial
    MonitorVolatileAtomic.RunCompareExchange   (ns):   9110 Average,   6562 Minimal,  10364 Maxmial
          InterlockAtomic.RunExchange          (ns):  12504 Average,   5275 Minimal,  18905 Maxmial
            MonitorAtomic.RunExchange          (ns):  17205 Average,  11394 Minimal,  19518 Maxmial
    MonitorVolatileAtomic.RunExchange          (ns):   8934 Average,   7105 Minimal,  10300 Maxmial

On the 16 CPU box, Interlocked wins only the single-thread test. In the 2 and 4 thread tests it is roughly 4 times slower than MonitorVolatileAtomic, and it somehow gets close again in the 16 thread test.

D:\>InterlockVsMonitor.exe 1
Using 1 threads:
          InterlockAtomic.RunCompareExchange   (ns):    138 Average,    138 Minimal,    138 Maxmial
            MonitorAtomic.RunCompareExchange   (ns):    463 Average,    463 Minimal,    463 Maxmial
    MonitorVolatileAtomic.RunCompareExchange   (ns):    311 Average,    311 Minimal,    311 Maxmial
          InterlockAtomic.RunExchange          (ns):    133 Average,    133 Minimal,    133 Maxmial
            MonitorAtomic.RunExchange          (ns):    457 Average,    457 Minimal,    457 Maxmial
    MonitorVolatileAtomic.RunExchange          (ns):    257 Average,    257 Minimal,    257 Maxmial

D:\>InterlockVsMonitor.exe 2
Using 2 threads:
          InterlockAtomic.RunCompareExchange   (ns):   1855 Average,   1855 Minimal,   1855 Maxmial
            MonitorAtomic.RunCompareExchange   (ns):    876 Average,    873 Minimal,    879 Maxmial
    MonitorVolatileAtomic.RunCompareExchange   (ns):    482 Average,    448 Minimal,    517 Maxmial
          InterlockAtomic.RunExchange          (ns):   1821 Average,   1821 Minimal,   1822 Maxmial
            MonitorAtomic.RunExchange          (ns):    825 Average,    760 Minimal,    891 Maxmial
    MonitorVolatileAtomic.RunExchange          (ns):    501 Average,    498 Minimal,    505 Maxmial

D:\>InterlockVsMonitor.exe 4
Using 4 threads:
          InterlockAtomic.RunCompareExchange   (ns):   4158 Average,   4158 Minimal,   4160 Maxmial
            MonitorAtomic.RunCompareExchange   (ns):   1763 Average,   1731 Minimal,   1815 Maxmial
    MonitorVolatileAtomic.RunCompareExchange   (ns):    955 Average,    929 Minimal,    998 Maxmial
          InterlockAtomic.RunExchange          (ns):   4192 Average,   4172 Minimal,   4199 Maxmial
            MonitorAtomic.RunExchange          (ns):   1766 Average,   1628 Minimal,   1824 Maxmial
    MonitorVolatileAtomic.RunExchange          (ns):    948 Average,    786 Minimal,   1016 Maxmial

D:\>InterlockVsMonitor.exe 16
Using 16 threads:
          InterlockAtomic.RunCompareExchange   (ns):   8399 Average,   8347 Minimal,   8435 Maxmial
            MonitorAtomic.RunCompareExchange   (ns):  11881 Average,  11595 Minimal,  12082 Maxmial
    MonitorVolatileAtomic.RunCompareExchange   (ns):   7296 Average,   6994 Minimal,   7411 Maxmial
          InterlockAtomic.RunExchange          (ns):   8214 Average,   8180 Minimal,   8257 Maxmial
            MonitorAtomic.RunExchange          (ns):  11984 Average,  11556 Minimal,  12197 Maxmial
    MonitorVolatileAtomic.RunExchange          (ns):   7086 Average,   6707 Minimal,   7327 Maxmial

I don't have a good explanation for these results. I don't understand why the 2-CPU box is the fastest in most of the tests, or why Monitor performs better than Interlocked in the multi-threaded tests on the box with more CPUs.

Update 5/25/2009: One thing is clear: Interlocked is always the fastest in the single-thread tests. On further thought, this suggests to me that I should use Interlocked where access collisions are rare, even if they do happen occasionally. For example, to keep an occasional write from slowing down the main thread's heavy write access!?
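To make the idea concrete, here is a minimal sketch of the kind of Interlocked usage the update has in mind. The LowContentionMax type is hypothetical and not part of the test program above: it is a compare-and-exchange retry loop that finishes in one step when there is no collision and only loops again when another thread got in first.

```csharp
using System;
using System.Threading;

// Hypothetical example type: tracks the largest value reported so far
// without taking a lock.
public class LowContentionMax
{
    private long _max = long.MinValue;

    // Interlocked.Read gives an atomic 64-bit read even on 32-bit platforms.
    public long Value
    {
        get { return Interlocked.Read(ref _max); }
    }

    public void Report(long candidate)
    {
        long current = Interlocked.Read(ref _max);
        while (candidate > current)
        {
            // Try to publish the new maximum. CompareExchange returns the value
            // that was actually stored in _max before the attempt.
            long observed = Interlocked.CompareExchange(ref _max, candidate, current);
            if (observed == current) return; // no collision: done in one step
            current = observed;              // collision: retry against the newer value
        }
    }
}
```

When collisions are rare the loop body runs once, which is the cheap single-thread path the numbers above favor.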

Monday, May 18, 2009

Strong Typed, High Performance Reflection with C# Delegate (Part III)

Update: The open source project SharpCut delivers a library of less than 50K that does what is described in this series, plus much more. Check it out.

Content

  1. Inspiration: C#.Net Calling Grandparent's Virtual Method (base.base in C#)
  2. Prototype: Strong Typed, High Performance Reflection with C# Delegate (Part I)
  3. Performance: Strong Typed, High Performance Reflection with C# Delegate (Part II)
  4. Library Usage: Strong Typed, High Performance Reflection with C# Delegate (Part III) <= you are here

In this post, we are going to discuss how you can easily get a Delegate that lets you make high-performance reflection calls by using various extension methods in a library named CommonReflection.

You can download CommonReflection's binary distribution here. Source code can be checked out from Subversion repository  http://kennethxublogsource.googlecode.com/svn/trunk/CommonReflection hosted on Google Code.

There are ten extension methods, all defined in one class of CommonReflection, that let you easily obtain a Delegate to any method by name from a type or an object instance. Among those, six (6) extend System.Type and four (4) extend object. We'll cover all of them in the following sections.

type.GetStaticInvoker<TDelegate>(string staticMethodName)

This extension method finds a static method with the given name regardless of its accessibility (i.e. it finds private methods too) that satisfies these rules:

  1. The method has the same number of parameters as TDelegate
  2. Each method parameter must be assignable FROM the corresponding parameter of the Delegate at the same position
  3. The method return type must be assignable TO the return type of the Delegate.
  4. For out and ref parameters, they must match exactly.

In a nutshell, you must be able to call the method with arguments of TDelegate's parameter types, and the method's return type must be the same as, or a subtype of, TDelegate's return type.
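The four rules can be expressed with plain reflection. This is a hypothetical sketch, not CommonReflection's actual code: Matches is an illustrative helper built on Type.IsAssignableFrom, and the sample types mirror the Foo example in this section.

```csharp
using System;
using System.Reflection;

public class Base { }
public class Sub : Base { }
public class MyClass { private static Sub Foo(Base b, int i, object o) { return null; } }

public delegate Sub ExactFoo(Base b, int i, object o);
public delegate Base MatchFoo(Sub b, int i, string s);
public delegate Sub DoNotMatchFoo(Base b, short i, string s);

public static class StaticInvokerMatching
{
    // Hypothetical helper: checks a static method against a delegate type
    // using the four rules listed above.
    public static bool Matches(MethodInfo method, Type delegateType)
    {
        MethodInfo invoke = delegateType.GetMethod("Invoke");
        ParameterInfo[] delegateParams = invoke.GetParameters();
        ParameterInfo[] methodParams = method.GetParameters();

        // Rule 1: same number of parameters.
        if (methodParams.Length != delegateParams.Length) return false;

        for (int i = 0; i < methodParams.Length; i++)
        {
            Type m = methodParams[i].ParameterType;
            Type d = delegateParams[i].ParameterType;
            if (m.IsByRef || d.IsByRef)
            {
                // Rule 4: out/ref parameters must match exactly (by-ref types compared).
                if (m != d) return false;
            }
            // Rule 2: the method parameter must be assignable FROM the delegate parameter.
            else if (!m.IsAssignableFrom(d)) return false;
        }

        // Rule 3: the method return type must be assignable TO the delegate return type.
        return invoke.ReturnType.IsAssignableFrom(method.ReturnType);
    }
}
```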

For example, given a method and Delegates defined as

        class Base { }
        class Sub : Base { }

        class MyClass {
            private static Sub Foo(Base b, int i, object o) { return null; }
        }

        private delegate Sub ExactFoo(Base b, int i, object o);
        private delegate Base MatchFoo(Sub b, int i, string s);
        private delegate Sub DoNotMatchFoo(Base b, short i, string s);

Both calls below will return a valid Delegate to invoke the method Foo.

            typeof(MyClass).GetStaticInvoker<ExactFoo>("Foo"); // Good
            typeof(MyClass).GetStaticInvoker<MatchFoo>("Foo"); // Good

But this one gets you null, because "short" is not a subtype of "int".

            typeof(MyClass).GetStaticInvoker<DoNotMatchFoo>("Foo"); // Returns null

type.GetInstanceInvoker<TDelegate>(string instanceMethodName)

Similar to type.GetStaticInvoker, the type.GetInstanceInvoker extension method returns a Delegate that can be used to invoke an instance method on a given type. Because the method is obtained from a Type object, it is not associated with a specific instance. Thus, the instance needs to be passed as the first parameter to the Delegate when it is called.

The method matching rules are similar to those of its static sibling, except that the first parameter of TDelegate matches the type itself, the second parameter of TDelegate matches the first parameter of the method, and so on.

  1. The method has exactly one fewer parameter than TDelegate.
  2. The first parameter of TDelegate must be assignable TO the given type passed to the extension method.
  3. Each method parameter must be assignable FROM the parameter of the Delegate at the same position plus one (1), i.e. the first parameter of the method matches the second parameter of the Delegate, and so on.
  4. The method return type must be assignable TO the return type of the Delegate.
  5. For out and ref parameters, they must match exactly.
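These rules mirror what the .NET framework itself checks when it binds an open instance delegate, where the delegate's first parameter supplies the hidden `this`. As a sketch, the framework's Delegate.CreateDelegate(Type, MethodInfo, bool) overload (not CommonReflection) can check the Bar example in this section. The types are repeated so the block compiles on its own, and OpenInstanceCheck is a hypothetical name.

```csharp
using System;
using System.Reflection;

public class Base { }
public class Sub : Base { }

public class Parent { private Sub Bar(int i, object o) { return null; } }
public class Child : Parent { }

public delegate Sub ExactBar(Parent instance, int i, object o);
public delegate Base MatchBar(Child instance, int i, string s);
public delegate Base DoNotMatchBar(object instance, int i, object s);

public static class OpenInstanceCheck
{
    // Returns an open instance delegate, or null when the framework's binding
    // rules reject the delegate type (throwOnBindFailure is false).
    public static Delegate TryBind(Type delegateType)
    {
        MethodInfo bar = typeof(Parent).GetMethod(
            "Bar", BindingFlags.NonPublic | BindingFlags.Instance);
        return Delegate.CreateDelegate(delegateType, bar, false);
    }
}
```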

Given the class and delegate definitions below:

        class Parent { private Sub Bar(int i, object o) { return null; } }
        class Child : Parent { }

        private delegate Sub ExactBar(Parent instance, int i, object o);
        private delegate Base MatchBar(Child instance, int i, string s);
        private delegate Base DoNotMatchBar(object instance, int i, object s);

Delegates ExactBar and MatchBar are good matches for the Bar instance method, but DoNotMatchBar won't match, because its first parameter (object) is not assignable to the Parent type.

            typeof(Parent).GetInstanceInvoker<ExactBar>("Bar"); // Good match
            typeof(Parent).GetInstanceInvoker<MatchBar>("Bar"); // Good match
            typeof(Parent).GetInstanceInvoker<DoNotMatchBar>("Bar"); // Returns null

And here is an example of its use:

            static MatchBar Bar = typeof(Parent).GetInstanceInvoker<MatchBar>("Bar");

            void AnyMember(Child instance) {
                // High performance, type safe call to private method of Parent
                Base b = Bar(instance, 12, "testing");
            }

type.GetNonVirtualInvoker<TDelegate>(string virtualMethodName)

type.GetNonVirtualInvoker works very similarly to the type.GetInstanceInvoker extension method; they have exactly the same method matching rules. The only difference shows up when the method is virtual: the Delegate returned by type.GetInstanceInvoker behaves the same as the virtual method itself (i.e. when it is overridden, the overriding method is used), but the Delegate returned by type.GetNonVirtualInvoker always calls the method it is bound to. A good example is calling a grandparent's virtual method, which inspired the development of CommonReflection.
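The difference can be reproduced with the framework alone. In this sketch (types A and B are hypothetical), an open instance delegate created by Delegate.CreateDelegate still dispatches virtually, like the Delegate described for type.GetInstanceInvoker, while a DynamicMethod that emits OpCodes.Call rather than Callvirt always runs A.Foo, which is the behavior described for type.GetNonVirtualInvoker. It illustrates the technique and is not CommonReflection's implementation.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

public class A { public virtual string Foo() { return "A"; } }
public class B : A { public override string Foo() { return "B"; } }

public static class NonVirtualDemo
{
    // Open instance delegate over A.Foo: calling it on a B runs B.Foo.
    public static Func<A, string> Virtual()
    {
        MethodInfo fooA = typeof(A).GetMethod("Foo");
        return (Func<A, string>)Delegate.CreateDelegate(typeof(Func<A, string>), fooA);
    }

    // DynamicMethod emitting 'call' instead of 'callvirt': A.Foo always runs.
    public static Func<A, string> NonVirtual()
    {
        MethodInfo fooA = typeof(A).GetMethod("Foo");
        var dm = new DynamicMethod("NonVirtualFoo", typeof(string), new[] { typeof(A) }, typeof(A));
        ILGenerator il = dm.GetILGenerator();
        il.Emit(OpCodes.Ldarg_0);              // push the instance
        il.EmitCall(OpCodes.Call, fooA, null); // non-virtual call to A.Foo
        il.Emit(OpCodes.Ret);
        return (Func<A, string>)dm.CreateDelegate(typeof(Func<A, string>));
    }
}
```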

instance.GetInstanceInvoker<TDelegate>(string instanceMethodName)

While the Delegate returned from type.GetInstanceInvoker can be used to invoke the method on any instance of the given type, the one from instance.GetInstanceInvoker is bound to a specific instance. Unlike type.GetInstanceInvoker, the parameters of TDelegate for instance.GetInstanceInvoker should match the instance method's parameters directly. There is no special first parameter requirement anymore, so the code reads more naturally. The method matching rules are the same as for type.GetStaticInvoker and are repeated below.

  1. The method has the same number of parameters as TDelegate
  2. Each method parameter must be assignable FROM the corresponding parameter of the Delegate at the same position.
  3. The method return type must be assignable TO the return type of the Delegate.
  4. For out and ref parameters, they must match exactly.

Taking the "Bar" example again, notice that the special first parameter is removed from the "ExactBar" Delegate.

        class MyClass { private Sub Bar(int i, object o) { return null; } }

        private delegate Sub ExactBar(int i, object o);

And a use case:

        class MyClassUser {
            private readonly ExactBar Bar;

            public MyClassUser(MyClass myClass) {
                Bar = myClass.GetInstanceInvoker<ExactBar>("Bar");
            }
            
            void AnyMember() {
                var sub = Bar(12, "object");
            }
        }

CAUTION: Bear in mind that the performance benefit comes from reusing the Delegate. If we had to generate the Delegate again and again for each call, we might end up with more overhead than simple reflection. Obviously, instance.GetInstanceInvoker is less reusable than type.GetInstanceInvoker, because the former is bound to one instance while the latter can be used with different instances.
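A minimal sketch of the reuse advice, using hypothetical Counter and CounterAccess types and the framework's Delegate.CreateDelegate in place of CommonReflection: the lookup and delegate creation happen once, in a static readonly field, and the open instance delegate is then reused for every instance and every call.

```csharp
using System;
using System.Reflection;

public class Counter
{
    private int _count;

    // Private on purpose: only reachable through reflection.
    private int Increment(int by) { _count += by; return _count; }
}

public static class CounterAccess
{
    // Built once, when CounterAccess is first used, and shared by all callers.
    // Func<Counter, int, int> is an open instance delegate: its first
    // argument is the Counter to call Increment on.
    private static readonly Func<Counter, int, int> IncrementInvoker =
        (Func<Counter, int, int>)Delegate.CreateDelegate(
            typeof(Func<Counter, int, int>),
            typeof(Counter).GetMethod("Increment", BindingFlags.NonPublic | BindingFlags.Instance));

    public static int Increment(Counter counter, int by)
    {
        return IncrementInvoker(counter, by); // cheap, strongly typed call
    }
}
```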

instance.GetNonVirtualInvoker<TDelegate>(Type type, string virtualMethodName)

Like instance.GetInstanceInvoker, instance.GetNonVirtualInvoker provides functionality similar to type.GetNonVirtualInvoker, except that it returns a Delegate bound to the given instance. Its method matching rules are the same as those of instance.GetInstanceInvoker.

Please note that this extension method takes one more argument than the others: the argument "type". Unlike instance.GetInstanceInvoker, which can infer the type from the instance, instance.GetNonVirtualInvoker needs to be told which type to look up the method in. It only makes sense when you want a non-virtual invoker to a method defined in an ancestor of the given instance's type. It also implies that the runtime type of the instance must be assignable to the given type.
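Binding a non-virtual invoker to one instance can be sketched with DynamicMethod.CreateDelegate(Type, object), which closes the delegate over its first argument. Everything here is hypothetical (Animal, Dog, BoundNonVirtual.Bind), and to stay short the sketch only handles parameterless methods that return string; it is not CommonReflection's implementation.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

public class Animal { public virtual string Speak() { return "..."; } }
public class Dog : Animal { public override string Speak() { return "Woof"; } }

public static class BoundNonVirtual
{
    // Hypothetical helper: returns a delegate that calls ancestorType's own
    // version of methodName on 'instance', skipping any override.
    public static Func<string> Bind(object instance, Type ancestorType, string methodName)
    {
        // Mirrors the rule above: the instance must be assignable to the given type.
        if (!ancestorType.IsInstanceOfType(instance))
            throw new ArgumentException("instance is not a " + ancestorType.FullName);

        MethodInfo method = ancestorType.GetMethod(methodName, Type.EmptyTypes);
        if (method == null || method.ReturnType != typeof(string))
            throw new MissingMethodException(ancestorType.FullName, methodName);

        var dm = new DynamicMethod("Bound_" + methodName, typeof(string),
            new[] { ancestorType }, ancestorType);
        ILGenerator il = dm.GetILGenerator();
        il.Emit(OpCodes.Ldarg_0);                // the bound instance
        il.EmitCall(OpCodes.Call, method, null); // non-virtual call
        il.Emit(OpCodes.Ret);

        // Closing the delegate over 'instance' supplies the first argument,
        // so the resulting delegate takes no parameters.
        return (Func<string>)dm.CreateDelegate(typeof(Func<string>), instance);
    }
}
```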

CAUTION: Same caution about the reusability for instance.GetInstanceInvoker applies.

Get???InvokerOrFail<TDelegate>(...)

By now, we have discussed half of the ten extension methods mentioned at the beginning. The other half are minor variations of those already discussed, with "OrFail" appended to the method name. All five get-or-fail extension methods are listed below.

    type.GetStaticInvokerOrFail<TDelegate>("StaticMethodName");
    type.GetInstanceInvokerOrFail<TDelegate>("InstanceMethodName");
    type.GetNonVirtualInvokerOrFail<TDelegate>("VirtualMethodName");
    instance.GetInstanceInvokerOrFail<TDelegate>("InstanceMethodName");
    instance.GetNonVirtualInvokerOrFail<TDelegate>(type, "VirtualMethodName");

These get-or-fail methods differ from their counterparts by throwing an exception when no matching method is found. Taking the example from type.GetInstanceInvoker, the statement below throws a NoMatchException instead of returning null.

      typeof(Parent).GetInstanceInvokerOrFail<DoNotMatchBar>("Bar"); // exception thrown
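The null-versus-exception split can be sketched as a thin wrapper. The names below are hypothetical, the lookup is simplified to a single static method found by name (overloads are not handled), and MissingMethodException stands in for CommonReflection's NoMatchException, whose definition is not shown here. The `as TDelegate` conversion relies on the `class` constraint.

```csharp
using System;
using System.Reflection;

public static class OrFailSketch
{
    // Hypothetical lookup in the style of Get???Invoker: null when nothing matches.
    public static TDelegate TryGetStaticInvoker<TDelegate>(Type type, string name)
        where TDelegate : class
    {
        MethodInfo method = type.GetMethod(name,
            BindingFlags.Public | BindingFlags.NonPublic | BindingFlags.Static);
        if (method == null) return null;
        // throwOnBindFailure: false makes a signature mismatch return null too.
        return Delegate.CreateDelegate(typeof(TDelegate), method, false) as TDelegate;
    }

    // The "OrFail" flavor: same lookup, but a miss becomes an exception.
    public static TDelegate GetStaticInvokerOrFail<TDelegate>(Type type, string name)
        where TDelegate : class
    {
        TDelegate invoker = TryGetStaticInvoker<TDelegate>(type, name);
        if (invoker == null)
            throw new MissingMethodException(type.FullName, name); // stand-in exception
        return invoker;
    }
}
```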

Properties and Constructors

As of now, properties and constructors are not supported, but they can be added in the future. Stay tuned at CommonReflection. Hey, it is open source, so you can contribute too!

Friday, May 15, 2009

Strong Typed, High Performance Reflection with C# Delegate (Part II)

Update: The open source project SharpCut delivers a library of less than 50K that does what is described in this series, plus much more. Check it out.

Content

  1. Inspiration: C#.Net Calling Grandparent's Virtual Method (base.base in C#)
  2. Prototype: Strong Typed, High Performance Reflection with C# Delegate (Part I)
  3. Performance: Strong Typed, High Performance Reflection with C# Delegate (Part II) <= you are here
  4. Library Usage: Strong Typed, High Performance Reflection with C# Delegate (Part III)

In Part I, I completed a prototype of an extension method that creates a Delegate to make a non-virtual invocation of an otherwise virtual method. This prototype has since evolved into an extension method library that gets you a Delegate to any method on a given Type object or any object instance. In this post, we'll compare the performance of a direct virtual method call, a delegate call and a reflection invocation using the Invoke method.

Test Setup

The source used for the performance test is Program.cs, which forms a simple console application. Performance is measured by calling a virtual method that does nothing but return a literal integer value. The method takes two parameters, one of a value type and one of a reference type, and returns a value type. The virtual method is then overridden in the subclass.

        private class Base
        {
            public virtual int PerfTest(int i, object o) { return 0; }
        }

        private class Sub : Base
        {
            public override int PerfTest(int i, object o) { return 1; }
        }

To illustrate how the different types of call are made in the test, let's use pseudo code for clarity. Please see the actual source code for details.

Direct Virtual Call

Calls the method on an instance of the Sub class through a reference of type Base.

            Base sub = new Sub();
            DateTime start = DateTime.Now;
            for (int i = loop; i > 0; i--) sub.PerfTest(0, o);

Regular Delegate

Create a Delegate from the method Base.PerfTest on an instance of Sub.

            Base sub = new Sub();
            Func<int, object, int> callDelegate = sub.PerfTest;
            DateTime start = DateTime.Now;
            for (int i = loop; i > 0; i--) callDelegate(1, o);

MethodInfo.Invoke

Obtains a MethodInfo object from the Base type and calls its Invoke method on an instance of Sub.

            Base sub = new Sub();
            MethodInfo methodInfo = typeof(Base).GetMethod(methodName);
            DateTime start = DateTime.Now;
            for (int i = loop/1000; i > 0; i--)
                methodInfo.Invoke(sub, new object[] {1, o});

MethodInfo Delegate

Create a Delegate out of a method reflected from the Sub type, using the extension method in the CommonReflection library. Then call the Delegate.

            var callDelegate = new Sub().GetInstanceInvokerOrFail<Func<int, object, int>>(methodName);
            DateTime start = DateTime.Now;
            for (int i = loop; i > 0; i--) callDelegate(1, o);

DynamicMethod.Invoke

Create a DynamicMethod that performs a non-virtual invocation of a virtual method on the Base type. Then call its Invoke method on an instance of the Sub type.

            Base sub = new Sub();
            DynamicMethod dynamicMethod = Reflections.CreateDynamicMethod(typeof(Base).GetMethod(methodName));
            DateTime start = DateTime.Now;
            for (int i = loop/1000; i > 0; i--)
                dynamicMethod.Invoke(null, new object[] {sub, 1, o});

DynamicMethod Delegate

Create a Delegate out of a DynamicMethod that performs a non-virtual invocation of a virtual method on the Base type, using the extension method in the CommonReflection library. Then call the Delegate on an instance of the Sub type.

            var callDelegate = new Sub().GetNonVirtualInvoker<Func<int, object, int>>(typeof(Base), methodName);
            DateTime start = DateTime.Now;
            for (int i = loop; i > 0; i--) callDelegate(1, o);

Performance Test Result

The same test is run twice to rule out any warm-up effect, and I tested both the Debug build and the Release build. The results show nanoseconds per call, calculated by calling the method millions of times in a loop to get the total time, then dividing the total time by the number of calls.

The Debug build, run on my laptop with an Intel Core 2 Duo T7300 2GHz and DDR2 5300 RAM, yielded this result:

===== First  Round =====
Direct Virtual Call   :      8.281ns
Regular Delegate      :      8.125ns
MethodInfo.Invoke     :  5,468.750ns
MethodInfo Delegate   :      7.969ns
DynamicMethod.Invoke  :  5,468.750ns
DynamicMethod Delegate:     14.844ns
===== Second Round =====
Direct Virtual Call   :      7.969ns
Regular Delegate      :      7.656ns
MethodInfo.Invoke     :  5,468.750ns
MethodInfo Delegate   :      7.813ns
DynamicMethod.Invoke  :  5,468.750ns
DynamicMethod Delegate:     14.844ns

And the Release build test result is

===== First  Round =====
Direct Virtual Call   :      3.594ns
Regular Delegate      :      2.813ns
MethodInfo.Invoke     :  5,468.750ns
MethodInfo Delegate   :      2.813ns
DynamicMethod.Invoke  :  5,625.000ns
DynamicMethod Delegate:      3.750ns
===== Second Round =====
Direct Virtual Call   :      2.813ns
Regular Delegate      :      2.969ns
MethodInfo.Invoke     :  5,312.500ns
MethodInfo Delegate   :      2.656ns
DynamicMethod.Invoke  :  5,625.000ns
DynamicMethod Delegate:      3.594ns
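The per-call numbers above are the total loop time divided by the number of calls. A minimal sketch of that calculation follows. It uses System.Diagnostics.Stopwatch instead of the DateTime.Now used in the test program (a swap, not what Program.cs does), PerCallTimer is a hypothetical name, and the Action indirection adds its own delegate-call cost, so treat its output as relative rather than absolute.

```csharp
using System;
using System.Diagnostics;

public static class PerCallTimer
{
    // Runs 'call' the given number of times and returns the average cost
    // in nanoseconds. Stopwatch has much finer resolution than DateTime.Now.
    public static double NanosecondsPerCall(Action call, int iterations)
    {
        call(); // one warm-up call so JIT compilation is not timed
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = iterations; i > 0; i--) call();
        watch.Stop();
        double totalNs = watch.Elapsed.TotalMilliseconds * 1000000.0; // ms -> ns
        return totalNs / iterations;
    }
}
```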

Conclusion

  • A Delegate call is as fast as a regular method call. This is quite different from what I had learned earlier from an MSDN article. (Update 5/18: this is confirmed; see Wikipedia and Jon Skeet's blog post.)
  • The Delegate created from the reflection is as fast as regular method call.
  • The reflection calls via MethodInfo.Invoke and DynamicMethod.Invoke are 1500-2000 times slower in the Release build.

In the next post, I'll explain the use of each method in the extension library named "CommonReflection", whose binary you can download here.

Wednesday, May 13, 2009

Un-brick Linksys WRT54GL After Failed Upgrading to DD-WRT

When I loaded the Standard DD-WRT 24 SP1 directly onto a brand new WRT54GL, it failed. After a reboot, I lost access to the router. Whatever I did, the router's power LED just kept blinking. I had bricked my new router! Further research told me that I should have upgraded to DD-WRT Mini first. The information on the WRT54GL Wikipedia page is misleading: it says that version 1.1 can go directly from the Linksys firmware to a 4MB third-party image, which is very wrong.

Again I searched for how to un-brick it. I tried the 30/30/30 hard reset, recvudp, etc. None of them worked for me. There didn't seem to be a direct answer for my situation, and information about using Windows XP was also very rare. But my research helped me understand the WRT54GL better, and it turned out that my situation was actually a bit better than some others'.

Un-brick the router

Krunk4ever's was the closest post I found, but taking the router apart was not necessary for me. So here are step-by-step instructions for what I did on Windows XP to recover my router; hopefully they save somebody some time.

  1. Download the latest Linksys firmware to the PC and unzip it.
  2. Directly connect the PC to router's LAN 1 port. Turn off any other wired or wireless connections on PC.
  3. Go to the control panel -> Network Connections.
  4. Right click on the Local Area Connection and select Properties. (If you have more than one Local Area Connection, pick the one that is connected to the router, which should show limited connectivity now.)
  5. In the property page, select the Internet Protocol and click on the Properties button
  6. Select "Use the following IP address" and enter 192.168.0.1 in the IP address text box.
  7. Close all dialog boxes by clicking OK.
  8. Open a command prompt and cd to the folder where the firmware was saved in step 1.
  9. Run the command below, replacing the file name with the firmware file that you downloaded.
                 tftp -i 192.168.1.1 put FW_WRT54GL_4.30.12.3_US_EN_code.bin
    You should get a response like below.
                 Transfer successful: 2941952 bytes in 5 seconds, 588390 bytes/s
  10. Don't do anything now; wait for the router to restart by itself until you can ping it, and then point your browser to http://192.168.1.1

Upon successful un-bricking, don't forget to revert the changes that you made in step 6.

Troubleshooting step 9

  • If you get "Error on server : code pattern incorrect", then you have the wrong firmware file. Make sure you downloaded the right firmware.
  • If you get "Timeout occurred", power down the router, wait for 10 seconds and power it back on. When the power LED starts blinking, try step 9 again. If it still doesn't work, you have a different problem than mine. Check the post at Krunk4ever.

Installing DD-WRT

This time, I made sure to install the Mini version first and then the Standard version, and the installation was very smooth.

Friday, May 08, 2009

Strong Typed, High Performance Reflection with C# Delegate

Update: The open source project SharpCut delivers a library of less than 50K that does what is described here in one line, plus much more. Check it out.

Content

  1. Inspiration: C#.Net Calling Grandparent's Virtual Method (base.base in C#)
  2. Prototype: Strong Typed, High Performance Reflection with C# Delegate (Part I) <= you are here
  3. Performance: Strong Typed, High Performance Reflection with C# Delegate (Part II)
  4. Library Usage: Strong Typed, High Performance Reflection with C# Delegate (Part III) 

The process of finding a solution for base.base.VirtualMethod() in C# inspired me to create utility/extension methods for reflection. The goal was to use a delegate instead of MethodInfo.Invoke or DynamicMethod.Invoke. Using the Invoke method requires boxing all value-type arguments to object, creating an object array, unboxing before calling the actual method, boxing/unboxing the return value if necessary, and casting the return value to the expected data type. Very involved indeed.

Even when the MethodInfo or DynamicMethod is reused to make the call, using the Invoke method is costly and error prone. Can we create a Delegate object out of it, so that the call is strongly typed and hopefully more efficient?

The DynamicMethod class has overloaded methods named CreateDelegate, and the Delegate class also has CreateDelegate methods that take a MethodInfo, so the .Net framework does have the weapons we need.
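To see the difference CreateDelegate makes, here is a small sketch with a hypothetical static Add method: MethodInfo.Invoke needs an object[] of boxed arguments and an unboxing cast on the result, while Delegate.CreateDelegate(Type, MethodInfo) produces a strongly typed Func that can be created once and reused.

```csharp
using System;
using System.Reflection;

public static class InvokeVsDelegate
{
    public static int Add(int a, int b) { return a + b; }

    // Reflection call: arguments are boxed into an object[] and the
    // boxed result has to be unboxed with a cast.
    public static int ViaInvoke(MethodInfo add)
    {
        object result = add.Invoke(null, new object[] { 2, 3 });
        return (int)result;
    }

    // One-time conversion to a strongly typed delegate; keep the result
    // and reuse it, since creating it is the expensive part.
    public static Func<int, int, int> ToDelegate(MethodInfo add)
    {
        return (Func<int, int, int>)Delegate.CreateDelegate(typeof(Func<int, int, int>), add);
    }
}
```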

Let's go step by step.

Utility method to create non-virtual invoke DynamicMethod from MethodInfo

First, extract the part that creates the DynamicMethod from my last post and enhance it to work with any given MethodInfo object.

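        // Requires: using System; using System.Reflection; using System.Reflection.Emit;
        // Declare inside a static class so it can be used as an extension method.
        public static DynamicMethod CreateNonVirtualDynamicMethod(this MethodInfo method)
        {
            // Instance methods get the target object as an extra first argument.
            int offset = (method.IsStatic ? 0 : 1);
            var parameters = method.GetParameters();
            int size = parameters.Length + offset;
            Type[] types = new Type[size];
            if (offset > 0) types[0] = method.DeclaringType;
            for (int i = offset; i < size; i++)
            {
                types[i] = parameters[i - offset].ParameterType;
            }

            // Associating the DynamicMethod with the declaring type gives it access to non-public members.
            DynamicMethod dynamicMethod = new DynamicMethod(
                "NonVirtualInvoker_" + method.Name, method.ReturnType, types, method.DeclaringType);
            ILGenerator il = dynamicMethod.GetILGenerator();
            // Ldarg takes a 16-bit operand, so pass a short rather than an int.
            for (int i = 0; i < types.Length; i++) il.Emit(OpCodes.Ldarg, (short)i);
            // OpCodes.Call (not Callvirt) is what makes the invocation non-virtual.
            il.EmitCall(OpCodes.Call, method, null);
            il.Emit(OpCodes.Ret);
            return dynamicMethod;
        }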

With this tool, we can slim down our implementation of base.base in class C quite a bit.


    class C : B
    {
        private static readonly DynamicMethod baseBaseFoo;
        static C()
        {
            MethodInfo fooA = typeof(A).GetMethod("foo", BindingFlags.Public | BindingFlags.Instance);
            baseBaseFoo = fooA.CreateNonVirtualDynamicMethod();
        }
        public override string foo() { return (string)baseBaseFoo.Invoke(null, new object[] { this }); }
    }

Create Delegate from DynamicMethod

As we said at the beginning, DynamicMethod.Invoke is verbose and inefficient. The solution is to create a Delegate out of the DynamicMethod and use the Delegate. We can do it right inside class C, and you can see that the call to baseBaseFoo is now short, clean and strongly typed. We'll discuss the performance benefit in the next post.


    class C : B
    {
        private static readonly Func<A, string> baseBaseFoo;
        static C()
        {
            MethodInfo fooA = typeof(A).GetMethod("foo", BindingFlags.Public | BindingFlags.Instance);
            baseBaseFoo = 
                (Func<A, string>)fooA.CreateNonVirtualDynamicMethod()
                .CreateDelegate(typeof(Func<A, string>));
        }
        public override string foo() { return baseBaseFoo(this); }
    }

This is great, with one downside: we have to cast here and there when we create the Delegate. Can we extract this logic into a generic method so that in class C I can simply do this?

    baseBaseFoo = GetNonVirtualInvoker<Func<A, string>>(fooA);

My first attempt was not successful. See the code below:


        public static TDelegate GetNonVirtualInvoker<TDelegate>(this MethodInfo method)
            where TDelegate : Delegate
        {
            var dynamicMethod = CreateNonVirtualDynamicMethod(method);
            return (TDelegate)dynamicMethod.CreateDelegate(typeof(TDelegate));
        }

It would be ideal if this worked, so that the generic method only takes a Delegate as its type parameter. But the compiler put a red line under the constraint type Delegate and complained that "Constraint cannot be special class 'System.Delegate'". Why, Microsoft? Come and vote for the change here!

My second attempt was to drop the constraint and just cast. But the compiler was still unhappy, with the error message: "Cannot cast expression of type 'System.Delegate' to 'TDelegate'".

        public static TDelegate GetNonVirtualInvoker<TDelegate>(this MethodInfo method)
        {
            var dynamicMethod = CreateNonVirtualDynamicMethod(method);
            return (TDelegate)dynamicMethod.CreateDelegate(typeof(TDelegate));
        }

This is very annoying. Is something wrong with the framework/language design? The workaround turns out to be very simple: cast to object, then back to TDelegate. (Update 5/15: actually there is a workaround that avoids the double cast.)

Finally, the code below works:
        public static TDelegate GetNonVirtualInvoker<TDelegate>(this MethodInfo method)
        {
            var dynamicMethod = CreateNonVirtualDynamicMethod(method);
            return (TDelegate)(object)dynamicMethod.CreateDelegate(typeof(TDelegate));
        }

Thus the class C can be further simplified to:

    class C : B
    {
        private static readonly Func<A, string> baseBaseFoo;
        static C()
        {
            MethodInfo fooA = typeof(A).GetMethod("foo", BindingFlags.Public | BindingFlags.Instance);
            baseBaseFoo = fooA.GetNonVirtualInvoker<Func<A, string>>();
        }
        public override string foo() { return baseBaseFoo(this); }
    }
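The (TDelegate)(object) double cast can be tried in isolation. This sketch applies it to Delegate.CreateDelegate with a MethodInfo rather than to a DynamicMethod, and CreateTypedDelegate is a hypothetical name. It compiles because C# always permits an explicit conversion from object to an unconstrained type parameter, whereas no conversion from Delegate to TDelegate is defined.

```csharp
using System;
using System.Reflection;

public static class TypedDelegates
{
    // Hypothetical helper. (TDelegate)d would not compile; going through
    // object is allowed and is checked at run time instead. The check always
    // succeeds here because d was created as typeof(TDelegate).
    public static TDelegate CreateTypedDelegate<TDelegate>(MethodInfo method)
    {
        Delegate d = Delegate.CreateDelegate(typeof(TDelegate), method);
        return (TDelegate)(object)d;
    }
}
```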

One stop shop extension method returns Delegate from type and method name

Now look at the goal set in the last post, repeated below. Class C must be cut down further to achieve that goal.

    class C : B
    {
        private static readonly Func<A, string> baseBaseFoo = 
            typeof(A).GetNonVirtualInvoker<Func<A, string>>("foo");
        public override string foo() { return baseBaseFoo(this); }
    }

Indeed, getting the MethodInfo object from a given type is never a one-liner, and it becomes verbose when the method is overloaded, because the parameter types must then be matched. Things are getting more interesting now: a Delegate type carries precise information about the method signature. So our utility method can be enhanced further to find the method in a given type with just a method name, because the parameter information can be found in the Delegate type.

        public static TDelegate GetNonVirtualInvoker<TDelegate>(this Type type, string name)
        {
            Type delegateType = typeof(TDelegate);
            if (!typeof(MulticastDelegate).IsAssignableFrom(delegateType))
            {
                throw new InvalidOperationException(
                    "Expecting type parameter to be a Delegate type, but got " +
                    delegateType.FullName);
            }
            var invoke = delegateType.GetMethod("Invoke");
            ParameterInfo[] parameters = invoke.GetParameters();
            int size = parameters.Length - 1;
            Type[] types = new Type[size];
            for (int i = 0; i < size; i++)
            {
                types[i] = parameters[i + 1].ParameterType;
            }
            var method = type.GetMethod(name, 
                BindingFlags.Public | BindingFlags.NonPublic | 
                BindingFlags.Instance | BindingFlags.InvokeMethod, 
                null, types, null);
            if (method == null) return default(TDelegate);
            var dynamicMethod = CreateNonVirtualDynamicMethod(method);
            return (TDelegate)(object)dynamicMethod.CreateDelegate(delegateType);
        }

This extension method lets you create a Delegate that can make a non-virtual invocation of the named method of the given type. The method's parameters must match the signature of the Delegate, with the Delegate's first parameter taking the instance. For example, the instance method ClassA.Method1(string, int) matches Delegate(ClassA, string, int). The extension method starts by making sure the type parameter is indeed a Delegate type, then retrieves the parameter types from the Delegate's Invoke method, looks up the method in the given type, creates the dynamic method, and finally creates the Delegate.

Continue...

The complete code used in this post can be found here. The code is the result of the inception and prototyping of Common.Reflection. In the next few posts, we'll implement formal extension methods with enhanced features and compare the performance of direct method calls, reflection invocation and Delegate calls.