Sunday, April 24, 2016

Analytical Function in MySQL - ROW_NUMBER, RANK, DENSE_RANK

It is unfortunate that MySQL doesn't support analytical functions. The problem got my attention when we were considering MySQL as a staging and/or operational database for a Data Warehouse solution. A bit of research revealed that it is not very difficult to emulate some of them. Here is a workaround to get the results of ROW_NUMBER, RANK and DENSE_RANK in MySQL.

Let's take the standard SQL:2003 analytical query below as the example to emulate:

SELECT 
  ROW_NUMBER() OVER (PARTITION BY col1, col2 ORDER BY col3 DESC) AS row_number,
  RANK() OVER (PARTITION BY col1, col2 ORDER BY col3 DESC) AS rank,
  DENSE_RANK() OVER (PARTITION BY col1, col2 ORDER BY col3 DESC) AS dense_rank,
  t.* 
FROM table1 t

The MySQL workaround can be written with a mix of user variables and subqueries:

SELECT
  @row_num:=IF(@prev_col1=t.col1 AND @prev_col2=t.col2, @row_num+1, 1) AS row_number,
  @dense:=IF(@prev_col1=t.col1 AND @prev_col2=t.col2, IF(@prev_col3=t.col3, @dense, @dense+1), 1) AS dense_rank,
  @rank:=IF(@prev_col1=t.col1 AND @prev_col2=t.col2 AND @prev_col3=t.col3, @rank, @row_num) AS rank,
  t.*,
  -- remember the current row so that the next row can be compared against it
  @prev_col1:=t.col1 AS prev_col1,
  @prev_col2:=t.col2 AS prev_col2,
  @prev_col3:=t.col3 AS prev_col3
FROM (SELECT * FROM table1 ORDER BY col1, col2, col3 DESC) t,
     (SELECT @row_num:=1, @dense:=1, @rank:=1, @prev_col1:=NULL, @prev_col2:=NULL, @prev_col3:=NULL) var

This solution requires no self join and no temp table, is still single pass, and is generic enough to adapt to any use case. But I admit that it is quite convoluted. It is sad that MySQL doesn't include such basic functions, which are well achievable.
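To make the single-pass logic easier to follow, here is a minimal Java sketch of the same bookkeeping. It is only an illustration: the class name `RankSketch`, the `rank` method and the `int[]` row layout `{col1, col2, col3}` are assumptions made for this example, and the rows are assumed to be already sorted by col1, col2, col3 DESC, just as the inner subquery does.

```java
class RankSketch {
    /**
     * rows must already be sorted by col1, col2, col3 DESC; each row is {col1, col2, col3}.
     * Returns {row_number, rank, dense_rank} for every input row.
     */
    static int[][] rank(int[][] rows) {
        int[][] out = new int[rows.length][];
        int rowNum = 0, rank = 0, dense = 0;
        int[] prev = null; // previous row, plays the role of the @prev_* variables
        for (int i = 0; i < rows.length; i++) {
            int[] r = rows[i];
            boolean samePartition = prev != null && prev[0] == r[0] && prev[1] == r[1];
            boolean tie = samePartition && prev[2] == r[2];
            // row_number always advances inside a partition and restarts at 1 outside it
            rowNum = samePartition ? rowNum + 1 : 1;
            // dense_rank advances only when the sort key changes
            dense = samePartition ? (tie ? dense : dense + 1) : 1;
            // rank repeats on ties, otherwise jumps to the current row_number
            rank = tie ? rank : rowNum;
            out[i] = new int[] {rowNum, rank, dense};
            prev = r;
        }
        return out;
    }

    public static void main(String[] args) {
        int[][] rows = {
            {1, 1, 30}, {1, 1, 30}, {1, 1, 20}, {1, 1, 10},
            {1, 2, 50},
        };
        for (int[] r : rank(rows)) {
            System.out.println(r[0] + " " + r[1] + " " + r[2]);
        }
    }
}
```

For the sample rows, the two tied rows share rank 1 and dense rank 1, the next row gets rank 3 but dense rank 2, and the counters restart at 1 when the partition changes.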

Monday, December 21, 2015

Downsize HDD on Your Windows Box

Upgrading a laptop from HDD to SSD is probably the best investment for performance. I have done it on many laptops for friends and relatives. Most of the time, it is a good idea to take the opportunity to clean up the system by reinstalling the OS. But today, a friend handed me a second-hand laptop with Windows 7 and Office 2010 on it, and there is no COA sticker on the back of the computer, nor does she have the product keys.

So I was challenged to downsize her 320GB HDD to a 128GB SSD. I started the venture and it is finally done, although there were quite a few bumps along the way. I will not go through every mistake that I made, but I thought I should document the correct process in case someone else, or I myself, needs to do the same again.

Backup Your Computer

Use an external HDD or NAS to back up your computer. Windows 7 has built-in support for this: go to Control Panel and search for “Backup and Restore”.

Preparation

  1. You can use either a DVD writer drive or a USB drive, if your computer can boot from USB. I used two microSD cards with a tiny USB card reader. I have a batch of 4GB/8GB microSD cards, but essentially one used 4GB and the other used 2GB. You may be able to fit everything onto one 4GB or 8GB card.
  2. You’ll need to somehow attach both disks to the computer. Either your computer can fit both, or you use an external HDD enclosure. This was easy for me because the SSD was mSATA, so the ThinkPad X220 could fit both the HDD and the SSD.
  3. Windows 7 installation DVD, or bootable USB drive. I have the Windows 7 installer on one of the microSD cards. To create one, you’ll need a 4GB+ USB memory stick, or any memory card with a USB card reader, and the tools below:
    • ISO to USB was the tool I used a long time ago to create a bootable USB memory stick from an ISO image.
    • If you don’t have a Windows 7 installation DVD ISO image, you may be able to download one somewhere or create one from the installation DVD. I have used both Win32 Disk Imager (just for that) and DAEMON Tools Lite (plus a virtual drive) with great success.
    • You can also combine this with the next two items, but I already have, and also prefer to keep, a separate one for the Windows installer.
  4. GParted for resizing partitions. The site has instructions to create a dedicated CD or USB. It also comes with most Linux DVDs. I used my existing Ubuntu ISO.
  5. Clonezilla for disk cloning. Similarly, the site has instructions to create a dedicated CD or USB. But I used Easy2Boot to combine it with the Ubuntu image into one bootable USB. If you don’t have or need Ubuntu, you can just combine GParted and Clonezilla with Easy2Boot. You can possibly also add the Windows 7 installer to the same Easy2Boot disk if you’d like.

In Action

  1. Again, back up your computer. You can back up the system after the cleanup in the next step, but then you risk losing data to any mistake you make during the cleanup.
  2. Clean up your existing HDD to make sure the data on the C drive is much smaller than the new SSD, so that you have enough free space after downsizing. This was not a problem for me as the existing HDD had only 45GB of data on it. Having more free space also helps with defragmentation and resizing later. Sometimes, you just need to move videos and images to external storage first.
  3. Defragment the C drive. This will take a long while.
  4. Attach the new SSD to the computer and initialize the disk as MBR. I don’t have much experience with GPT, and my existing disk was MBR too.
  5. Boot into GParted and resize the C partition on the HDD (not the System Reserved partition, which is only 100MB) so that it fits onto the SSD. You can check the SSD size in GParted. I resized C on the HDD to be 1GB less than the SSD capacity.
  6. Reboot the computer normally into Windows 7. It will most likely run check disk on C, or even run system recovery. Let it complete what it needs to do. You should have a much smaller C drive now.
  7. Reboot into Clonezilla, go to the command line and run ‘sudo fdisk’ on the HDD. First record the exact parameters of the 100MB System Reserved partition. Exit, then run fdisk on the SSD. Create the first partition with exactly the same parameters as recorded for the System Reserved partition, and create another partition for the remaining space. Make the System Reserved partition active and make sure to change the ID to 7 for both partitions. Exit the command line and go back to Clonezilla.
  8. Use the local partition-to-partition clone option to copy the 100MB System Reserved partition first, then use the same process to copy the C partition from the HDD to the SSD.
  9. Remove the old HDD now and try to boot from the SSD. If everything is up and running and you have a C drive, congrats!
  10. In my case, however, I got a text-mode error screen immediately when attempting to boot from the SSD. If that happens, boot from the Windows installer DVD or USB and run repair. This should allow you to boot into Windows.
  11. In my case, it then took a long time to log in, and I ended up at an empty screen. Press Ctrl-Alt-Delete and select Start Task Manager. Use the File->New Task (Run…) menu item and click on the Browse button. I noticed that the C drive was missing and my primary partition showed up as the E drive; yours may be a different drive. Please remember the drive letter as we’ll need it later. You can start explorer.exe or cmd.exe from there, but what you can do with the computer is greatly limited. The real problem is that you cannot start regedit.exe.
  12. Fix the drive letter problem! Boot with your Windows installer DVD or USB again into the recovery console and follow the instructions here, but open the SYSTEM hive instead and give it the name NEWSYSTEM.
  13. You need to edit the registry key to change the drive letter of your primary partition. Replace HKEY_LOCAL_MACHINE\SYSTEM\ in the instructions with HKEY_LOCAL_MACHINE\NEWSYSTEM\.
  14. Restart the computer and now all is good. I added the old HDD back and re-partitioned it as a large spare drive.

Saturday, May 30, 2015

Subversion (SVN) To Git Migration

Summary

There are many blogs and pages that tell you how to use git-svn to do the work. But I found that many of them work for simple use cases but become cumbersome for complex, non-standard Subversion repository layouts. This document tries to provide comprehensive guidance to make your migration from Subversion to Git a stress-free process.

Repository Layout

Your repository layout has a deep impact on the effort of migration. Most Subversion repositories fall into one of the three layout categories below.

Standard Layout

A Standard Layout repository strictly follows the trunk, branches and tags naming convention, with exact case-sensitive top-level directory names. There should be no other top-level directory that you care about except those three.
WARNING: If you need to import any other top-level directory as a branch, then it is no longer a Standard Layout, so you need to follow the instructions in the Non-Standard Layout section.
WARNING: Some repositories use a title case convention, i.e. with top-level directories named Trunk, Branches and Tags. Those repositories are NOT Standard Layout due to the upper case letters. They are categorized as Semi-Standard Layout.
Note: For the purpose of this discussion, the naming convention of an individual branch or tag is irrelevant. Strictly speaking, though, one would expect an individual branch to be named major.minor (e.g. 1.2, 1.3) and a tag to be named major.minor.revision (e.g. 1.2.0, 1.3.4).
To migrate a Subversion repository with Standard Layout, follow the instructions in Using Stash Import Utility if you use Atlassian Stash as your Git repository manager, or you can always use svn2git for any Git repository.

Standard Layout repository example.

project-one
    +-- branches
            +-- branchname1
            +-- Branchname2
            +-- ...
    +-- tags
            +-- tagName1
            +-- tagname2
            +-- ...
    +-- trunk
Note: Repositories that don't have all three top-level directories still conform to Standard Layout as long as the top-level directories they do have conform to the naming convention. For example, the repository below doesn't have tags but is still considered Standard Layout.
some-service
    +-- branches
    +-- trunk

Semi-Standard Layout

Semi-Standard Layout has a directory structure similar to Standard Layout but doesn't follow the strict top-level directory naming convention. It has the same trunk, branches and tags concept, but the top-level directories are named differently: some use title case, some use the singular form, and some are completely different. There should be no other top-level directory that you care about except those three.
WARNING: If you need to import any other top-level directory as a branch, then it is no longer a Semi-Standard Layout, so you need to follow the instructions in the Non-Standard Layout section.
To migrate a Subversion repository with Semi-Standard Layout, follow the instructions in Using Stash Import Utility if you use Atlassian Stash as your Git repository manager, or you can always use svn2git for any Git repository.

Below are some examples of Semi-Standard Layout.

Example 1: use title case Trunk, Branches and Tags.
project-two
    +-- Branches
            +-- branchname1
            +-- Branchname2
            +-- ...
    +-- Tags
            +-- tagName1
            +-- tagname2
            +-- ...
    +-- Trunk

Example 2: use singular trunk, branch and tag.
repo1
    +-- branch
            +-- branchname1
            +-- Branchname2
            +-- ...
    +-- tag
            +-- tagName1
            +-- tagname2
            +-- ...
    +-- trunk

Example 3: use main, features and releases instead of trunk, branches and tags.
STOP! Some repositories store binary artifacts under a Releases directory. That is fundamentally different, and we should never import any release binary into Git.
repo2
    +-- features
            +-- branchname1
            +-- Branchname2
            +-- ...
    +-- releases
            +-- tagName1
            +-- tagname2
            +-- ...
    +-- main
Note: Repositories that don't have all three top-level directories still conform to Semi-Standard Layout as long as the rest of the directory structure conforms to it. For example, the repository below doesn't have tags but is still considered Semi-Standard Layout.
some-service
    +-- features
    +-- main

Non-Standard Layout

Non-Standard Layout refers to repositories that are in neither Standard nor Semi-Standard Layout. Their branches all reside directly under the root of the project directory. For example:

project-three
    +-- Current
    +-- DEV
    +-- QA
    +-- UAT
    +-- Release (binary)
You'll need to use svn2git to migrate such repositories to Git.

Using Stash Import Utility

Stash has a built-in utility to import a Subversion repository, available as part of the repository creation wizard or through the settings page of the repository. It can be used to import Subversion repositories with Standard and Semi-Standard Layout.
STOP! if you haven't read about the repository layout.
STOP! if you need to resync from Subversion to Git later. Please consider using svn2git.
1. Create a new repository
  • If you already have a repository created and it is still empty, move on to the next step.
  • If you need to reuse a repository that is not empty, sorry, you cannot use this utility. Try svn2git instead.
Follow the instructions below to create a new Git repository in Stash.
  1. Go to your Stash web interface and login.
  2. Select a Stash project that is appropriate for the repository that you are creating.
  3. Click the "Create Repository" button next to the Stash project name, located in the upper-left corner of the project page. If you don't see the button and you are already signed in to Stash, then you need to request permission.
  4. Enter the name of the repository and click on the "Create repository" button.
You should get a confirmation page telling you that an empty repository has been created. At the bottom of the page, you can find a button labeled "Import from SVN"; click on it to start importing. Move on to step 3.

2. Reusing an empty repository
If you have an empty repository that you want to use as the target of the migration, go to the project and click on the settings tab. If you don't see the settings tab, you need to ask for permission to get there. In the settings page, click on the 'Import from SVN' link at the end of the left menu to start importing.

3. Import repository
  • Enter the URL to the subversion repository that you are importing.
  • For a Standard Layout project, you can leave the Trunk, Branches and Tags fields as default. Otherwise, enter the proper values for your repository. You can leave a field as default if you don't have branches and/or tags.
  • Enter your username and password to access SVN repository.
The table below lists three examples:
Layout        | Trunk   | Branches   | Tags   | Comments
Standard      | trunk   | branches/* | tags/* | Leave the default values in place.
Semi-Standard | Trunk   | Branches/* | Tags/* | Enter the values for Trunk, Branches and Tags.
Semi-Standard | Current | branches/* | tags/* | For a single-branch import, enter the value for Trunk but leave Branches and Tags as default.
Click the "Import" button. Stash first gathers the author list; this may take from a few seconds to a few minutes, and it will come back with a message:
Default authors mapping created. Review it, adjust if necessary and proceed with the import.

4. Review author mapping
Before Stash actually imports, it gathers author information and asks you to review the author mapping. Click on the "Continue" button and you'll get a popup window where you can review the author mapping. Make the necessary changes and click on the "Continue" button in the popup.
Stash will start to import the repository and update the progress on the page. You can safely close the browser now. You will receive an email telling you that the import is in progress and another email when it is done.

Congratulations! Now you have completed the migration from Subversion to Git.

Using svn2git

svn2git is a great tool for migrating from Subversion to Git. It uses git-svn under the hood but does a much better, cleaner job. svn2git is the better choice unless you are a guru of both Subversion and Git and a master of git-svn.

Install svn2git

The installation commands for various platforms are listed below, but your mileage may vary. If you get it running successfully on a platform not listed below, please update this wiki.
Mac/Linux
 sudo gem install svn2git
Windows (Command Prompt)
 gem install svn2git

Run svn2git

You always run svn2git in an empty directory. After executing svn2git, the directory becomes the root of the converted Git repository that mirrors the Subversion repository.
Note: If you executed svn2git or git-svn with a wrong parameter and want to start over, you must delete the entire directory first and then recreate it; otherwise you'll get weird errors.
WARNING: svn2git can take a very long time for a large repository. Make sure to run it on a machine that is close (low network latency) to the Subversion repository server.


Standard Layout Repository

To convert a Standard Layout repository, all you need to do is run svn2git with the URL of the Subversion repository, e.g.
mkdir project-one
cd project-one
svn2git https://svn.somecompany.com/svn/project-one/

Semi-Standard Layout Repository

A Semi-Standard Layout repository has the same directory structure as a Standard Layout repository. The difference is in the directory names of trunk, branches and tags. Use the corresponding option parameters to specify them. If you have no branches or tags, you can omit the corresponding option.

Example 1
mkdir project-two
cd project-two
svn2git https://svn.somecompany.com/svn/project-two/ --trunk Trunk --branches Branches --tags Tags

Example 2
mkdir project-simple
cd project-simple
svn2git https://svn.somecompany.com/svn/project-simple/ --trunk Current

Non-Standard Layout Repository

For Non-Standard Layout repositories, the difficult part is that the branches reside in the top-level directory. As the first step, let's ignore them and initialize the repository with git-svn, using the parameters that best match the Standard or Semi-Standard Layout.

Using project-three as an example,
project-three
    +-- Current
    +-- DEV
    +-- QA
    +-- UAT
    +-- Release (binary)

We will take DEV as trunk, and take Current, UAT and QA as branches. Let's start by ignoring the top-level branches and converting DEV first.
 
mkdir project-three
cd project-three
git svn init --trunk DEV --prefix svn/ https://svn.somecompany.com/svn/project-three/

Wait for it to complete and then edit the .git/config file to add three lines to the svn-remote section for Current, UAT and QA.
WARNING: Make sure you have all branches well defined and that there is no typo in the names; they are case-sensitive. If you miss one and want to include it later, you'll need to clean up the directory and start from scratch.
[svn-remote "svn"]
 url = https://svn.somecompany.com/svn/project-three
 fetch = DEV:refs/remotes/svn/trunk
 fetch = Current:refs/remotes/svn/Current
 fetch = UAT:refs/remotes/svn/UAT
 fetch = QA:refs/remotes/svn/QA
Note: You can also change a branch name in this process. For example, if you'd like to rename the Current branch in Subversion to Test in Git, you can use
 fetch = Current:refs/remotes/svn/Test
Then run fetch and rebase. The fetch operation may take a long time to complete if you have a big repository.
git svn fetch
svn2git --rebase

Push to Git Repository Server

If you don't already have a remote Git repository to push to, create one first. Then push all branches to the remote repository.
Find the URL of the remote Git repository, which is the same URL you would use to clone it. Run the commands below in the directory of the local Git repository you just imported from Subversion, replacing <URL> with the URL of the remote Git repository.

git remote add origin <URL>
git push origin --all

Congratulations! Now you have completed the migration from Subversion to Git. It is safe to delete your local copy of the repository, but if you need to resync from Subversion to Git again, you'll need your local repository.

Resync From Subversion to Git

You can start the migration while still allowing later changes in Subversion to be brought into Git. This is useful when you are moving an actively developed repository. Often you need to prepare the build scripts to use the new Git repository while development and build activity continue in the old Subversion repository.
WARNING: Two-way sync is neither tested nor recommended. We do not encourage you to have commits go into both the old Subversion repository and the new Git repository and then attempt to keep them in sync later. It will be difficult and risky, if possible at all.
Go into the local Git repository that you created and used for the svn2git migration and run the commands below:

svn2git --rebase
git push origin --all

Q & A

Q: Can svn2git support author mapping?
A: Yes, see https://github.com/nirvdrum/svn2git#authors

Saturday, January 11, 2014

My Digital Toolbox

This is to keep a list of utilities that I use.

Security

General Utilities

Development Utility:

  • JMeter
  • SoapUI

Connectivity:

  • putty (Windows)
  • MobaXterm

Digital Content Utilities

Android Apps:

  • Juice Defender
  • GPS Status
  • Waze
  • Lux Free Dash
  • Prefixer
  • ConnectBot
  • EbookDroid
  • BSPlayer – Can browse Samba shares and play a lot of video formats, including RMVB and MKV.
  • NYC Bus & Subway Maps
  • Google Pinyin (Not the best Chinese keyboard but balance of performance and feature)
  • Terminal Emulator
  • Ping & DNS

Tuesday, October 08, 2013

If/Else Statement vs. Ternary Operator

The question of which one is faster, the ternary operator or the if/else statement, has been asked again and again. The canned answer is: you shouldn’t care. Don’t even think about performance here; just choose the one that is most readable in the context where you use it.

But leave aside the argument about which one is more readable. If I’m going to call it hundreds of millions of times, the performance difference might matter to me, right? I was asked this question again recently, so I decided to find out. Only a test can tell the truth.

Test Setup

We define each run as repeatedly copying data from an array with 10 elements to another array, millions of times. When the data is copied from one array to the other, we also transform it. We use different methods to transform the data so that we can compare the time taken by each method. We also capture the baseline performance: a baseline does the very same thing as a regular run but with no data transformation.

The code for every run looks somewhat like the following.

    @Override
    void run() {
        int a[] = source;
        int b[] = target;
        for (int i = LOOP_COUNT; i > 0; i--) {
            for (int j = ELEMENT_END; j >= 0; j--) {
                // copy and transform the data from a to b
            }
        }
    }

We execute the run 20 times to warm up the system and run it another 100 times with time measured for each run.

    // warm up
    for (int j = 20; j > 0; j--) {
        for (Fixture f : fixtures)
            f.run();
    }

    // run and measure it
    for (int j = 100; j > 0; j--) {
        for (Fixture f : fixtures)
            f.runMeasured();
    }

And the code to measure the performance.

    void runMeasured() {
        long start = System.nanoTime();
        run();
        histogram.update(System.nanoTime() - start);
    }

You can find the source code on GitHub.

All the test runs were on a Lenovo X220 laptop with a Core i5-2520M @ 2.5GHz, running Windows 7 64-bit and JRE 1.7 64-bit.
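The pieces above can be put together into a small self-contained sketch. This is not the code from the repository: the `HarnessSketch` class, the `transform` method, the `name` field, the reduced `LOOP_COUNT` and the plain list of samples standing in for the histogram are all assumptions made to keep the example short and runnable.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class HarnessSketch {
    // Hypothetical value; the post only says the copy is repeated millions of times.
    static final int LOOP_COUNT = 100_000;

    /** Hypothetical fixture: copies source to target through transform() and records timings. */
    abstract static class Fixture {
        final String name;
        final int[] source = {1, -2, 3, -4, 5, -6, 7, -8, 9, -10};
        final int[] target = new int[source.length];
        final List<Long> samples = new ArrayList<>(); // stands in for the histogram

        Fixture(String name) { this.name = name; }

        abstract int transform(int x);

        void run() {
            int[] a = source;
            int[] b = target;
            for (int i = LOOP_COUNT; i > 0; i--) {
                for (int j = a.length - 1; j >= 0; j--) {
                    b[j] = transform(a[j]);
                }
            }
        }

        void runMeasured() {
            long start = System.nanoTime();
            run();
            samples.add(System.nanoTime() - start);
        }

        long medianNanos() {
            List<Long> sorted = new ArrayList<>(samples);
            Collections.sort(sorted);
            return sorted.get(sorted.size() / 2);
        }
    }

    /** Warms up every fixture, then measures each one 'runs' times (runs must be > 0). */
    static Fixture[] measure(int warmUps, int runs) {
        Fixture[] fixtures = {
            new Fixture("Baseline") {
                @Override int transform(int x) { return x; }
            },
            new Fixture("Ternary") {
                @Override int transform(int x) { return x >= 0 ? x : -x; }
            },
        };
        for (int j = warmUps; j > 0; j--)
            for (Fixture f : fixtures) f.run();
        for (int j = runs; j > 0; j--)
            for (Fixture f : fixtures) f.runMeasured();
        return fixtures;
    }

    public static void main(String[] args) {
        for (Fixture f : measure(20, 100)) {
            System.out.println(f.name + ": median " + f.medianNanos() + " ns per run");
        }
    }
}
```

Interleaving the fixtures inside each warm-up and measurement round, as the original loops do, spreads any drift in machine load across all fixtures instead of penalizing whichever one happens to run last.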

Absolute Value Test

The first test transforms the data in the source array into their absolute values and then stores them in the target array. The data in the source array were purposely set to be half positive and half negative.

Similar tests were done in two different ways. Both use a base class and have each derived class override a method. The difference is that one keeps the loop logic in the base class and lets the derived class override only the method that does the absolute value calculation, while the other repeats the loop logic in each derived class and has the absolute value calculation inlined inside the loop.

Using Overridden Methods

The test is captured in RunnerAbsOverride class. Below is the run logic

    void run() {
        int a[] = source;
        int b[] = target;
        for (int i = LOOP_COUNT; i > 0; i--) {
            for (int j = ELEMENT_END; j >= 0; j--) {
                int x = a[j];
                b[j] = abs(x);
            }
        }
    }

We compare the performance difference between different implementations of the abs method.

Baseline

The baseline implementation doesn’t really compute the absolute value. It simply returns its argument without any data transformation.

            return x;
Using Math.abs

This method calls the JDK's built-in abs method in the Math class to transform the data into absolute values.

            return Math.abs(x);
Using ternary operator

Here we use ternary operator to compute the absolute value of x.

            return x >= 0 ? x : -x;
Using if/else statement

Finally, we use an if/else statement to return x if it is non-negative and -x if x is negative.

            if (x >= 0) return x;
            else return -x;
Test Result

Below is the result of calculating the absolute value in an overridden method for the different implementations. Time is the nanoseconds taken to transform and copy the array.

Method,    Mean,   Min,   Max, Median,  75%
Baseline,  36.4,  35.6,  39.0,  36.2,  36.6
MathAbs ,  40.6,  39.6,  49.9,  40.2,  40.7
Ternary ,  40.3,  39.6,  42.2,  40.2,  40.6
IfElse  ,  40.4,  39.5,  43.9,  40.2,  40.7

We observed three things here

  1. Compared to the baseline, computing the absolute value of an integer takes about 11% more time on top of the baseline’s looping and method call.
  2. All three methods of calculating an absolute value have the same performance.
  3. We didn’t see any difference from using the built-in Math.abs, which suggests that the static method is inlined and that nothing else special was done to this JDK method.

Inlining the Code

What if we inline the code directly in the loop? Let’s repeat the loop below in every implementation and replace the comment with the different methods of computing the absolute value. This test is captured in the RunnerAbsInline class.

    final void run() {
        final int a[] = source;
        final int b[] = target;
        for (int i = LOOP_COUNT; i > 0; i--) {
            for (int j = ELEMENT_END; j >= 0; j--) {
                // compute absolute value of a[j] then assign it to b[j]
            }
        }
    }
Baseline
                    b[j] = a[j];
Using Math.abs
                    b[j] = Math.abs(a[j]);
Using ternary operator
                    b[j] = a[j] >= 0 ? a[j] : -a[j];
Using if/else statement
                    if (a[j] >= 0) b[j] = a[j];
                    else b[j] = -a[j];
Using if statement without else

When the code is inlined, we can use a single if statement to get the absolute value.

                    int x = a[j];
                    if (x < 0) x = -x;
                    b[j] = x;
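
Before comparing timings, it is worth confirming that the variants really compute the same thing. The following is a minimal sketch, with a hypothetical `AbsVariants` class and method names chosen for this example, that checks the ternary, if/else and if-only forms against Math.abs, including the Integer.MIN_VALUE edge case where negation overflows and every variant returns the input unchanged.

```java
class AbsVariants {
    static int absTernary(int x) {
        return x >= 0 ? x : -x;
    }

    static int absIfElse(int x) {
        if (x >= 0) return x;
        else return -x;
    }

    static int absIfOnly(int x) {
        if (x < 0) x = -x;
        return x;
    }

    /** True when all hand-written variants agree with Math.abs for x. */
    static boolean allAgree(int x) {
        int expected = Math.abs(x);
        return absTernary(x) == expected
            && absIfElse(x) == expected
            && absIfOnly(x) == expected;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 42, -42, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int x : samples) {
            if (!allAgree(x)) {
                throw new AssertionError("variants disagree for " + x);
            }
        }
        System.out.println("all variants agree");
    }
}
```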
Test result

Below is the result of calculating the absolute value inlined in the loop for the different implementations. Time is the nanoseconds taken to transform and copy the array.

Method,    Mean,   Min,   Max, Median,  75%
Baseline,   1.7,   1.6,   3.0,   1.7,   1.7
MathAbs ,   7.0,   6.7,   8.6,   6.9,   7.0
Ternary ,   7.0,   6.7,   8.4,   6.9,   7.0
IfElse  ,   8.6,   8.4,   9.6,   8.5,   8.6
IfOnly  ,   6.9,   6.7,   9.0,   6.9,   7.0

The numbers above tell us that

  1. The if/else statement is slower than the other methods on average: about 23% in total (8.6 ns vs. 7.0 ns), or about 30% once the 1.7 ns baseline is subtracted.
  2. The if-only implementation that we added runs at the same speed as Math.abs() and the ternary operator.
  3. Again, we didn’t see any difference between Math.abs and the ternary operator here.
  4. The overhead of method invocation is much higher than the cost of any of the methods we used.

Number Ordering Test

The second test compares the numbers in two adjacent elements of the source array, then stores the two numbers in the target array in ascending order. The data in the source array were carefully set so that the left element of an adjacent pair is equally likely to be the smaller or the larger one.

Again, as in the Absolute Value test, we’ll test in two different ways: one has the logic in its own method and the other has the logic inlined in the loop.

Using Overridden Methods

The test is captured in RunnerOrderingOverride class. Below is the run logic

    void run() {
        int a[] = source;
        for (int i = LOOP_COUNT; i > 0; i--) {
            for (int j = ELEMENT_END; j >= 0; j--) {
                compareSet(a[j], a[j + 1], j);
            }
        }
    }

We compare the performance difference between different implementations of the compareSet method.

Baseline

The baseline implementation doesn’t really compare the value. It simply sets the value in the original order.

            int b[] = target;
            b[index++] = x;
            b[index] = y;
Using Math.abs

This tries to avoid the if statement by computing the difference and the sum of the two numbers, and then deriving the smaller and the larger one from the sum and the difference. The difference is obtained by subtracting one from the other and then taking the absolute value.

            int b[] = target;
            int diff = Math.abs(x - y);
            int sum = x + y;
            b[index++] = sum - diff;
            b[index] = sum + diff;
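
The arithmetic behind this trick: sum - diff equals twice the smaller number and sum + diff equals twice the larger one, so the snippet above stores doubled values in the right order. The sketch below, using a hypothetical `SumDiffOrdering` class, adds the halving step to recover the actual numbers. It assumes that x - y and x + y do not overflow an int.

```java
class SumDiffOrdering {
    /**
     * Returns {smaller, larger} using only the sum and the absolute difference.
     * Assumes x - y and x + y stay within the int range.
     */
    static int[] order(int x, int y) {
        int diff = Math.abs(x - y);
        int sum = x + y;
        // sum - diff == 2 * min(x, y) and sum + diff == 2 * max(x, y),
        // so both divisions are exact.
        return new int[] {(sum - diff) / 2, (sum + diff) / 2};
    }

    public static void main(String[] args) {
        int[] r = order(7, 3);
        System.out.println(r[0] + " " + r[1]);
        r = order(-3, 5);
        System.out.println(r[0] + " " + r[1]);
    }
}
```

For example, with x = 7 and y = 3 the sum is 10 and the difference is 4, giving 6 / 2 = 3 and 14 / 2 = 7.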
Using ternary operator

This is very similar to Math.abs, except that the ternary operator is used instead of Math.abs.

            int b[] = target;
            int diff = x >= y ? x - y : y - x;
            int sum = x + y;
            b[index++] = sum - diff;
            b[index] = sum + diff;
Using if/else statement

This uses straightforward if/else statement.

            int b[] = target;
            if (x >= y) {
                b[index++] = y;
                b[index] = x;
            } else {
                b[index++] = x;
                b[index] = y;
            }
Test Result

Below is the result of ordering the numbers in an overridden method for the different implementations. The unit is nanoseconds.

Method,    Mean,   Min,   Max, Median,  75%
Baseline,  54.3,  52.6,  66.4,  53.9,  54.7
MathAbs ,  66.3,  64.2,  79.2,  65.6,  66.3
Ternary ,  62.2,  60.2,  79.1,  61.2,  62.0
IfElse  ,  58.9,  56.3,  78.6,  58.0,  58.6

It turned out that the straightforward if/else statement is the fastest. The attempt to replace it with the ternary operator or Math.abs plus multiple steps of integer computation doesn’t outperform the if/else statement. It is actually slower: 62.2 ns and 66.3 ns on average versus 58.9 ns.

This time, Ternary is faster than Math.abs because it eliminates the negate operation that Math.abs performs when x < y.

Inlining the Code

Again, let’s re-test the logic by inlining it in the loop. We repeat the loop below in every implementation and replace the comment with the different methods of ordering the numbers. This test is captured in the RunnerOrderingInline class.

    final void run() {
        int a[] = source;
        int b[] = target;
        for (int i = LOOP_COUNT; i > 0; i--) {
            for (int j = ELEMENT_END; j >= 0; j--) {
                final int x = a[j];
                final int y = a[j + 1];
                // set the smaller one of x and y to b[j] and the larger one to b[j+1]
            }
        }
    }
Baseline

Baseline simply assigns the value without ordering them.

                    b[j] = x;
                    b[j + 1] = y;
Using Math.abs
                    int diff = Math.abs(x - y);
                    final int sum = x + y;
                    b[j] = sum - diff;
                    b[j + 1] = sum + diff;
Using ternary operator
                    final int diff = x >= y ? x - y : y - x;
                    final int sum = x + y;
                    b[j] = sum - diff;
                    b[j + 1] = sum + diff;
Using if/else statement
                    if (x >= y) {
                        // y is the smaller (or equal) value, so it goes first
                        b[j] = y;
                        b[j + 1] = x;
                    } else {
                        b[j] = x;
                        b[j + 1] = y;
                    }
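As a sanity check on the three variants above, here is a minimal sketch that wraps each one in a small method and confirms they agree. The class and method names are hypothetical and are not part of the original benchmark. Note that x + y - |x - y| equals twice the smaller value (and x + y + |x - y| twice the larger), so the sketch halves both results to recover the actual numbers; it also assumes x + y and x - y do not overflow an int.

```java
import java.util.Arrays;

// Hypothetical helper class for illustration; not the benchmark code itself.
final class OrderPairSketch {

    // Math.abs variant: (sum - diff) is 2 * min and (sum + diff) is 2 * max,
    // so both are halved. Assumes no int overflow in x + y or x - y.
    static int[] viaMathAbs(int x, int y) {
        int diff = Math.abs(x - y);
        int sum = x + y;
        return new int[] {(sum - diff) / 2, (sum + diff) / 2};
    }

    // Ternary variant: same arithmetic, but the absolute difference is
    // computed with a conditional expression instead of Math.abs.
    static int[] viaTernary(int x, int y) {
        int diff = x >= y ? x - y : y - x;
        int sum = x + y;
        return new int[] {(sum - diff) / 2, (sum + diff) / 2};
    }

    // if/else variant: no arithmetic at all, just pick the order.
    static int[] viaIfElse(int x, int y) {
        if (x >= y) {
            return new int[] {y, x};
        } else {
            return new int[] {x, y};
        }
    }

    public static void main(String[] args) {
        int[][] pairs = {{3, 1}, {1, 3}, {-3, 5}, {7, 7}};
        for (int[] p : pairs) {
            int[] expected = viaIfElse(p[0], p[1]);
            boolean same = Arrays.equals(expected, viaMathAbs(p[0], p[1]))
                    && Arrays.equals(expected, viaTernary(p[0], p[1]));
            System.out.println(Arrays.toString(p) + " -> "
                    + Arrays.toString(expected) + " agree=" + same);
        }
    }
}
```

The halving adds work that the benchmarked snippets do not perform, so this sketch is only meant to check the logic, not to reproduce the timings.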
Test Result

Below are the results of ordering the numbers inline in the loop, one row per implementation. The unit is nanoseconds.

Method,    Mean,   Min,   Max, Median,  75%
Baseline,   3.8,   3.5,   4.9,   3.6,   3.7
MathAbs ,  18.3,  17.3,  22.6,  17.9,  18.5
Ternary ,  13.9,  13.0,  19.0,  13.5,  14.0
IfElse  ,  11.1,  10.6,  13.6,  11.0,  11.2

The ranking is the same as in the overridden-method test, but every variant is much faster because the code is inlined. The baseline alone drops from 54.3 ns to 3.8 ns, so the method invocation is much more expensive than any of the ordering strategies.
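The columns in the result tables (Mean, Min, Max, Median, 75%) are summaries of repeated timing samples. The post does not say how they were computed, so the following is only a sketch of one common way to do it. The class name is hypothetical, the nearest-rank percentile definition is an assumption, and the sample values in main are made up purely for illustration.

```java
import java.util.Arrays;

// Hypothetical helper for illustration; not the harness used for the tables.
final class TimingSummarySketch {

    // Returns {mean, min, max, median, 75th percentile} of the samples.
    static double[] summarize(double[] samples) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        double sum = 0;
        for (double s : sorted) {
            sum += s;
        }
        return new double[] {
            sum / sorted.length,
            sorted[0],
            sorted[sorted.length - 1],
            percentile(sorted, 50),
            percentile(sorted, 75)
        };
    }

    // Nearest-rank percentile (an assumption): the smallest sample such that
    // at least pct percent of all samples are less than or equal to it.
    static double percentile(double[] sorted, double pct) {
        int rank = (int) Math.ceil(pct / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // Made-up samples in nanoseconds, not measurements from the post.
        double[] samples = {4, 1, 3, 2};
        System.out.println(Arrays.toString(summarize(samples)));
    }
}
```

Other percentile definitions interpolate between neighbouring samples and would give slightly different Median and 75% values on small sample sets.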

Conclusion

  1. There is no fundamental difference between ternary and if/else.
  2. Ternary is faster than if/else as long as no additional computation is required to convert the logic to use a ternary. When it is a simple ternary operation, it is more readable as well.
  3. An if-only statement is faster than if/else, so if the logic doesn’t require an else branch, leave it out.
  4. Don’t try to be too clever. Adding extra operations just to replace if/else with a ternary leads to the opposite result. In that case, a straightforward if/else gives you better performance and readability.
  5. In these tests, the JDK doesn’t appear to do any native optimization of Math.abs beyond inlining the static method call almost everywhere. This differs from what others have suggested, namely that the JDK natively optimizes built-in functions.
  6. Compared with the difference between if/else and ternary, method invocation is very expensive.

Thursday, August 15, 2013

Words for Versioning

It's always been difficult to come up with good names for versions or Agile sprints. Here are the ideas we used.

Last Name of top scientists

These are mainly from the page 100 Scientists Who Shaped World History. I could not find a last name starting with Q, U or X, which the page Alphabetized List by Scientist's Name confirms. Hence, I used the words Quantum, Universe and X-ray, which appear on the page and are related to science.
Archimedes   Kepler        Universe
Bohr         Lavoisier     Virchow
Copernicus   Maxwell       Watson
Darwin       Newton        X-ray
Einstein     Onnes         Yalow
Freud        Pasteur       Zhang
Galilei      Quantum
Heisenberg   Rutherford
Ibn-e-Sina   Schrodinger
Jenner       Thomson

Name of cars

From http://www.malaysiaminilover.com/list-of-car-names, we picked names that are rarely used in software documentation so that they can be searched for easily.
Axiom     Kodiak    Uplander
Boxster   Lanos     Vitara
Cougar    Murano    Windstar
Durango   Nubira    Xantia
Envoy     Optima    Yaris
Fiero     Paseo     Zephyr
Galant    Quattro
Hombre    Reatta
Impala    Sorento
Jetta     Tredia

Superheroes

Here you can find all superhero names

Mountains of the World

http://www.peakware.com/peaks.html

Monday, June 17, 2013

VBA Routine To Set Excel Cell Color By Value

I thought someone must have already done this, but I could not find any ready-to-use code. So I had to write it myself:

Private Sub Worksheet_Change(ByVal Target As Range)
    Dim c As Range
    Dim Column As Long
    Dim Row As Long
    Dim Red As Long
    Dim Green As Long
    Dim heat As Double
    
    For Each c In Target
        Column = c.Column
        Row = c.Row
        
        Rem # Define the range of cells to recolor; keep it minimal for better performance
        If Column > 1 And Column <= 12 And Row > 1 And Row < 40 Then
            Rem # Empty, text and error cells get no fill (-4142 means no color)
            If IsEmpty(c.Value) Or Not IsNumeric(c.Value) Then
                c.Interior.ColorIndex = -4142
            Else
                Rem # Calculate the heat. You can multiply c.Value by a factor to control the range
                heat = c.Value - 255
                If (heat > 255) Then heat = 255
                If (heat < -255) Then heat = -255
                
                Rem # Set back color of the cell: green through yellow to red
                If (heat > 0) Then
                    Green = 255 - heat
                    Red = 255
                Else
                    Green = 255
                    Red = 255 + heat
                End If
                c.Interior.Color = RGB(Red, Green, 0)
                
                Rem # Set fore color of the cell
                If (heat > 100) Then
                    c.Font.Color = vbWhite
                Else
                    c.Font.Color = vbBlack
                End If
            End If
        End If
    Next
End Sub