Sunday, April 24, 2016

Analytical Function in MySQL - ROW_NUMBER, RANK, DENSE_RANK

It is unfortunate that MySQL doesn't support analytical functions. The problem got my attention when we were considering MySQL as a staging and/or operational database for a data warehouse solution. A bit of research revealed that it is not very difficult to emulate some of them. Here is a workaround to get the result of ROW_NUMBER, RANK and DENSE_RANK in MySQL.

Let's take an example to emulate the standard SQL99 analytical query below:

SELECT
  ROW_NUMBER() OVER (PARTITION BY col1, col2 ORDER BY col3 DESC) AS row_number,
  RANK() OVER (PARTITION BY col1, col2 ORDER BY col3 DESC) AS rank,
  DENSE_RANK() OVER (PARTITION BY col1, col2 ORDER BY col3 DESC) AS dense_rank,
  t.*
FROM table1 t

The MySQL workaround can be written with a mix of user variables and subqueries:

SELECT
  @row_num:=IF(@prev_col1=t.col1 AND @prev_col2=t.col2, @row_num+1, 1) AS row_number,
  @dense:=IF(@prev_col1=t.col1 AND @prev_col2=t.col2, IF(@prev_col3=t.col3, @dense, @dense+1), 1) AS dense_rank,
  @rank:=IF(@prev_col1=t.col1 AND @prev_col2=t.col2 AND @prev_col3=t.col3, @rank, @row_num) AS rank,
  t.*
FROM (SELECT * FROM table1 ORDER BY col1, col2, col3 DESC) t,
     (SELECT @row_num:=1, @dense:=1, @rank:=1, @prev_col1:=NULL, @prev_col2:=NULL, @prev_col3:=NULL) var

This solution requires no self join and no temp table, it still runs in a single pass, and it is generic enough to adapt to any use case. But I admit that it is quite convoluted. It is very sad that MySQL doesn't include such basic functions, which are well within reach.
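
To make the single-pass logic easier to follow, here is a rough sketch of the same bookkeeping in Python rather than MySQL. It is only a simulation under my own assumptions: the sample rows and the function name rank_rows are made up for illustration, and the tuple prev stands in for @prev_col1, @prev_col2 and @prev_col3. It also includes the update of the previous-row values that ToolmakerSteve's comment below says the query needs.

```python
# Hypothetical sample rows (col1, col2, col3); not taken from the original post.
rows = [
    ("a", "x", 30),
    ("a", "x", 30),
    ("a", "x", 20),
    ("a", "x", 10),
    ("b", "y", 50),
    ("b", "y", 40),
    ("b", "y", 40),
]


def rank_rows(rows):
    """Return (col1, col2, col3, row_number, rank, dense_rank) per row."""
    # Same ordering as the derived table: col1, col2, col3 DESC.
    # Two stable sorts: col3 descending first, then the partition key.
    ordered = sorted(rows, key=lambda r: r[2], reverse=True)
    ordered.sort(key=lambda r: (r[0], r[1]))

    prev = None  # stands in for @prev_col1, @prev_col2, @prev_col3
    row_num = dense = rank = 1
    out = []
    for col1, col2, col3 in ordered:
        same_part = prev is not None and prev[:2] == (col1, col2)
        same_val = same_part and prev[2] == col3

        # @row_num: restart at 1 whenever the partition changes.
        row_num = row_num + 1 if same_part else 1

        # @dense: restart per partition, step by 1 only when col3 changes.
        if not same_part:
            dense = 1
        elif not same_val:
            dense += 1

        # @rank: keep the value on ties, otherwise take the row number.
        if not same_val:
            rank = row_num

        out.append((col1, col2, col3, row_num, rank, dense))
        # Remember this row for the next iteration.
        prev = (col1, col2, col3)
    return out


for row in rank_rows(rows):
    print(row)
```

With these sample rows, the two tied values of 30 in partition ("a", "x") share rank 1 and dense rank 1, and the following row gets rank 3 but dense rank 2, which is the difference between RANK and DENSE_RANK.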


Mahendra V.M said...


Rick James said...

MariaDB just announced "Window Functions" in MariaDB 10.2 -- https://mariadb.com/kb/en/mariadb/window-functions/ . These may eventually obviate your nice, concise, SQL.

-- Rick James

Dan said...

Thank you very much! The ranking solutions I found on Stack Overflow didn't quite work the way I wanted, but your workaround did the trick. Thanks for sharing with the community!

ToolmakerSteve said...

@prev_col1, 2, 3 are never set to a new value, so all row_nums are "1". After "t.*", need ", @prev_col1:=t.col1, @prev_col2:=t.col2, @prev_col3:=t.col3".

jberryman said...

I think this is relying on undefined behavior, e.g. this variation gives me incorrect results (row_number is not ordered with respect to the `id` column):

SELECT @row_num:=@row_num+1 AS row_number, t.id FROM (SELECT * FROM table1 WHERE col = 264 ORDER BY id) t, (SELECT @row_num:=0) var;

Anonymous said...

@jberryman re "(SELECT * FROM table1 WHERE col = 264 ORDER BY id) t, (SELECT @row_num:=0) var;"

Your ORDER BY is in the wrong place. It should come after the implicit join:
(SELECT * FROM table1 WHERE col = 264) t, (SELECT @row_num:=0) var

You might also find that the column order matters (remember you're operating in a grey area of MySQL functionality), so it's good not to stray too far from the original example:
"SELECT t.id, @row_num:=@row_num+1 AS row_number"

Unknown said...

Please give an example table.

Anonymous said...

It is indeed a solution, but it does not work when the ranking has to span an interval.

My data is:
user id, timestamp.

If a user comes to my website 10 times, I will have the user id and 10 timestamps in 10 rows.
I need to group all of a user's events under one label when the events happened within a one-hour period.
the query in mssql is

Could we have a corresponding query in MySQL?

Please help, I have been stuck here for hours.