Want to show your appreciation?
Please a cup of tea.

Sunday, April 24, 2016

Analytical Function in MySQL - ROW_NUMBER, RANK, DENSE_RANK

It was unfortunate that MySQL doesn't support analytical function. The problem got my attention when we were considering MySQL as staging and/or operational database for a Data Warehouse solution. A bit research reveals that it was indeed not very difficult to emulate some of them. Here is the workaround to get the the result of ROW_NUMBER, RANK and DENSE_RANK in MySQL.

Let's take an example to emulate the standard SQL99 analytical query below:

SELECT 
  ROW_NUMBER() OVER (PARTITION BY col1, col2 ORDER BY col3 DESC) AS row_number,
  RANK() OVER (PARTITION BY col1, col2 ORDER BY col3 DESC) AS rank,
  DENSE_RANK() OVER (PARTITION BY col1, col2 ORDER BY col3 DESC) AS dense_rank,
  t.* 
FROM table1 t

The MySQL workaround can be written with a mix of variables and sub queries:

SELECT
  @row_num:=IF(@prev_col1=t.col1 AND @prev_col2=t.col2, @row_num+1, 1) AS row_number,
  @dense:=IF(@prev_col1=t.col1 AND @prev_col2=t.col2, IF(@prev_col3=col3, @dense, @dense+1), 1) AS dense_rank,
  @rank:=IF(@prev_col1=t.col1 AND @prev_col2=t.col2 AND @prev_col3=col3, @rank, @row_num) AS rank,
  t.*
FROM (SELECT * FROM table1 ORDER BY col1, col2, col3 DESC) t,
     (SELECT @row_num:=1, @dense:=1, @rank:=1, @prev_col1:=NULL, @prev_col2:=NULL, @prev_col3:=NULL) var

This solution requires no self join, no temp table, still single pass, and generic enough to adapt it to any use case. But I admit that it is quite convoluted. Very sad that MySQL doesn't include such basic functions that are well achievable.

9 comments:

Mahendra V.M said...

Super

Rick James said...

MariaDB just announced "Window Functions" in MariaDB 10.2 -- https://mariadb.com/kb/en/mariadb/window-functions/ . These may eventually obviate your nice, concise, SQL.

-- Rick James

Dan said...

Thank you very much! The ranking solutions I found on Stackoverflow didnt quite work the way I wanted to, but your workaround was the trick. Thanks for sharing with the community!

ToolmakerSteve said...

@prev_col1, 2, 3 are never set to a new value, so all row_nums are "1". After "t.*", need ", @prev_col1:=t.col1, @prev_col2:=t.col2, @prev_col3:=t.col3".

Unknown said...
This comment has been removed by the author.
jberryman said...

I think this is relying on undefined behavior, e.g. this variation gives me incorrect results (row_number is not ordered with respect to the `id` column):

SELECT @row_num:=@row_num+1 AS row_number, t.id FROM (SELECT * FROM table1 WHERE col = 264 ORDER BY id) t, (SELECT @row_num:=0) var;

Anonymous said...

@jberryman re "(SELECT * FROM table1 WHERE col = 264 ORDER BY id) t, (SELECT @row_num:=0) var;"

Your Order BY is in the wrong place. It should be after the implicit join:
(SELECT * FROM table1 WHERE col = 264) t, (SELECT @row_num:=0) var
ORDER BY id;

and you might find the column order matters (remember you're operating in a grey area of MySQL functionality) so it's good not to stray to far from the original example
"SELECT t.id, @row_num:=@row_num+1 AS row_number"

Unknown said...

please give some example table

Anonymous said...

It is indeed a solution. It doesnot work when ranking is in the span of an interval.

My data is
user id timestamp.

if a user comes 10 times to my website i will have user id and 10 timestamps in 10 rows.
I need to group all the user events in one label where the time of events was within an hour period..
the query in mssql is
SELECT *,DENSE_RANK() OVER(PARTITION BY user1 ORDER BY DATEPART(HH,timestamp)) FROM #temp.

Could we have a corresponding query in mysql.

Please help i am stuck for hours here.