Fetch a Random Row from the Database Without Using Triggers, and Quickly

fusionbox

Posted by fusionbox

django

First: the don'ts.

  1. Don't use ORDER BY RAND(). It's super slow on tables with lots of rows (lots = 2)
  2. Don't use WHERE id = (SELECT FLOOR(RAND() * (SELECT MAX(id) FROM table)). If you've got any gaps, the gaps will result in a skewed probability.
  3. Avoid triggers that maintain a hash-like lookup table. This solves the id gap problem, but now you've got another table to maintain and you can't use a MyISAM engine. Besides, “Triggers? Boo!” is good policy when working in a web-based MVC framework, which I am.

So what to do? Let's look at a half-decent PHP implementation (this is gonna use an Orm that wraps PDO, so don't be surprised if you don't recognize the classes and methods).

$ids = Query::fetchOnly('SELECT id FROM tips'); // that's right, all of 'em!
$random_id = $ids[ array_rand($ids) ];
$row = Query::fetchOne('SELECT * FROM tips WHERE id = ?', $random_id);

This is actually pretty fast – to a point (100,000s of rows => problem), but the idea is sound.  Select a random id, and fetch that row.  How can we do this using MySQL?

Hmm... humm. AH! OFFSET! That's pretty much the same thing, right?

$count = Query::fetchOnly('SELECT COUNT(*) FROM tips');
$random_offset = rand(0, $count) - 1;
$row = Query::fetchOne('SELECT * FROM tips LIMIT 1 OFFSET '. (int)$random_offset);

We are essentially doing the same process we did above - fetching by a random offset - but instead of a php array, we're looking at the entire result set.

And that should do it!

This doesn't tackle the problem of selecting more than one row, and if you want to join on a has_many or many_to_many relationship, you'll need to take a few more steps (use a subquery!).

Return to Articles & Guides