Best Practices for Random Number Generation

Whither thou goest, pseudorandom number generator?

I recently implemented a site (The Rosey Awards) that needed to display entries in a random order.  The idea was to give every entry a fair chance to show up on the first gallery page.  You can change the sort order, of course, but it defaults to random.

I implemented it thusly:

  1. The gallery controller receives the request, which includes the page number.
  2. The controller gets all of the entries from the database, shuffles them, skips the first N (where N = EntriesPerPage x (Page - 1)), and then takes EntriesPerPage entries and returns those to the view.
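Here’s a rough sketch of that naive approach in Python, just to illustrate the idea; the helper get_all_entries and the page size are made up for the example:

    import random

    ENTRIES_PER_PAGE = 20  # hypothetical page size

    def get_all_entries():
        # Stand-in for the real database query; pretend these are the 251 entries.
        return list(range(1, 252))

    def gallery_page(page):
        """Naive version: a different shuffle on every single request."""
        entries = get_all_entries()
        random.shuffle(entries)
        start = ENTRIES_PER_PAGE * (page - 1)
        return entries[start:start + ENTRIES_PER_PAGE]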

The astute reader will immediately see the problem here: paging will not work.  If I’m looking at a shuffled view of the entries on page one and then skip over to page two, the entries will all be re-shuffled, and I may see entries I already saw on page one.  Worse, I could go through all the pages and never see a certain entry!

I agonized over how to correct this, but all of my solutions involved stashing the complete order of all entries in the session.  If there were only one user, that would be fine, but when a hundred visitors hit the site, I now have to keep 251 x 100 entries in memory?  So I thought about cutting it down to just entry IDs, but that’s still a lot of memory, and it prevents me from making a simple database query (I’d have to query for each entry separately and put the results in a list myself).

Finally, the solution occurred to me: exploit the nature of the pseudorandom number generator.

There’s no such thing as a truly random number generator in a typical programming environment (at least not yet); the generators you actually use are pseudorandom number generators.  That is, they take a seed, and that seed completely determines what “random” numbers will come out.  You preserve the illusion of randomness by choosing a different seed each time.
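For example, in Python (purely to illustrate the point), two generators seeded with the same value produce exactly the same sequence:

    import random

    rng_a = random.Random(42)  # 42 is an arbitrary seed
    rng_b = random.Random(42)  # same seed as above

    print([rng_a.randint(0, 100) for _ in range(5)])  # five "random" numbers
    print([rng_b.randint(0, 100) for _ in range(5)])  # the exact same five numbers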

However, there’s now a clear solution to my problem: instead of having to store the random order of 251 entries for each session, all I have to do is store the seed I used to initialize the random number generator, and re-seed with that same value on every page request to reproduce the exact same shuffle.  My memory costs have just been cut by a factor of four thousand!  Now that’s what I call savings.
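Here’s roughly how that plays out, again sketched in Python with made-up helpers; session stands in for whatever per-visitor storage your framework provides:

    import random

    ENTRIES_PER_PAGE = 20  # hypothetical page size

    def get_all_entries():
        # Stand-in for the real query; assumes the database returns a stable base order.
        return list(range(1, 252))

    def gallery_page(session, page):
        """Seeded version: every request in a session reproduces the same shuffle."""
        if "shuffle_seed" not in session:
            session["shuffle_seed"] = random.randrange(2**32)  # chosen once per session

        entries = get_all_entries()
        random.Random(session["shuffle_seed"]).shuffle(entries)  # same seed, same order
        start = ENTRIES_PER_PAGE * (page - 1)
        return entries[start:start + ENTRIES_PER_PAGE]

Because every request rebuilds the identical full ordering, page one and page two never overlap, and each entry shows up exactly once across the pages.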

There’s an even more astute reader out there who sees the other problem with what I’m doing.  Since there’s no particularly efficient way to get rows out of a database in a random order, for every page request, I have to get all entries out of the database and shuffle them in memory.  Maybe not such a big deal for 251 entries, but this solution would not scale well.

If you’re facing the same problem but dealing with thousands or tens of thousands of rows or more, you’ll need a more sophisticated database-driven solution.  Off the top of my head, I can envision a table that maps random sort keys to row IDs, so you can do a join and sort by the random key.  If the point is to establish fairness (i.e., every time a person changes the sort order to random, they get a new random order), the most efficient approach would probably be to create a pool of such random orders, say 100 different pre-established random orders, and just pick one when the user asks for a random sort.  It’s one of those tricky situations where you have to balance functionality against serious performance considerations.
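To make the mapping-table idea concrete, here’s a small sketch using Python and SQLite; the table and column names (entries, random_orders, sort_key) are invented for the example, and a real schema would depend on the site:

    import random
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY, title TEXT)")
    conn.execute("CREATE TABLE random_orders (order_id INTEGER, entry_id INTEGER, sort_key INTEGER)")

    # A handful of entries, plus a small pool of pre-established random orders.
    conn.executemany("INSERT INTO entries VALUES (?, ?)",
                     [(i, "Entry %d" % i) for i in range(1, 11)])
    entry_ids = [row[0] for row in conn.execute("SELECT id FROM entries")]
    POOL_SIZE = 3  # the post suggests ~100 pre-established orders; 3 keeps the demo small
    for order_id in range(POOL_SIZE):
        shuffled = random.sample(entry_ids, len(entry_ids))
        conn.executemany("INSERT INTO random_orders VALUES (?, ?, ?)",
                         [(order_id, eid, pos) for pos, eid in enumerate(shuffled)])

    # At request time: pick one pre-built order, then page with a join and a sort.
    def gallery_page(order_id, page, per_page=4):
        return conn.execute(
            """SELECT e.id, e.title
               FROM entries e
               JOIN random_orders r ON r.entry_id = e.id
               WHERE r.order_id = ?
               ORDER BY r.sort_key
               LIMIT ? OFFSET ?""",
            (order_id, per_page, per_page * (page - 1))).fetchall()

    print(gallery_page(order_id=0, page=1))
    print(gallery_page(order_id=0, page=2))  # continues the same order, no repeats

The database does the sorting and paging, so the application never has to load and shuffle the whole table in memory.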


Originally posted December 5th, 2011.