nutballs

So there was a question on syndk8 about how to grab random rows from mysql.

my immediate response, because I didnt read the fine print, was order by rand(). of course in a ginormic db this would suck.

some alternatives where presented of get max(id) and generate a random number, then grab that row by id from the database. This works if you want a random number of rows picked from ALL the rows of the table. however....

What if you have a table, which contains relational data. for example

product - domain
prod1      site1.com
prod2      site1.com
prod1      site2.com
prod3      site2.com

I want to select a random row of one of the prod1's....
This is way oversimplified but you get the point.
Think millions of records, with at least 1000's of products and hundreds of domains all having massive overlap so a product might be on 50 sites, or only on 1.

You cant do the "select max(id)" method, because you will randomly get a row that is most likely NOT matching your other criteria.

I'm all ears, and Im sure im gonna feel like a tard after this one...

perkiset

select rand row WHERE (the join is successful) all done. make sense?

nutballs

Thats using rand() still though, and could get bad if the result of the join is huge.
my gut is that there is no other way really.

perkiset

No no... what I meant was to do the (max rows, rand number, select THAT id) then add more clause to the where...

get the number of rows where site = 2 (if that's what you're trying to do)
select count('x') from table where site=2

... perhaps you load all IDs where site = 2 into an array and then use that instead of doing it just db methinks as I'm typing... whenever you add to your products table, rebuild an array of all IDs for each site and store it as a serialized array - put the burden of time on the modification side of the transaction rather than the seek. Then load the array (for site = 2) and get <> number of random items from it ... or something ... I dunno. I'm exhausted ATM but there's lots of ways to skin that cat...

DangerMouse

From my understanding this is quite a tough problem - without loading all the rows satisfying the join or any where conditions into an array (even if its just the id number) you effectively can't correctly select a random valid id.

I can think of 2 solutions, neither of which im competent enough to pull off sorry lol:

1) Create a view, or temp table with a standard autoincrement, then randomly select in the usual way off that new field.

2) Create a user defined function to act as a sequencer - then count() number of rows satisfying join, randomly select a number and user the sequencer column to grab that row.

DM

perkiset

I forgot - there's a nifty trick in MySQL that will handle this perfectly.

See: if you make your primary key the combination of two fields, one of which is an autoincrement and the other is ... well, something else, then the autoincrement value will be contiguous <i>based on the secondary field.</i>

In other words: if you have a PK that is based on site and the autoinc, then you'd get this as you inserted:

Site, id
site1 - 1
site1 - 2
site1 - 3
site2 - 1
site1 - 4
site2 - 2

and so forth. Therefore, you could easily get the max (for a particular site) and then randomize for that.


Perkiset's Place Home   Politics @ Perkiset's