The Cache: Technology Expert's Forum
 
*
Welcome, Guest. Please login or register. September 18, 2019, 06:57:02 PM

Login with username, password and session length


Pages: [1]
  Print  
Author Topic: Rand() alternatives discussion  (Read 4527 times)
nutballs
Administrator
Lifer
*****
Offline Offline

Posts: 5627


Back in my day we had 9 planets


View Profile
« on: March 17, 2008, 10:19:06 AM »

So there was a question on syndk8 about how to grab random rows from mysql.

my immediate response, because I didnt read the fine print, was order by rand(). of course in a ginormic db this would suck.

some alternatives where presented of get max(id) and generate a random number, then grab that row by id from the database. This works if you want a random number of rows picked from ALL the rows of the table. however....

What if you have a table, which contains relational data. for example

product - domain
prod1       site1.com
prod2       site1.com
prod1       site2.com
prod3       site2.com

I want to select a random row of one of the prod1's....
This is way oversimplified but you get the point.
Think millions of records, with at least 1000's of products and hundreds of domains all having massive overlap so a product might be on 50 sites, or only on 1.

You cant do the "select max(id)" method, because you will randomly get a row that is most likely NOT matching your other criteria.

I'm all ears, and Im sure im gonna feel like a tard after this one...
Logged

I could eat a bowl of Alphabet Soup and shit a better argument than that.
perkiset
Olde World Hacker
Administrator
Lifer
*****
Offline Offline

Posts: 10096



View Profile
« Reply #1 on: March 17, 2008, 11:15:13 AM »

select rand row WHERE (the join is successful) all done. make sense?
Logged

It is now believed, that after having lived in one compound with 3 wives and never leaving the house for 5 years, Bin Laden called the U.S. Navy Seals himself.
nutballs
Administrator
Lifer
*****
Offline Offline

Posts: 5627


Back in my day we had 9 planets


View Profile
« Reply #2 on: March 17, 2008, 11:53:10 AM »

Thats using rand() still though, and could get bad if the result of the join is huge.
my gut is that there is no other way really.
Logged

I could eat a bowl of Alphabet Soup and shit a better argument than that.
perkiset
Olde World Hacker
Administrator
Lifer
*****
Offline Offline

Posts: 10096



View Profile
« Reply #3 on: March 17, 2008, 12:30:36 PM »

No no... what I meant was to do the (max rows, rand number, select THAT id) then add more clause to the where...

get the number of rows where site = 2 (if that's what you're trying to do)
select count('x') from table where site=2

... perhaps you load all IDs where site = 2 into an array and then use that instead of doing it just db methinks as I'm typing... whenever you add to your products table, rebuild an array of all IDs for each site and store it as a serialized array - put the burden of time on the modification side of the transaction rather than the seek. Then load the array (for site = 2) and get [n] number of random items from it ... or something ... I dunno. I'm exhausted ATM but there's lots of ways to skin that cat...
Logged

It is now believed, that after having lived in one compound with 3 wives and never leaving the house for 5 years, Bin Laden called the U.S. Navy Seals himself.
DangerMouse
Expert
****
Offline Offline

Posts: 244



View Profile
« Reply #4 on: March 17, 2008, 02:10:35 PM »

From my understanding this is quite a tough problem - without loading all the rows satisfying the join or any where conditions into an array (even if its just the id number) you effectively can't correctly select a random valid id.

I can think of 2 solutions, neither of which im competent enough to pull off sorry lol:

1) Create a view, or temp table with a standard autoincrement, then randomly select in the usual way off that new field.

2) Create a user defined function to act as a sequencer - then count() number of rows satisfying join, randomly select a number and user the sequencer column to grab that row.

DM
Logged
perkiset
Olde World Hacker
Administrator
Lifer
*****
Offline Offline

Posts: 10096



View Profile
« Reply #5 on: March 17, 2008, 02:15:21 PM »

I forgot - there's a nifty trick in MySQL that will handle this perfectly.

See: if you make your primary key the combination of two fields, one of which is an autoincrement and the other is ... well, something else, then the autoincrement value will be contiguous based on the secondary field.

In other words: if you have a PK that is based on site and the autoinc, then you'd get this as you inserted:

Site, id
site1 - 1
site1 - 2
site1 - 3
site2 - 1
site1 - 4
site2 - 2

and so forth. Therefore, you could easily get the max (for a particular site) and then randomize for that.
Logged

It is now believed, that after having lived in one compound with 3 wives and never leaving the house for 5 years, Bin Laden called the U.S. Navy Seals himself.
Pages: [1]
  Print  
 
Jump to:  

Perkiset's Place Home   Best of The Cache   phpMyIDE: MySQL Stored Procedures, Functions & Triggers
Politics @ Perkiset's   Pinkhat's Perspective   
cache
mart
coder
programmers
ajax
php
javascript
Powered by MySQL Powered by PHP Powered by SMF 1.1.2 | SMF © 2006-2007, Simple Machines LLC
Seo4Smf v0.2 © Webmaster's Talks


Valid XHTML 1.0! Valid CSS!