And the results are in!
In my final test, I changed the database thusly:
CREATE TABLE `ipseektest` (
`ipstring` varchar(16) NOT NULL,
`ipnum` bigint(20) NOT NULL,
`tiny1` smallint(6) NOT NULL,
`tiny2` smallint(6) NOT NULL,
`tiny3` smallint(6) NOT NULL,
`tiny4` smallint(6) NOT NULL,
KEY `ipstring` (`ipstring`),
KEY `ipnum` (`ipnum`),
KEY `tiny1` (`tiny1`,`tiny2`,`tiny3`,`tiny4`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
Note the addition of 4 new fields, tiny1-4 each of which are SMALLINTs. I could not use a TINYINT in MySQL because that is a signed 4 bytes – which should be plenty large enough, but I could only put -127 – 127 in it. Very peculiar. A Small is only 6 bytes, so the difference was pretty small.
Then I took the exact same addresses and placed them into the octet fields of the table ie., if the ipstring was 127.0.0.1, then the ipnum is 127000000001 and tiny1 is 127, tiny2 is 0 tiny3 is 0 and tiny4 is 1. I then indexed as you can see above.
The code I ran is as follows:
#! /usr/local/bin/php
<?php
require_once('/www/sites/lib/classes/class.dbconnection.php');
$start = mtime();
print "\n\n\n";
// JUST GETTING THE FILE OF ADDRESSES
$rawBuff = file_get_contents('./addresses.imp');
elapsed("Load Imploded");
$ipStrings = explode(chr(10), $rawBuff);
elapsed("Process Imploded - count=" . count($ipStrings));
$db = new dbConnection('127.0.0.1', 'testuser', 'testpass', 'test');
// RANDOMIZING AND CREATING A LOOKUP LIST
print "Preparing array\n";
shuffle($ipStrings);
$ints = array();
$ptr = 0;
foreach($ipStrings as $address)
{
preg_match('/([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})/', $address, $octets);
$ints[$ptr][0] = $octets[1];
$ints[$ptr][1] = $octets[2];
$ints[$ptr][2] = $octets[3];
$ints[$ptr][3] = $octets[4];
$ptr++;
if ($ptr > 100000) { break; }
}
elapsed("Complete");
// HERE'S THE JUICE
print "starting string query\n";
for ($i=0; $i<100000; $i++)
{
// Line broken here for readability - wasn't in the real code
$db->query("select * from ipseektest where tiny1={$ints[$i][0]} and tiny2={$ints[$i][1]} and
tiny3={$ints[$i][2]} and tiny4={$ints[$i][3]}");
if ($i % 1000 == 0) echo '.';
}
elapsed('Complete');
function elapsed($msg)
{
global $start;
$elap = mtime() - $start;
echo "$msg: $elap\n";
$start = mtime();
}
function mtime()
{
list($usec, $sec) = explode(' ', microtime());
return ((float)$usec + (float)$sec);
}
?>
And the result is…
Load Imploded: 0.0286769866943
Process Imploded - count=500000: 0.307203054428
Preparing array
Complete: 1.24323010445
starting string query
. . . . . . . . . [ clipped for readability ]
Complete: 34.0186672211
Just a smidge faster than the string query – about 3.1% faster – hard to make a case that it's worth the effort or extra storage space (we’re now up to 16Megs per million records for storage).
Your point in the other thread is really important: how is this used in the real world? That's why I pushed after the C conversion solution - there just isn't a pretty and scripted way to handle that conversion really efficiently. Using the C addon to PHP we can eliminate that bottleneck and busy work and let the DB get at what it does best, IMO.
/p