The directory trick is genius. I like it. Other than APC obviously not being installed on most, if any, shared hosting accounts, were there any other reasons why you're not taking that route?
This is a method NBs and I worked through for quite a while as a workaround for really, really bad hosts that don't have many of the contemporary tools. Why someone would ever pick a host like that is beyond me.

But I digress. This method is quick enough, but today it will take up close to 30K inodes on a box... so the problem here is if a host won't let you go that route. An alternative we came up with was to use the first 3 octets as directories and then the last octet as a file that you would load, or better, PHP include (this is the fastest), so you keep the total number of inodes under what your host will allow - this is particularly important if you have a boatload of pages on a spam site, for example.
The fastest way to do this is to put the node address (the last octet) and spider name into an array, then store it as a serialized array in a file. Then the "include" PHP looks something like this (imagine you name the file "spiders.txt"):
<?php
$spiderArr = unserialize('{... the serialized array ...}');
?>
...that's it. Then your master file simply needs to do something like this:
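$spiderArr = array(); // default: stays empty unless the include below overwrites it
// Grab the first three octets of the visitor's IP
preg_match('/^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}/', $_SERVER['REMOTE_ADDR'], $parts);
$dir = str_replace('.', '/', $parts[0]); // "1.2.3" becomes "1/2/3"
include "$dir/spiders.txt"; // sets $spiderArr if a file exists for this block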
At this point, $spiderArr either contains nothing (what you originally set it to) or it has been overwritten by the included file. Since it's "include" and not "require", PHP will not stop if the file is missing (it will only emit a warning, which you'll see if your error reporting is set to show warnings) - and then you can check the array for the last octet of the current REMOTE_ADDR to see if the visitor is a spider or not.
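Here is a rough sketch of both halves of that scheme, in case it helps anyone following along: one function that writes a spiders.txt per three-octet directory, and one that does the lookup. The function names (buildSpiderFiles, lookupSpider), the IPs and the spider names are all made up for illustration, and it assumes PHP 7.1 or newer. One deliberate difference from the snippet above: it checks is_file() before the include so a missing file doesn't trigger a warning.

```php
<?php
// Hypothetical helpers; the directory layout follows the posts above.

// Write one spiders.txt per three-octet directory under $baseDir.
// $spiders maps a full IPv4 address to a spider name.
function buildSpiderFiles(array $spiders, string $baseDir): void
{
    $blocks = [];
    foreach ($spiders as $ip => $name) {
        $octets = explode('.', $ip);
        if (count($octets) !== 4) {
            continue; // skip anything that is not a dotted quad
        }
        $last = array_pop($octets);                      // node address
        $blocks[implode('/', $octets)][$last] = $name;   // "a/b/c" => [node => name]
    }
    foreach ($blocks as $dir => $nodes) {
        $path = "$baseDir/$dir";
        if (!is_dir($path)) {
            mkdir($path, 0755, true); // create a/b/c recursively
        }
        // var_export() quotes the serialized string safely for the include file
        $code = "<?php\n\$spiderArr = unserialize("
              . var_export(serialize($nodes), true) . ");\n";
        file_put_contents("$path/spiders.txt", $code);
    }
}

// Return the spider name for $ip, or null if the IP is not listed.
function lookupSpider(string $ip, string $baseDir): ?string
{
    $spiderArr = []; // stays empty unless the include overwrites it
    if (!preg_match('/^(\d{1,3}\.\d{1,3}\.\d{1,3})\.(\d{1,3})$/', $ip, $m)) {
        return null;
    }
    $file = $baseDir . '/' . str_replace('.', '/', $m[1]) . '/spiders.txt';
    if (is_file($file)) {
        include $file; // sets $spiderArr in this function's scope
    }
    return $spiderArr[$m[2]] ?? null;
}
```

You would run buildSpiderFiles() once whenever the spider list changes, then call lookupSpider($_SERVER['REMOTE_ADDR'], $baseDir) from the master file. The inode cost is one file plus up to three directories per three-octet block, rather than one entry per IP.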
If you have the RAM, you can do this via APC, although there are many methods and arguments about hashing speed, too many elements in the array, running out of cache RAM and having to shift to disk all the time... it's a pretty big topic. I toyed with it for a while but found that I didn't like any of my solutions, so I passed on APCing the spiderSpy DB. Again, I am now doing almost all my spider IDing, tracking and cloak setup in a single stored procedure, so I'm quite a ways away from this ATM.
Installed APC on my dev server (P4 2.8 GHz with 4 GB of RAM).
Loaded all 20K+ IPs into an array.
Stored it in APC.
Then fetched them all in right at 1 second. Damn.
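For anyone who wants to repeat that kind of test, something along these lines should be close. It is only a sketch, not the exact script used above: it assumes the legacy APC extension (apc_store / apc_fetch) is loaded, and the cache key 'spider_ips', the helper names and the generated sample IPs are invented for the example. It times a single fetch of the whole array, which may not be how the number above was measured.

```php
<?php
// Sketch only: needs the legacy APC extension (apc_store / apc_fetch).
// The key name and the sample data are made up for illustration.

// Build $count fake "IP => spider name" entries starting at 10.0.0.0.
function makeSampleSpiders(int $count): array
{
    $spiders = [];
    $start = ip2long('10.0.0.0');
    for ($i = 0; $i < $count; $i++) {
        $spiders[long2ip($start + $i)] = 'ExampleBot';
    }
    return $spiders;
}

// Store the array in APC, then time one fetch of the whole thing.
// Returns seconds as a float, or null if APC is unavailable or the fetch failed.
function timeApcRoundTrip(array $spiders, string $key = 'spider_ips'): ?float
{
    if (!function_exists('apc_store')) {
        return null; // APC not loaded, nothing to measure
    }
    apc_store($key, $spiders);
    $start = microtime(true);
    $copy = apc_fetch($key, $ok); // hands back a full copy of the array
    $elapsed = microtime(true) - $start;
    return ($ok && count($copy) === count($spiders)) ? $elapsed : null;
}

$elapsed = timeApcRoundTrip(makeSampleSpiders(20000));
echo $elapsed === null
    ? "APC not available or fetch failed\n"
    : sprintf("fetched in %.4f s\n", $elapsed);
```

Note that APC is normally disabled on the command line unless apc.enable_cli is switched on, so run this through the web server if you want a real number.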
The way APC works is that it keeps a copy of <whatever you store> in RAM - this sounds great, but consider this: if you load a 29K spider IP table into an array, then use that array for every page call, you're making another copy of that 29K array for every instance and every page called, every time - this is ENORMOUSLY RAM- and processor-intensive - it is a bad way to go. Just sayin.