Here is a little routine I use to pull down the spiderSpy table and update my MySQL database. It is designed to be run from a command prompt (I run it from a cron job), but it could also be triggered by a web request, provided your Apache setup allows a fairly long-running process (the full download has taken as long as 3 minutes for me). The script also writes a copy of the downloaded lines to your botbase.download file, so whichever user runs it (the Apache daemon, in the web case) needs write access to that file. That copy is only a backup mechanism and is not required.
Note that, as written, it requires my class.dbconnection.php library, which is also available here in the PHP repository.
Enjoy!
<?php
/*
This little routine pulls the spiderSpy database down from fantomaster
line by line, and if it is an address line, updates a table in a database.
It then removes any spider records that have been removed from their list.
The table that this routine expects can be built with the following SQL:
CREATE TABLE spiders (
address varchar(16) NOT NULL,
lupdate datetime NOT NULL,
`engine` varchar(128) NOT NULL,
PRIMARY KEY (address)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
*/
$classPath = '/www/sites/lib/classes';
$botFile = '/www/resources/botbase.download';
$fantomasterURL = 'http://userandpassword@fantomaster.com/dardanelles/registerdb/fabotbasecsv_xxl.cgi';
$dbHost = '127.0.0.1';
$dbUser = 'theuser';
$dbPass = 'thepass';
$dbName = 'thedatabase';
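// Start with an empty backup file; matching lines are appended to it below.
file_put_contents($botFile, '');
// Characters stripped from each CSV field: quotes and line endings.
$search = array('"', "\n", "\r");
// One timestamp for the whole run, so the final cleanup can tell which rows were not refreshed.
$now = date('Y-m-d H:i:s');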
require("$classPath/class.dbconnection.php");
$db = new dbConnection($dbHost, $dbUser, $dbPass, $dbName);
if (($handle = fopen($fantomasterURL, 'r')) === FALSE)
die ('Cannot open Fantomaster');
while (($thisLine = fgets($handle)) !== false)
{
if (preg_match('/[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}"/', $thisLine))
{
file_put_contents($botFile, $thisLine, FILE_APPEND);
echo '.';
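$parts = explode(',', $thisLine);
// Skip malformed lines that lack the address column (4th field).
if (count($parts) < 4) continue;
// addslashes() stands in for mysql_escape_string(), which was removed in PHP 7.
$engine = addslashes(str_replace($search, '', $parts[0]));
$address = addslashes(str_replace($search, '', $parts[3]));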
// Here's the google re-inclusion - do this only
// if you are comfortable cloaking Google...
if (preg_match('/google/i', $engine))
$address = str_replace('#', '', $address);
if (substr($address, 0, 1) <> '#')
$db->query("replace into spiders(address, lupdate, engine) values('$address', '$now', '$engine')");
} else echo "x";
}
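fclose($handle);
// Remove spiders that were not refreshed in this run, i.e. dropped from the list.
$db->query("delete from spiders where lupdate<'$now'");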
?>
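
The line handling inside the loop can also be pulled out into a small function, which makes it possible to test without hitting Fantomaster or MySQL. What follows is only a sketch, not part of the original routine: the name parse_spider_line and its $includeGoogle switch are made up for illustration. It assumes, as the script does, that the engine name is the first CSV field and the address is the fourth, and that a leading '#' marks an address as disabled. It uses str_getcsv() rather than explode() so that quoted fields containing commas survive, and filter_var() to check that the address really is an IPv4 address.

```php
<?php
// Hypothetical helper (not in the original script): parse one line of the
// downloaded list into array('engine' => ..., 'address' => ...), or return
// null if the line should be skipped.
// Assumed layout, as in the routine: field 0 = engine, field 3 = address.
function parse_spider_line($line, $includeGoogle = false)
{
    // str_getcsv() strips the surrounding quotes and copes with embedded commas.
    $parts = str_getcsv(trim($line), ',', '"', '\\');
    if (count($parts) < 4) {
        return null; // not a data line
    }

    $engine  = trim($parts[0]);
    $address = trim($parts[3]);

    // Optional Google re-inclusion: drop the leading '#' that disables the entry.
    if ($includeGoogle && stripos($engine, 'google') !== false) {
        $address = ltrim($address, '#');
    }

    // Anything that is not a plain IPv4 address (including entries that
    // still start with '#') is skipped.
    if (filter_var($address, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) === false) {
        return null;
    }

    return array('engine' => $engine, 'address' => $address);
}
```

The values it returns are raw strings, so they still need to be escaped (or bound as parameters, if your database library supports that) before they go into the replace query.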