perkiset

Here is a little routine I use to pull down the spiderSpy table and update my MySQL database. It is designed to be run from a command prompt (I run it via a cron job), but it could be activated via a web call, provided your Apache will let a pretty long process run (it has taken as long as 3 minutes for me to get the whole thing at times) and the Apache daemon has write access to your botbase.download file, although that file is simply a backup mechanism and not required.
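If you do go the web-call route, something along these lines at the top of the script might help with the long run time. Treat it as a sketch only: the helper name `runningFromCli` and the 300-second limit are my own assumptions (picked to sit above the 3 minutes mentioned), and the web server may enforce its own timeouts on top of PHP's.

```php
<?php
// Hypothetical helper (not in the original script): true when the script
// is run from the command line, e.g. via cron.
function runningFromCli()
{
    return PHP_SAPI === 'cli';
}

// Assumed value: comfortably above the ~3 minutes a full download can take.
$webTimeLimit = 300;

if (!runningFromCli()) {
    // Under a web server, PHP's max_execution_time applies (30 seconds by
    // default), so raise it for this request. The CLI has no limit by default.
    set_time_limit($webTimeLimit);
}
```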

Note that as it is written, it requires my class.dbconnection.php library, which is also available here in the PHP repository.

Enjoy!


<?php

/*
This little routine pulls the spiderSpy database down from fantomaster
line by line, and if it is an address line, updates a table in a database.
It then removes any spider records that have been removed from their list.

The table that this routine expects can be built with the following SQL:

CREATE TABLE spiders (
  address varchar(16) NOT NULL,
  lupdate datetime NOT NULL,
  `engine` varchar(128) NOT NULL,
  PRIMARY KEY  (address)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

*/

$classPath = '/www/sites/lib/classes';
$botFile = '/www/resources/botbase.download';
$fantomasterURL = 'http://userandpassword@fantomaster.com/dardanelles/registerdb/fabotbasecsv_xxl.cgi';
$dbHost = '127.0.0.1';
$dbUser = 'theuser';
$dbPass = 'thepass';
$dbName = 'thedatabase';

file_put_contents($botFile, '');
$search = array('"', " ", " ");
$now = date('Y-m-d H:i:s', time());

require("$classPath/class.dbconnection.php");
$db = new dbConnection($dbHost, $dbUser, $dbPass, $dbName);

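if (($handle = fopen($fantomasterURL, 'r')) === FALSE)
    die('Cannot open Fantomaster');

while (($thisLine = fgets($handle)) !== FALSE)
{
    // Address lines contain a dotted quad followed by a closing quote.
    // The dots are escaped so they match a literal '.' only.
    if (preg_match('/[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}"/', $thisLine))
    {
        // Keep a raw copy of every address line as a backup
        file_put_contents($botFile, $thisLine, FILE_APPEND);
        echo '.';

        $parts = explode(',', $thisLine);
        // Both values go into the SQL below, so escape both.
        // Note: mysql_escape_string() was removed in PHP 7; on newer versions
        // use whatever escaping your connection library provides.
        $engine = mysql_escape_string(trim(str_replace($search, '', $parts[0])));
        $address = mysql_escape_string(trim(str_replace($search, '', $parts[3])));

        // Here's the google re-inclusion - do this only
        // if you are comfortable cloaking Google...
        if (preg_match('/google/i', $engine))
            $address = str_replace('#', '', $address);

        // Addresses still prefixed with '#' are skipped
        if (substr($address, 0, 1) <> '#')
            $db->query("replace into spiders(address, lupdate, engine) values('$address', '$now', '$engine')");

    } else echo 'x';
}
fclose($handle);

// Anything not touched during this run is no longer on their list
$db->query("delete from spiders where lupdate<'$now'");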

?>
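If you want to try the per-line logic without touching the database, it can be pulled out into a small function, roughly like this. This is a sketch under assumptions, not a drop-in replacement: `parseSpiderLine` and its `$includeGoogle` flag are names made up for illustration, and the field layout (engine in the first field, address in the fourth, skipped entries prefixed with `#`) is simply what the routine above implies.

```php
<?php
// Hypothetical helper, not part of the original routine: parses one CSV line
// and returns array(engine, address), or null when the line should be skipped.
function parseSpiderLine($line, $includeGoogle = true)
{
    // Same test as the main loop: a dotted quad followed by a closing quote.
    if (!preg_match('/[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}"/', $line)) {
        return null;
    }

    $parts = explode(',', $line);
    if (count($parts) < 4) {
        return null; // not enough fields to hold an address
    }

    // Strip quotes and spaces, then any trailing newline.
    $strip   = array('"', ' ');
    $engine  = trim(str_replace($strip, '', $parts[0]));
    $address = trim(str_replace($strip, '', $parts[3]));

    // Optional Google re-inclusion: drop the '#' prefix for Google entries.
    if ($includeGoogle && preg_match('/google/i', $engine)) {
        $address = str_replace('#', '', $address);
    }

    // Addresses still prefixed with '#' are skipped.
    if (substr($address, 0, 1) === '#') {
        return null;
    }

    return array($engine, $address);
}
```

Feed it lines from a saved botbase.download file and check what comes back before letting the real routine write to your table. Pass `false` as the second argument if you are not comfortable cloaking Google.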

dink

Short, sweet, and to the point.  As usual.

Thanks Perk. Applause

craw

thanks

