
perkiset
Here is a little routine I use to pull down the spiderSpy table and update my MySQL database. It is designed to be run from a command prompt (I run it via a cron job), but it could be activated via a web call, provided your Apache will let a fairly long process run (it has taken as long as 3 minutes for me to get the whole thing at times) and the Apache daemon has write access to your botbase.download file, although that file is simply a backup mechanism and not required.

Note that, as written, it requires my class.dbconnection.php library, which is also available here in the PHP repository.

Enjoy!

    <?php
    /*
      This little routine pulls the spiderSpy database down from fantomaster
      line by line and, if a line is an address line, updates a table in a
      database. It then removes any spider records that have been removed
      from their list.

      The table that this routine expects can be built with the following SQL:

      CREATE TABLE spiders (
        address varchar(16) NOT NULL,
        lupdate datetime NOT NULL,
        `engine` varchar(128),
        PRIMARY KEY (address)
      ) ENGINE=MyISAM DEFAULT CHARSET=latin1;
    */

    $classPath = '/www/sites/lib/classes';
    $botFile = '/www/resources/botbase.download';
    $fantomasterURL = 'http://userandpassword@fantomaster.com/dardanelles/registerdb/fabotbasecsv_xxl.cgi';

    $dbHost = '127.0.0.1';
    $dbUser = 'theuser';
    $dbPass = 'thepass';
    $dbName = 'thedatabase';

    // Start with an empty backup file; matching lines are appended to it below.
    file_put_contents($botFile, '');

    // Characters stripped from each CSV field: quotes, spaces and line endings.
    $search = array('"', ' ', "\r", "\n");

    // One timestamp for the whole run, so stale rows can be found afterwards.
    $now = date('Y-m-d H:i:s');

    require("$classPath/class.dbconnection.php");
    $db = new dbConnection($dbHost, $dbUser, $dbPass, $dbName);

    if (($handle = fopen($fantomasterURL, 'r')) === FALSE)
        die('Cannot open Fantomaster');

    while (($thisLine = fgets($handle)) !== FALSE) {
        // An address line contains a quoted dotted-quad IP address.
        if (preg_match('/[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}"/', $thisLine)) {
            file_put_contents($botFile, $thisLine, FILE_APPEND);
            echo '.';

            // mysql_escape_string() was removed in PHP 7; on newer versions,
            // escape with whatever your database class provides instead.
            $parts = explode(',', $thisLine);
            $engine = mysql_escape_string(str_replace($search, '', $parts[0]));
            $address = mysql_escape_string(str_replace($search, '', $parts[3]));

            // Here's the google re-inclusion - do this only
            // if you are comfortable cloaking Google...
            if (preg_match('/google/i', $engine))
                $address = str_replace('#', '', $address);

            // Addresses still prefixed with '#' are skipped.
            if (substr($address, 0, 1) !== '#')
                $db->query("replace into spiders(address, lupdate, engine) values('$address', '$now', '$engine')");
        } else
            echo 'x';
    }
    fclose($handle);

    // Anything not touched in this run is no longer on their list.
    $db->query("delete from spiders where lupdate<'$now'");
    ?>

dink
Short, sweet, and to the point. As usual.
Thanks Perk.

craw
thanks
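
For anyone adapting the routine above, the per-line parsing step can be pulled into a small function and checked on its own, without touching the database or the feed. This is only a minimal sketch, assuming the same CSV layout the script relies on (engine name in field 0, IP address in field 3). The function name `parseSpiderLine`, the `$includeGoogle` switch and the sample lines are hypothetical and not part of perkiset's code.

```php
<?php
// Hypothetical helper: extracts the engine name and IP address from one
// line of the CSV feed. Returns null for lines without an address and for
// addresses still prefixed with '#'. When $includeGoogle is true, the '#'
// is removed from Google entries first, as in the routine above.
function parseSpiderLine(string $line, bool $includeGoogle = true): ?array
{
    // Same test as the routine: the line must contain a quoted dotted quad.
    if (!preg_match('/[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}"/', $line)) {
        return null;
    }

    $parts = explode(',', $line);
    if (count($parts) < 4) {
        return null; // assumed layout needs at least four fields
    }

    // Strip quotes, spaces and line endings from the two fields we use.
    $strip   = array('"', ' ', "\r", "\n");
    $engine  = str_replace($strip, '', $parts[0]);
    $address = str_replace($strip, '', $parts[3]);

    if ($includeGoogle && preg_match('/google/i', $engine)) {
        $address = str_replace('#', '', $address);
    }

    if ($address === '' || $address[0] === '#') {
        return null;
    }

    return array('engine' => $engine, 'address' => $address);
}

// Made-up sample lines, not real feed data.
$kept    = parseSpiderLine('"Googlebot","a","b","#66.249.66.1"' . "\n");
$skipped = parseSpiderLine('"SomeBot","a","b","#10.0.0.1"' . "\n");
```

Here `$kept` holds the engine and the address with the `#` removed, and `$skipped` is null. The values would still need escaping (or a prepared statement) before going into the `replace into spiders` query.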