The Cache: Technology Expert's Forum
 
Author Topic: PHP to download a gz file  (Read 1934 times)
nutballs
« on: October 25, 2008, 08:16:37 PM »

I am trying to automate something, but getting stuck.

It is a text file on a remote server that is actually gzipped.

I can't figure out what the hell I am supposed to use to get the damn text out.

Specifically, this is Fantomaster's gzipped CSV of the spiderbase. I tried the XML but can't get it to play, so I am going to do the CSV, and I want the gz so it can go a little faster.

I'm sure you already do this, perk ;)
perkiset
« Reply #1 on: October 25, 2008, 10:49:46 PM »

if it's gzipped, then gunzip it on the command line of a unix prompt.
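
If shelling out to gunzip is not convenient, PHP's zlib extension can read a gzipped file directly. The following is a minimal sketch, assuming the zlib extension is loaded. The helper name `readGzLines` and the locally generated sample file are illustrative only and are not part of anyone's script in this thread.

```php
<?php
// Sketch: read a gzipped text file line by line without unpacking it first.
// readGzLines() is a hypothetical helper; it assumes the zlib extension is available.
function readGzLines(string $path): array
{
    $lines = [];
    $handle = gzopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    // gzgets() returns false at end of file
    while (($line = gzgets($handle)) !== false) {
        $lines[] = rtrim($line, "\r\n");
    }
    gzclose($handle);
    return $lines;
}

// Demo: a small .gz file built locally stands in for the remote CSV (made-up rows).
$tmp = tempnam(sys_get_temp_dir(), 'bots');
file_put_contents($tmp, gzencode("\"Googlebot\",\"ua\",\"x\",\"66.249.66.1\"\n\"Slurp\",\"ua\",\"x\",\"72.30.0.1\"\n"));
$lines = readGzLines($tmp);
unlink($tmp);
echo count($lines), " lines read\n";
```

For a remote file, one option on PHP 5.4 or later is to fetch the raw bytes with file_get_contents() and pass them to gzdecode(). That holds the whole file in memory, so it suits small feeds better than large ones.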

I pull it ungzipped from Fanto every morning - I don't worry about the download time. Here is the script I use for one of my servers (I pull it down a few different ways at different times), with just my personal details replaced. Note that I actually process each line into the DB during the download, rather than getting the file and then processing the whole thing. The only part that is not entirely obvious is the call to the stored procedure updateSpider, the script for which is posted after the PHP code. It should be reasonably obvious, but let me know if I need to explain anything.

Code:
#! /usr/local/bin/php
<?php

$classPath = '/www/sites/lib/classes';
$fantomasterURL = 'http://username:password@fantomaster.com/dardanelles/registerdb/fabotbasecsv_xxl.cgi';

$dbHost = '127.0.0.1';
$dbUser = 'username';
$dbPass = 'password';

// Note: $botFile and $dbName are not defined anywhere in the script as posted.
file_put_contents($botFile, '');
$search = array('"', "\n", "\r");
$now = date('Y-m-d H:i:s', time());

require("$classPath/class.dbconnection.php");
$db = new dbConnection($dbHost, $dbUser, $dbPass, $dbName);

if (($handle = fopen($fantomasterURL, 'r')) === FALSE) die ('Cannot open Fantomaster');

$db->query("replace into shared.sysvars(name, value) values('spider_dlstatus', 'downloading')");

$total = 0;
$inserted = 0;
$blocked = 0;
while ($thisLine = fgets($handle))
{
        $total++;
        // Publish download progress every 100 lines
        if (($total % 100) == 0)
                $db->query("replace into shared.sysvars(name, value) values('spider_dlstatus', '$total')");

        // Only handle lines that contain a quoted IPv4 address
        if (preg_match('/[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}"/', $thisLine))
        {
                $parts = explode(',', $thisLine);
                $engine = str_replace($search, '', $parts[0]);
                // mysql_escape_string() is deprecated since PHP 5.3 and was removed in PHP 7
                $useragent = mysql_escape_string(str_replace($search, '', $parts[1]));
                $address = trim(str_replace($search, '', $parts[3]));
                if (preg_match('/google/i', $engine)) $address = str_replace('#', '', $address);

                // The operator between $address and ' ' was lost in the post; '>' is assumed here
                if ((substr($address, 0, 1) <> '#') && ($address > ' ') && preg_match('/^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$/', $address))
                {
                        $inserted++;
                        $db->query("call shared.updateSpider('$address', '$engine', '$useragent')");
                } else {
                        $blocked++;
                }
        }
}

// Record the final counters and mark the download as finished
$now = date('Y-m-d H:i:s', time());
$db->query("replace into shared.sysvars(name, value) values('spider_lupdate', '$now')");
$db->query("replace into shared.sysvars(name, value) values('spider_inserted', '$inserted')");
$db->query("replace into shared.sysvars(name, value) values('spider_blocked', '$blocked')");
$db->query("replace into shared.sysvars(name, value) values('spider_dlstatus', 'idle')");

?>



PROCEDURE updateSpider(addr char(16), eng varchar(255), ua varchar(255))
BEGIN

   declare ipNum integer unsigned;
   declare dummy integer;
   declare needRec integer;
   declare spiderID integer;
   declare engineID integer;
   declare uaID integer;

   declare continue handler for 1062 set dummy=1;
   declare continue handler for not found set needRec=1;
   
   set needRec = 0;
   set ipNum = inet_aton(addr);

   select ip into spiderID from shared.spiders where ip=ipNum;
   if (needRec = 1) then

      insert into shared.engines(caption) values(eng);
      select id into engineID from shared.engines where caption=eng;

      insert into shared.useragents(caption) values(ua);
      select id into uaID from shared.useragents where caption=ua;

      insert into shared.spiders(ip, address, engine_id, ua_id) values(ipNum, addr, engineID, uaID);

   end if;
END
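
The procedure above stores each address as an unsigned integer via inet_aton(). For anyone who needs that same number on the PHP side, here is a small sketch using ip2long(). The helper name `ipToUnsigned` is hypothetical and is not part of the script above.

```php
<?php
// Hypothetical helper: returns, as a decimal string, the unsigned value that
// MySQL's INET_ATON() yields for a dotted-quad IPv4 address.
function ipToUnsigned(string $address): string
{
    $long = ip2long($address);
    if ($long === false) {
        throw new InvalidArgumentException("Not an IPv4 address: $address");
    }
    // %u formats the value as unsigned; this matters on 32-bit builds,
    // where ip2long() can return a negative integer.
    return sprintf('%u', $long);
}

$n = ipToUnsigned('192.168.0.1');
echo $n, "\n";
```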
« Last Edit: October 25, 2008, 10:52:41 PM by perkiset »
nutballs
« Reply #2 on: October 26, 2008, 07:37:10 AM »

Thanks perk. Yeah, I'll just give up on the GZ. I was just trying to reduce my footprint where I can, but it's no biggie. Thanks for the code.

Heh. I didn't know you could do this:
str_replace($search,' ',$content);
where search is an array.
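
A short illustration of that behavior, using the same $search array as the script. The sample strings are made up for the demo.

```php
<?php
// str_replace() accepts an array of search strings; each one is replaced in turn.
$search = array('"', "\n", "\r");
$raw = "\"Googlebot\"\r\n";
$clean = str_replace($search, '', $raw);
var_dump($clean); // string(9) "Googlebot"

// The replacement can be an array too, paired with the search array by position.
$mapped = str_replace(array('a', 'b'), array('1', '2'), 'aabb');
var_dump($mapped); // string(4) "1122"
```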



Hey Fanto, if you read this:
A version of the botbase that I know I could use, and probably others could too, is just the IPs. Nothing else. I personally don't care which bot it is and such, at least in one of my apps. Just a thought.
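
Until such a feed exists, the IP-only list can be derived from the CSV. This is a rough sketch, assuming the address is the fourth field of each row, as the script earlier in the thread does. The function name `extractIps` and the sample rows are invented for illustration.

```php
<?php
// Sketch: reduce botbase CSV lines to a unique list of IPv4 addresses.
// Assumption: the address is field index 3, as in the script posted in this thread.
function extractIps(array $lines): array
{
    $ips = [];
    foreach ($lines as $line) {
        // Separator, enclosure and escape are passed explicitly
        $parts = str_getcsv($line, ',', '"', '\\');
        if (!isset($parts[3])) {
            continue;
        }
        $address = trim($parts[3]);
        // Entries prefixed with '#' fail validation and are skipped
        if (filter_var($address, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) !== false) {
            $ips[$address] = true; // array keys keep the list unique
        }
    }
    return array_keys($ips);
}

// Made-up sample rows, not real botbase data
$sample = [
    '"Googlebot","Mozilla/5.0 (compatible; Googlebot/2.1)","x","66.249.66.1"',
    '"Slurp","Yahoo! Slurp","x","#72.30.0.1"',
    '"Googlebot","Googlebot-Image/1.0","x","66.249.66.1"',
    'not a csv row',
];
$ips = extractIps($sample);
print_r($ips);
```

Unlike the original script, this drops every '#'-prefixed address, including Google's, so adjust the filter if those should be kept.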