I have written a script that scrapes websites and counts the number of proxies on each page. The script is solely for narrowing down a list of URLs to add as targets for leeching in the future.
The problem is that the script runs out of memory about an hour after I start it. Here's the error I'm getting:
Fatal error: Allowed memory size of 33554432 bytes exhausted (tried to allocate 19923967 bytes) in /home/justin/proxyscraper/seeds/scraper.php on line 13
My goal is eventually to learn a language such as Python that could do this much faster (with threads), but at the moment I'm just trying to get something workable with the coding knowledge I have. I don't think allocating more memory to the script will help: the same thing would probably happen again, since the run barely made a dent in the huge list of sources that I have.
I'm not much of a coder, but I can usually hack something together that works for me. If anyone has any ideas, I'm all ears.
<?php
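// Read all the sources into an array
$lines = file("proxysource.txt");
// array_unique() keeps the original keys, so re-index to avoid gaps in $lines[$i]
$lines = array_values(array_unique($lines));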
$success = array();
$num = count($lines);
$num = $num - 1;
$regex = '/[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}[:][0-9]{1,5}/';
$i = 0;
// Scrape each page; any source with at least 10 proxies is added to $success
while ($i <= $num) {
    $result = scrapepage($lines[$i]);
    preg_match_all($regex, $result, $matches);
    if (count($matches[0]) >= 10) {
        array_push($success, $lines[$i]);
        echo $lines[$i] . "\n";
    }
    $i++;
}
$m = count($success);
$m = $m - 1;
$n = 0;
// Save the successful sites to a file (each entry still ends with its newline from file())
$myFile = "goodsource.txt";
$fh = fopen($myFile, 'a') or die("can't open file");
while ($n <= $m) {
    fwrite($fh, $success[$n]);
    $n++;
}
fclose($fh);
?>
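One possible restructuring of the loop above, as a minimal sketch rather than a tested fix: read the source list one line at a time, append each good URL to the output file immediately, and release each page before fetching the next. The names `count_proxies`, `filter_sources`, and `PROXY_REGEX` are hypothetical, and the sketch assumes PHP 7 or later and that `scrapepage()` (not shown in the post) returns the page body as a string.

```php
<?php
// Same pattern as in the original script, written as a constant (hypothetical name).
const PROXY_REGEX = '/[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}:[0-9]{1,5}/';

// Count ip:port occurrences without keeping the matches array.
function count_proxies(string $html): int
{
    // preg_match_all() returns the number of full matches, or false on error.
    $n = preg_match_all(PROXY_REGEX, $html);
    return $n === false ? 0 : $n;
}

// Stream the source list, fetch each URL once, and append good sources right away.
// $fetch is any callable taking a URL and returning the page body
// (assumption: the original scrapepage() fits this shape).
function filter_sources(string $inFile, string $outFile, callable $fetch, int $min = 10): int
{
    $in = fopen($inFile, 'r');
    $out = fopen($outFile, 'a');
    if ($in === false || $out === false) {
        throw new RuntimeException("can't open file");
    }

    $seen = [];   // URLs already processed (replaces array_unique)
    $kept = 0;

    while (($line = fgets($in)) !== false) {
        $url = trim($line);
        if ($url === '' || isset($seen[$url])) {
            continue;   // skip blank lines and duplicates
        }
        $seen[$url] = true;

        $page = $fetch($url);
        if (is_string($page) && count_proxies($page) >= $min) {
            fwrite($out, $url . "\n");
            $kept++;
        }
        unset($page);   // drop the page body before the next fetch
    }

    fclose($in);
    fclose($out);
    return $kept;
}
```

With the original script's function in scope, this could be called as `filter_sources('proxysource.txt', 'goodsource.txt', 'scrapepage');`. It keeps at most one page in memory at a time, but it would not help if a single page is larger than the limit: the error shows an allocation of about 19 MB against a 32 MB limit (33554432 bytes), so capping the download size inside `scrapepage()` may also be needed.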