Caligula

Well this is what I have been working on... I want to thank everyone here and at the Syndk8 (same people, I know) for all their help. This probably won't be much to you pros, but hey, if any noobs happen by maybe it will help them learn.

Simple spider: it just collects clean URLs and then spiders those for new ones.

It requires a database. I created mine manually through phpMyAdmin because it's easier and I don't trust the script to do it correctly.

ID->INT->auto_increment->PRIMARY Key
urls->VARCHAR(100)->UNIQUE Key

Those are the only fields it uses - the unique key keeps duplicate URLs from filling up the DB.
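
For anyone who wants to see the unique key doing its job before touching a real database, here is a minimal sketch. It is not part of the spider: it assumes the pdo_sqlite extension is available and uses an in-memory SQLite table as a stand-in for the MySQL one, with the same two fields. The sample URLs are made up.

```php
<?php
// Assumption: SQLite (in memory) stands in for MySQL so nothing needs to be
// set up first. Table and column names follow the two fields listed above.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE urls (
    ID   INTEGER PRIMARY KEY AUTOINCREMENT,
    urls VARCHAR(100) UNIQUE
)');

// OR IGNORE is SQLite syntax; MySQL spells the same idea INSERT IGNORE.
$insert = $pdo->prepare('INSERT OR IGNORE INTO urls (urls) VALUES (?)');
$found = ['http://www.example.com', 'http://www.example.org', 'http://www.example.com'];
foreach ($found as $url) {
    $insert->execute([$url]);
}

// Read back what was actually stored, in insertion order.
$stored = $pdo->query('SELECT urls FROM urls ORDER BY ID')->fetchAll(PDO::FETCH_COLUMN);
print_r($stored);
```

Three inserts go in, but the repeated URL is skipped by the unique key, so only two rows come back.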

* At around 35 seconds the spider tends to stop. I haven't figured out whether this is because the window times out or because the spider dead-ends, BUT it will continue to return data! In the test runs, it continued to spider and return URLs for 15 minutes after the script had essentially stopped, collecting approx. 2,334 URLs.

* The JavaScript at the beginning is a live timer - it's the result of about 2 hours' worth of JS programming experience and is there to allow me to time the run and measure results. The spider will run without it.




<HTML>
<HEAD>
<style>span{cursor:pointer;color:white;background:black;}</style>
<script type="text/javascript">
var msec=0
var sec=0
var min=0
var go // Timer handle, so stopspider() can cancel it
function start(){
document.forms[0].display.value=min+":"+sec+":"+msec
go=setTimeout(start,10) // Tick about every 10 ms, so 100 ticks is roughly 1 sec
msec++
if(msec==100){
msec=0
sec++
}
if(sec==60){
sec=0
min++
}}
function stopspider(){
clearTimeout(go);
}
function over(color)
{document.getElementById('over').style.background=color}
function out(color)
{document.getElementById('over').style.background=color}
</script>
</head>
<body>
<div align="center" style="width:10em;position:absolute;left:300px;top:4em;">
<form><input type="text" name="display" size="22" value="00:00:00"></form>
<script language="JavaScript">start();</script>
<span id="over" onmouseover="over('red')" onmouseout="out('black')" onclick="stopspider(window.stop())">Stop Spider!</span>
<br><br>
<?php
// Spider Build v1.04 Beta CBBW¤SB.O

// Keep Script From Timing Out
set_time_limit(0);
$i = 0;// Keep-alive Counter
$last = time();// When We Last Wrote To The Browser

// Print a number every 25 sec so the browser doesn't time out.
// Called from inside the spider loop; a sleep() loop up here would
// stall the script before the spider ever starts.
function keep_alive(){
global $i, $last;
if (time() - $last >= 25){
echo "$i ";
flush();
$i++;
$last = time();
}}

// Main Connect To DB (mysqli_*, since the old mysql_* functions were removed in PHP 7)
$db = mysqli_connect("localhost", "USER", "PASS", "DB NAME") or die
('I cannot connect to the database because: ' . mysqli_connect_error());
$table = "urls";

$key = "keyword";// Keyword You're Searching For
$stn = 0;// Number of Results To Start At
$grab = 10;// Number of Results To Grab

// Matches URLs like http://www.example.com (www host, 3 letter ending)
$pattern = '#(http://w{3}\.[^.]+?\.[a-z]{3})#';

// Start The Spider
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL,"http://www.google.com/search?safe=off&q=".urlencode($key)."&start=$stn&num=$grab&sa=N");
curl_setopt($ch, CURLOPT_RETURNTRANSFER,1);
$result=curl_exec ($ch);
curl_close ($ch);

if( $result ){

  preg_match_all( $pattern, $result, $output, PREG_SET_ORDER );
  foreach( $output as $item ){

// Write Initial Data (INSERT IGNORE lets the UNIQUE key drop duplicates quietly)
$url = mysqli_real_escape_string($db, $item[1]);
mysqli_query($db, "INSERT IGNORE INTO $table (urls) VALUES('$url')");

// Spider
$spider = mysqli_query($db, "SELECT urls FROM $table");

while($row = mysqli_fetch_assoc( $spider )) {

$spurl = $row["urls"];// Only The URL Column, Not The ID
keep_alive();

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $spurl);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,1);
$result2=curl_exec ($ch);
curl_close ($ch);

if( $result2 ){

  preg_match_all( $pattern, $result2, $output2, PREG_SET_ORDER );
  foreach( $output2 as $item2 ){

// Write Data Into The Database
$url2 = mysqli_real_escape_string($db, $item2[1]);
mysqli_query($db, "INSERT IGNORE INTO $table (urls) VALUES('$url2')");

/*
// Uncomment To Echo Every Stored URL (Slows The Spider Down)
$see = mysqli_query($db, "SELECT urls FROM $table");
while($row2 = mysqli_fetch_assoc( $see )) {
    echo ('<div style="text-align:left;">');
    echo $row2["urls"];
    echo ("<br />");
    echo ("</div>");
}
*/

}}}}}

?>

</div>
</body>
</html>







Edit: Code Updated
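
For noobs who want to poke at the URL matching on its own, here is a small sketch. It is not taken from the spider: the helper name `extract_urls` and the sample HTML are made up for illustration, and the pattern is the spider's pattern with the dots escaped and `#` used as the delimiter so the slashes in `http://` need no escaping.

```php
<?php
// Hypothetical helper (not part of the original script): pull "clean"
// http://www.name.tld URLs out of a chunk of HTML and drop duplicates.
function extract_urls($html) {
    // '#' as the delimiter avoids having to escape the slashes in "http://"
    $pattern = '#http://w{3}\.[^.]+?\.[a-z]{3}#';
    preg_match_all($pattern, $html, $matches);
    // array_unique keeps the first occurrence; array_values reindexes from 0
    return array_values(array_unique($matches[0]));
}

// Made-up sample input with one repeated host
$sample = '<a href="http://www.example.com/page">a</a> '
        . '<a href="http://www.example.com/other">b</a> '
        . '<a href="http://www.sample.org">c</a>';

print_r(extract_urls($sample));
```

The paths are cut off and the repeated host collapses to a single entry, so the sample yields two URLs. Note the pattern only catches `www` hosts and stops after three letters of the ending, so something like `.info` gets trimmed to `.inf`.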

itchy

cool caligula i'll give this a whirl tomorrow as i'm in the spider writing testing phase of my current learning curve as well.
let you know how i get on.

Caligula

[quote author=itchy link=topic=206.msg1317#msg1317 date=1179114289]
cool caligula i'll give this a whirl tomorrow as i'm in the spider writing testing phase of my current learning curve as well.
let you know how i get on.
[/quote]



Cool itchy...  one thing - the script above times out the browser... in fact I just got done taking care of that lil problem - special thanks to TD

*edit - fixed code timeout problem... updated above.


It will print a number 1...2...3...4..every 25 sec or whatever... apparently the browser needs to have data written to it to keep it from timing out... ( since I keep the echos commented out for faster spidering speed )

Let me know how your test run goes.....



thedarkness

Guys, if you're not doing any output and you don't take any input......

If you have shell access just run it from the command line with;

php index.php

it will hose your js but from memory that was just a timer? If so;

time php index.php

Oh, assumes you are on a Linux type system.

Cheers,
td
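
A rough sketch of how a script could tell which way it was started, in case it helps. The helper name `running_from_cli` is made up for illustration; `PHP_SAPI` is the built-in constant that reports how PHP was invoked.

```php
<?php
// Hypothetical helper: true when the script was started from a shell
// (php index.php) rather than through a web server.
function running_from_cli() {
    return PHP_SAPI === 'cli';
}

if (running_from_cli()) {
    // No browser is waiting on the other end, so no keep-alive output is needed.
    echo "Running from the command line\n";
} else {
    // A browser is waiting, so output has to keep flowing.
    echo "Running under a web server (", PHP_SAPI, ")<br>\n";
}
```

With a check like this the same file could skip the keep-alive numbers when run from a shell and still print them when opened in a browser.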

Bompa

[quote author=thedarkness link=topic=206.msg1341#msg1341 date=1179139671]
Guys, if you're not doing any output and you don't take any input......
[/quote]



No input, no output?  That's my kinda script.  LOL



Caligula

Heyyyyy there is output.... it records all the URLs to a database... but for some reason the browser times out if it doesn't get anything... so this way it just prints a number every few seconds, so that the browser doesn't time out and the script doesn't slow down... (echoing the URLs hogs resources). I know this is shit to you pros... but to those of us just getting started with PHP... it's a learning tool... That script is like my "notes from PHP class" ....

I don't run any of this stuff from a command line..... I believe for what you are talking about I would need to have PHP installed on my computer.... I do not.

I have a super secret website which has all my little projects hidden on it....

thedarkness

[quote author=Bompa link=topic=206.msg1351#msg1351 date=1179143078]
No input, no output?  That's my kinda script.   LOL
[/quote]




You jerkin' my chain Bomps?

What I meant was it doesn't take any input from the browser and it doesn't have to send any output to the browser so it's a prime candidate to run from the CLI.

Thanks for being a picky bastard though

Cheers,
td

