
Caligula
Well, this is what I have been working on... I want to thank everyone here and at the Syndk8 (same people, I know) for helping me learn.

Simple spider: it just collects clean URLs and spiders them for new ones. It requires a database. I created mine manually through phpMyAdmin because it's easier and I don't trust the script to do it correctly.

ID -> INT -> auto_increment -> PRIMARY KEY
urls -> VARCHAR(100) -> UNIQUE KEY

Those are the only fields it uses. The unique key keeps duplicate URLs from filling up the DB.

* At around 35 seconds the spider tends to stop. I haven't figured out whether this is because the window times out or because the spider dead-ends, BUT it will continue to return data! In the test runs it kept spidering and returning URLs for 15 minutes after the script had apparently stopped, collecting approx. 2,334 URLs.

* The JavaScript at the beginning is a live timer. It's the result of about 2 hours' worth of JS programming experience and is there to let me time the run and measure results. The spider will run without it.

```php
<html>
<head>
<style>span{cursor:pointer;color:white;background:black;}</style>
<script type="text/javascript">
var msec = 0, sec = 0, min = 0, go;
// Live timer shown as min:sec:hundredths, one tick every 10 ms
function start() {
  document.forms[0].display.value = min + ":" + sec + ":" + msec;
  go = setTimeout(start, 10);
  msec++;
  if (msec == 100) { msec = 0; sec++; }
  if (sec == 60) { sec = 0; min++; }
}
function stopspider() { clearTimeout(go); }
function over(color) { document.getElementById('over').style.background = color; }
function out(color) { document.getElementById('over').style.background = color; }
</script>
</head>
<body>
<div align="center" style="width:10em;position:absolute;left:300px;top:4em;">
<form><input type="text" name="display" size="22" value="00:00:00"></form>
<script type="text/javascript">start();</script>
<span id="over" onmouseover="over('red')" onmouseout="out('black')" onclick="stopspider(); window.stop();">Stop Spider!</span><br><br>
<?php
// Spider Build v1.04 Beta CBBW¤SB.O

// Keep Script From Timing Out (0 = no time limit)
set_time_limit(0);

// Main Connect To DB (USER, PASS and DBNAME are placeholders)
$db = mysqli_connect("localhost", "USER", "PASS", "DBNAME")
    or die('I cannot connect to the database because: ' . mysqli_connect_error());

$table = "urls";
$key   = "keyword"; // Keyword You're Searching For
$stn   = 0;         // Number of Results To Start At
$grab  = 10;        // Number of Results To Grab
$regex = '#(http://w{3}\.[^.]+?\.[a-z]{3})#'; // Clean URLs: http://www.name.tld

// Fetch a page with cURL; returns the body, or false on failure
function fetch($url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    $page = curl_exec($ch);
    curl_close($ch);
    return $page;
}

// Store one URL; ID auto-increments and the UNIQUE key on urls
// turns a duplicate insert into a no-op
function store($db, $table, $url) {
    $url = mysqli_real_escape_string($db, $url);
    mysqli_query($db, "INSERT IGNORE INTO $table (urls) VALUES ('$url')");
}

// Start The Spider
$result = fetch("http://www.google.com/search?safe=off&q=" . urlencode($key) . "&start=$stn&num=$grab&sa=N");
if ($result) {
    preg_match_all($regex, $result, $output, PREG_SET_ORDER);
    foreach ($output as $item) {
        // Write Initial Data
        store($db, $table, $item[1]);

        // Spider every URL collected so far
        $spider = mysqli_query($db, "SELECT urls FROM $table");
        while ($row = mysqli_fetch_array($spider)) {
            $result2 = fetch($row["urls"]);
            if ($result2) {
                preg_match_all($regex, $result2, $output2, PREG_SET_ORDER);
                foreach ($output2 as $item2) {
                    // Write Data Into The Database
                    store($db, $table, $item2[1]);
                    /* Uncomment to list everything collected so far
                    $see = mysqli_query($db, "SELECT urls FROM $table");
                    while ($row2 = mysqli_fetch_array($see)) {
                        echo '<div style="text-align:left;">' . $row2["urls"] . '<br /></div>';
                    }
                    */
                }
            }
        }
    }
}
?>
</div>
</body>
</html>
```

Edit: Code Updated

itchy
cool caligula, i'll give this a whirl tomorrow as i'm in the spider writing testing phase of my current curve as well.
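
On the live timer in Caligula's post: it counts `setTimeout` ticks, so it can run slow whenever the browser delays a callback. A minimal sketch of one alternative, assuming the min:sec:hundredths display is what is wanted, is to record the start time once and derive the display from elapsed wall-clock time. The names `formatElapsed` and `startedAt` are hypothetical and not part of the posted script.

```javascript
// Hypothetical helper (not in the posted script): turn elapsed
// milliseconds into the timer's "min:sec:hundredths" display string.
function formatElapsed(ms) {
  var totalHundredths = Math.floor(ms / 10);
  var hundredths = totalHundredths % 100;
  var totalSeconds = Math.floor(totalHundredths / 100);
  var sec = totalSeconds % 60;
  var min = Math.floor(totalSeconds / 60);
  return min + ":" + sec + ":" + hundredths;
}

// Browser wiring, assuming the same "display" form field as the posted page
// and that this runs after the form has been parsed.
// Guarded so the helper can also be tried outside a browser.
if (typeof document !== "undefined" && document.forms.length > 0) {
  var startedAt = Date.now();
  var go = setInterval(function () {
    // Recompute from the clock each time, so late callbacks do not add up
    document.forms[0].display.value = formatElapsed(Date.now() - startedAt);
  }, 50);
  // To stop the display: clearInterval(go);
}
```

For example, `formatElapsed(61230)` gives `"1:1:23"`. The refresh interval only controls how often the field is redrawn; the value shown always comes from the clock.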
