The Cache: Technology Expert's Forum
 
Author Topic: file_get_contents error  (Read 2680 times)
NYDAz
« on: March 30, 2009, 11:22:54 AM »

Hi friends!

I'm trying to build a keyword scraper based on Ask.

But when I try to run the scrape function I get an error like this:

Code:
[function.file-get-contents]: failed to open stream: HTTP request failed! HTTP/1.0 500 Internal Server Error

Please note that I'm running it on my WAMP server.

What could be wrong?
perkiset
« Reply #1 on: March 30, 2009, 11:44:06 AM »

Looks like either the http:// URL wrappers are not available to you on your host or, since a 500 is the catch-all HTTP server error, Ask sees that you are a bot and is denying you access.

I'd put my money on Ask denying you.
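
A minimal sketch for telling the two cases apart, assuming a stock PHP install: it checks `allow_url_fopen` first, then sets the `ignore_errors` HTTP context option so the status line can be read instead of only producing a warning. The helper names `parse_status_code` and `fetch_with_status` are made up for illustration.

```php
<?php
// Hypothetical helper: pull the numeric status code out of the header lines
// collected by PHP's http:// wrapper (same format as $http_response_header).
function parse_status_code(array $headers): ?int
{
    $status = null;
    foreach ($headers as $line) {
        // After redirects there can be several status lines; keep the last one.
        if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', $line, $m)) {
            $status = (int) $m[1];
        }
    }
    return $status;
}

// Hypothetical helper: fetch a URL and report the status code with the body.
function fetch_with_status(string $url): array
{
    if (!ini_get('allow_url_fopen')) {
        // Without this setting file_get_contents() cannot open http:// URLs at all.
        return ['status' => null, 'body' => null, 'error' => 'allow_url_fopen is disabled'];
    }

    $context = stream_context_create([
        'http' => [
            'ignore_errors' => true, // hand back the body even on 4xx/5xx responses
            'timeout'       => 10,   // seconds; an arbitrary choice
        ],
    ]);

    $body = @file_get_contents($url, false, $context);
    // The http:// wrapper fills $http_response_header in the calling scope.
    $headers = $http_response_header ?? [];

    return [
        'status' => parse_status_code($headers),
        'body'   => $body === false ? null : $body,
        'error'  => $body === false ? 'request failed before any response arrived' : null,
    ];
}
```

Running `fetch_with_status()` against the failing URL and against a page you control, then comparing the two `status` values, shows whether the 500 comes from the remote server or from the local setup.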
NYDAz
« Reply #2 on: March 30, 2009, 11:52:16 AM »

Now I'm starting to get what it's all about!

I'm trying to learn PHP by writing (and understanding) some scripts.

This one was one of esrun's Ask scrapers.

So I guess it's not working anymore, or should I look for other ways to get Ask to let me in?

I should try to get going! Any tips, perk?
leaferz
« Reply #3 on: March 30, 2009, 12:05:41 PM »

I'm at the same point as well, NYDAz.

I ended up saving a few pages from each search engine and placing them on my server, then practicing scraping on those saved pages. Once all goes well I'll add a sleep cycle and try it live.

I just finished the big three, and Ask is next. It seems they use JS to output their Related Searches. Scraping JavaScript output is another hurdle I need to clear. I've seen a few projects around that use the Mozilla engine to capture the JS output (XULRunner and Crowbar - http://www.skillett.com/index.php/688/xulrunner-and-crowbar-crawling-of-sorts), but I haven't had the chance to give them a test run.
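
The practice-on-saved-pages approach described above could be sketched roughly like this, assuming PHP's DOM extension is available. The helper names, the choice of `<a>` link text as the thing to extract, and the delay parameter are all illustrative assumptions. It does not execute JavaScript, so script-generated blocks such as Ask's Related Searches would not show up.

```php
<?php
// Hypothetical helper: collect the visible text of every <a> element
// in an HTML string that was saved to disk earlier.
function extract_link_texts(string $html): array
{
    if (trim($html) === '') {
        return []; // DOMDocument::loadHTML() rejects empty input
    }

    $doc = new DOMDocument();
    // Real-world pages are rarely valid HTML; keep parser warnings quiet.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $texts = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $text = trim($anchor->textContent);
        if ($text !== '') {
            $texts[] = $text;
        }
    }
    return $texts;
}

// Hypothetical helper: run the extractor over saved pages, pausing between
// pages so the same loop can later be pointed at live URLs politely.
function scrape_saved_pages(array $paths, int $delaySeconds = 0): array
{
    $results = [];
    $remaining = count($paths);
    foreach ($paths as $path) {
        $html = file_get_contents($path);
        $results[$path] = $html === false ? [] : extract_link_texts($html);
        // Sleep between pages, but not after the last one.
        if (--$remaining > 0 && $delaySeconds > 0) {
            sleep($delaySeconds);
        }
    }
    return $results;
}
```

With the delay left at 0 the loop runs instantly against local files. Raising it is the "sleep cycle" step before trying anything live.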
perkiset
« Reply #4 on: March 30, 2009, 12:06:24 PM »

Try your scraper on a site that you know is willing to talk to you, like one of your own sites. If you get content back, it's definitely Ask.

What you are talking about doing is one of the BH's best tricks: convincing a server that you are a human rather than a bot so that it gives up its HTML payload. Since you are just learning PHP, this is a rather daunting task. I'd start with something a little easier, because there are a whole lot of variables here.
NYDAz
« Reply #5 on: March 30, 2009, 12:10:48 PM »

Thanks for the info! I'll report back on whether my sites will talk to me.
vsloathe
« Reply #6 on: April 01, 2009, 11:57:48 AM »

Yes, it's trivially easy for the server on the other end to tell you're a script and not a human when you use the built-in functions to grab a webpage. Try a browser emulation library like cURL, PEAR's HTTP_Request and HTTP_Client classes, Mechanize (for Perl), etc.

I think Python has some pretty sweet browser emulation libs as well.
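
As a rough illustration of the cURL route, assuming PHP's curl extension is installed: the sketch below sends browser-style headers and keeps cookies between requests. The helper names, the User-Agent string, and the header values are assumptions picked for the example, and none of this guarantees that a given site will accept the request.

```php
<?php
// Hypothetical helper: cURL options that make a request look more like a
// browser's than a bare file_get_contents() call does.
function browser_like_curl_options(string $url, string $cookieFile): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow redirects as a browser would
        CURLOPT_MAXREDIRS      => 5,
        CURLOPT_TIMEOUT        => 15,
        CURLOPT_ENCODING       => '',    // advertise every encoding cURL supports
        // Assumed User-Agent string; any current browser's value would do.
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:115.0) Gecko/20100101 Firefox/115.0',
        CURLOPT_HTTPHEADER     => [
            'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
            'Accept-Language: en-US,en;q=0.5',
        ],
        // Read and write cookies in the same file so they persist across calls.
        CURLOPT_COOKIEJAR      => $cookieFile,
        CURLOPT_COOKIEFILE     => $cookieFile,
    ];
}

// Hypothetical helper: perform the request and return status, body, and error.
function fetch_like_browser(string $url, string $cookieFile): array
{
    $ch = curl_init();
    curl_setopt_array($ch, browser_like_curl_options($url, $cookieFile));

    $body   = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    $error  = $body === false ? curl_error($ch) : null;

    return [
        'status' => $status,
        'body'   => $body === false ? null : $body,
        'error'  => $error,
    ];
}
```

Keeping the options in their own function makes it easy to tweak one header at a time and see which change, if any, affects the response.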