The Cache: Technology Expert's Forum
 
Topic: file_get_contents works, webrequest doesn't - what gives?
nutballs
« on: February 14, 2008, 03:04:44 PM »

My goal in life is to fill this forum with an enormous number of "why doesn't this shit work" type questions...

So...

This gets a result:
Code:
$url = "http://api.search.yahoo.com/WebSearchService/rss/webSearch.xml?appid=yahoosearchwebrss&results=100&adult_ok=1&query=coffee";
// file_get_contents() fetches the raw XML; SimpleXMLElement parses it
$xml = new SimpleXMLElement(file_get_contents($url));
print_r($xml);

This gets no result:
Code:
$url = "http://api.search.yahoo.com/WebSearchService/rss/webSearch.xml?appid=yahoosearchwebrss&results=100&adult_ok=1&query=coffee";
// webrequest is the forum's WebRequest2 class (renamed), not a PHP built-in
$req = new webrequest();
print_r($req->simpleGet($url));

Seriously, this makes my brain hurt. Every other site I hit works, but Yahoo returns nothing: a status of 0 (zero).
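For comparison, PHP's HTTP stream wrapper records the response header lines that file_get_contents() received in the variable $http_response_header, so the working path can report a status too. A minimal sketch, with a hypothetical helper statusFromHeaders(), that reads the status code from those lines and returns 0 when no status line arrived at all, which appears to be the situation webrequest is reporting here:

```php
<?php
// Hypothetical helper: pull the numeric status code out of the first
// response header line (e.g. "HTTP/1.1 200 OK"). Returns 0 when no
// status line was received at all.
function statusFromHeaders(array $headers): int
{
    if (count($headers) === 0) {
        return 0; // nothing came back from the remote host
    }
    if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', $headers[0], $m)) {
        return (int) $m[1];
    }
    return 0; // first line is not a status line
}

// Usage (needs network access). PHP's HTTP stream wrapper fills
// $http_response_header in the scope where file_get_contents() ran:
// $body = file_get_contents($url);
// echo statusFromHeaders($http_response_header ?? []), "\n";
```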

I could eat a bowl of Alphabet Soup and shit a better argument than that.
nutballs
« Reply #1 on: February 14, 2008, 03:25:15 PM »

To add to this, here is the response I get back from debug:
Code:
simpleGet: Starts with [http://api.search.yahoo.com/WebSearchService/rss/webSearch.xml?appid=yahoosearchwebrss&results=100&adult_ok=1&query=coffee]
Outbound Header:
GET /WebSearchService/rss/webSearch.xml?appid=yahoosearchwebrss&results=100&adult_ok=1&query=coffee HTTP/1.1
Host: api.search.yahoo.com
User-Agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en) AppleWebKit/417.9 (KHTML, like Gecko) Safari/417.8
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding:
Accept-Charset: ISO-8859-1,utf-8:q=0.7,*;q=0.7
Connection: close

Content-Type: text/html
Content-Length: 0

Default beforeExecute()
Execute: Starts
Execute: HostStr=[api.search.yahoo.com] Port:80
Execute: PHP5 or greater, setting timeout of 30
Execute: Sending request
Execute: Request Sent
GetChunk: Starts
getChunk: Remote Timeout
handleProxyRetry()
Execute: Failed - Did not receive anything from remote host
handleFailure()
Default afterExecute()
simpleGet Failed
Array
(
)
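The log shows getChunk giving up with "Remote Timeout" before anything arrived. With plain PHP streams, an empty read caused by a timeout can be told apart from a closed connection using stream_set_timeout() and the 'timed_out' and 'eof' entries of stream_get_meta_data(). A minimal sketch, assuming a Unix system for the local socket-pair demo; readChunk() is a hypothetical name and is not WebRequest2's getChunk():

```php
<?php
// Hypothetical helper (not WebRequest2's getChunk()): one read with a
// timeout, reporting whether an empty result was a timeout or an EOF.
function readChunk($stream, int $seconds, int $microseconds = 0): array
{
    stream_set_timeout($stream, $seconds, $microseconds);
    $data = fread($stream, 8192);
    $meta = stream_get_meta_data($stream);

    return [
        'data'     => ($data === false) ? '' : $data,
        'timedOut' => (bool) $meta['timed_out'], // no bytes arrived in time
        'eof'      => (bool) $meta['eof'],       // remote side closed the socket
    ];
}

// Local demo (Unix socket pair): nothing is ever written to $a, so a
// read on $b should come back empty with the timeout flag set.
[$a, $b] = stream_socket_pair(STREAM_PF_UNIX, STREAM_SOCK_STREAM, STREAM_IPPROTO_IP);
$result = readChunk($b, 0, 200000); // wait at most 0.2 seconds
echo $result['timedOut'] ? "timed out\n" : "did not time out\n";
```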
perkiset
« Reply #2 on: February 14, 2008, 06:53:52 PM »

... so you're saying that the same code, but different URL works just fine?

If that's the case, then there is something in the header that Yahoo is expecting... if you can, sniff the outbound request done both ways. Clearly the webRequest class itself works, since it works with other URLs... but Yahoo must be expecting something in the header, and when it doesn't see it, it just stops on you.

Also, you create a webrequest() rather than a webRequest2() ... is that a typo or did you rename it? webrequest() is the OLD class, webRequest2 is the newest.

It is now believed, that after having lived in one compound with 3 wives and never leaving the house for 5 years, Bin Laden called the U.S. Navy Seals himself.
nutballs
« Reply #3 on: February 14, 2008, 07:04:56 PM »

Webrequest works fine with other URLs.
The Yahoo URL is the same for both methods, xml/file_get_contents and webrequest/simpleGet.

I was wondering how the hell to sniff the file_get_contents request, but then realized the browser can fetch the Yahoo URL too, so I can sniff that and compare.

Yeah, I renamed it webrequest.

I will check the headers and post back.
nutballs
« Reply #4 on: February 14, 2008, 07:32:01 PM »

OK, there is no difference other than the keep-alive stuff.
There is a difference in the charset as well, but I tested that using the string you see below, which came from Firefox.

Is there a way to send the keep-alive? Or is there a way to add custom header lines to the webrequest?

Code:
(Request-Line):GET /WebSearchService/rss/webSearch.xml?appid=yahoosearchwebrss&results=100&adult_ok=1&query=coffee HTTP/1.1
Host:api.search.yahoo.com
User-Agent:Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.8.1.12) Gecko/20080201 Firefox/2.0.0.12
Accept:text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language:en-us,en;q=0.5
Accept-Encoding:gzip,deflate
Accept-Charset:ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive:300
Connection:keep-alive
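When comparing two header dumps by eye it is easy to miss a line. A minimal sketch with two hypothetical helpers, parseHeaderDump() and diffHeaders(), that parse each dump into a name => value map (assuming one "Name: value" or "Name:value" pair per line) and list every header that differs or appears on only one side. The sample strings are shortened copies of the two dumps in this thread:

```php
<?php
// Hypothetical helper: turn a header dump into [lowercased name => value].
// Lines that are not "Name: value" (request lines, "(Request-Line):...",
// blanks) are skipped.
function parseHeaderDump(string $dump): array
{
    $headers = [];
    foreach (preg_split('/\r?\n/', $dump) as $line) {
        if (preg_match('/^([A-Za-z0-9-]+):\s*(.*)$/', trim($line), $m)) {
            $headers[strtolower($m[1])] = trim($m[2]);
        }
    }
    return $headers;
}

// Hypothetical helper: every header whose value differs, or that is
// present on one side only, as name => [ours, theirs] (null = missing).
function diffHeaders(array $ours, array $theirs): array
{
    $diff = [];
    $names = array_unique(array_merge(array_keys($ours), array_keys($theirs)));
    foreach ($names as $name) {
        $a = $ours[$name] ?? null;
        $b = $theirs[$name] ?? null;
        if ($a !== $b) {
            $diff[$name] = [$a, $b];
        }
    }
    return $diff;
}

// Shortened versions of the two dumps in this thread.
$webrequest = "GET /x HTTP/1.1\nHost: api.search.yahoo.com\nAccept-Encoding:\nConnection: close";
$firefox    = "(Request-Line):GET /x HTTP/1.1\nHost:api.search.yahoo.com\n"
            . "Accept-Encoding:gzip,deflate\nKeep-Alive:300\nConnection:keep-alive";
print_r(diffHeaders(parseHeaderDump($webrequest), parseHeaderDump($firefox)));
```

On those samples the diff lists accept-encoding, connection and keep-alive, while host matches.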
perkiset
« Reply #5 on: February 14, 2008, 09:46:31 PM »

If you look over the code, the buildHeader() function is where everything is constructed - you can put anything there you want. The header as it stands is the result of my research into headers that I like. It should be pretty self-explanatory.

But beyond that, is it possible that you need to float up a cookie to them or something? Perhaps I'll ping you in the AM if I have a moment and do some testing on it myself...
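WebRequest2's actual buildHeader() isn't shown in this thread, so here is a minimal stand-in sketch: buildRequestHeader() is a hypothetical name, and its defaults only loosely follow the header in the debug log. Extra lines are passed as an array and either appended or, when the name matches a default exactly, replace it. It writes Accept-Charset with ";q=0.7" as Firefox does; the debug log shows "utf-8:q=0.7" with a colon.

```php
<?php
// Hypothetical stand-in for WebRequest2's buildHeader(): default request
// headers, with caller-supplied lines appended or overriding by exact name.
function buildRequestHeader(string $host, string $path, array $extra = []): string
{
    $headers = [
        'Host'            => $host,
        'User-Agent'      => 'Mozilla/5.0 (compatible; example)',
        'Accept'          => 'text/xml,application/xml,text/html;q=0.9,*/*;q=0.5',
        'Accept-Language' => 'en-us,en;q=0.5',
        'Accept-Charset'  => 'ISO-8859-1,utf-8;q=0.7,*;q=0.7',
        'Connection'      => 'close',
    ];
    foreach ($extra as $name => $value) {
        $headers[$name] = $value; // add a new line or replace a default
    }

    $out = "GET {$path} HTTP/1.1\r\n";
    foreach ($headers as $name => $value) {
        $out .= "{$name}: {$value}\r\n";
    }
    return $out . "\r\n"; // blank line ends the header block
}

echo buildRequestHeader('api.search.yahoo.com', '/WebSearchService/rss/webSearch.xml?query=coffee', [
    'Accept-Encoding' => 'gzip,deflate',
    'Keep-Alive'      => '300',
    'Connection'      => 'keep-alive',
]);
```

One design caveat: with Connection: keep-alive the server may keep the socket open after the response, so a reader that waits for EOF can sit there until its timeout fires.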
DangerMouse
« Reply #6 on: February 15, 2008, 03:24:52 AM »

This is a bit of a blind guess, but could it be something to do with the 'accept-encoding'?

From the Firefox header, the content seems to be sent in gzip format, for the browser to decompress and display. However, you'd have thought the server would only send content like this when the request header states that the client can cope with it, and from what I can see WebRequest2 leaves the "Accept-Encoding" header blank.

DM
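If a client does advertise Accept-Encoding: gzip, it has to check the response's Content-Encoding header and decompress the body itself. A minimal sketch using PHP's zlib functions gzencode() and gzdecode() (PHP 5.4 or later, zlib extension required); decodeBody() is a hypothetical helper and handles only gzip, not deflate:

```php
<?php
// Hypothetical helper: undo Content-Encoding: gzip on a response body.
function decodeBody(string $body, string $contentEncoding): string
{
    if (strtolower(trim($contentEncoding)) === 'gzip') {
        $decoded = gzdecode($body);
        if ($decoded === false) {
            throw new RuntimeException('Body claimed gzip but did not decode');
        }
        return $decoded;
    }
    return $body; // no encoding (or one this sketch doesn't handle)
}

// Round trip: simulate a gzip-encoded response body.
$compressed = gzencode('<rss>coffee</rss>');
echo decodeBody($compressed, 'gzip'), "\n";
```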
nutballs
« Reply #7 on: February 15, 2008, 08:25:45 AM »

Bah, didn't notice that one was blank. I will try that...

Nope, still no joy.

And I just tried tacking the keep-alive stuff on the end, and still no worky.
DangerMouse
« Reply #8 on: February 15, 2008, 09:19:51 AM »

Just had a quick bash at this with curl, and forced the 'different' headers to the values that WebRequest2 is sending (Accept-Encoding: {blank} and Connection: close) and still managed to receive the content, so maybe the problem is not with the headers?

Haven't got any good suggestions, I'm afraid. The WebRequest2 debug log mentions a 'Remote Timeout' (whatever that may be); could the problem be with translating the chunked encoding?

DM
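For reference, an HTTP/1.1 body sent with Transfer-Encoding: chunked is a series of chunks, each prefixed by its size in hex on its own line, and terminated by a zero-size chunk. A minimal decoder sketch, assuming the complete raw body is already in memory; decodeChunked() is a hypothetical name, and it ignores chunk extensions and trailer headers:

```php
<?php
// Hypothetical helper: decode a complete Transfer-Encoding: chunked body.
// Format: <hex size>[;ext]\r\n<data>\r\n ... 0\r\n\r\n
function decodeChunked(string $raw): string
{
    $out = '';
    $pos = 0;
    while (true) {
        $eol = strpos($raw, "\r\n", $pos);
        if ($eol === false) {
            throw new RuntimeException('Truncated chunk-size line');
        }
        $sizeLine = substr($raw, $pos, $eol - $pos);
        $size = hexdec(trim(explode(';', $sizeLine)[0])); // drop any ;extension
        if ($size === 0) {
            return $out; // last chunk; any trailers are ignored
        }
        $dataStart = $eol + 2;
        if ($dataStart + $size + 2 > strlen($raw)) {
            throw new RuntimeException('Truncated chunk data');
        }
        $out .= substr($raw, $dataStart, $size);
        $pos = $dataStart + $size + 2; // skip the data and its trailing \r\n
    }
}

echo decodeChunked("4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"), "\n";
```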
perkiset
« Reply #9 on: February 15, 2008, 09:41:25 AM »

Nuts - I had one server I was talking to that never sent an "eof" when the page was complete, and I got a zero back. The way I figured out what was going on was to write the app for the console, then add an echo line right after the fread in the getChunk function: essentially, the moment I received ANYTHING from the remote server, it was echoed back out to the console.

Debugging it this way is important because it takes the unknowns of the server and your browser out of the picture: you'll see exactly what is being received at the moment you get it. I can't tell you how many times the details of the server and the browser have thwarted my efforts to see exactly what the deal was.

So if you add "echo $thisBuff;" right after the fread() line, you'll be able to debug the connection at packet level - or at least at the *contents* of a packet level, not a true sniff...
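The echo-after-fread trick, sketched as a standalone loop over any PHP stream. readAndEcho() is a hypothetical name, not the getChunk() code itself, and a php://memory stream stands in for the socket so the sketch runs offline:

```php
<?php
// Hypothetical debug loop in the spirit of "echo $thisBuff;" after fread():
// every chunk is printed the moment it is read, before any parsing.
function readAndEcho($stream, int $chunkSize = 1024): string
{
    $buffer = '';
    while (!feof($stream)) {
        $thisBuff = fread($stream, $chunkSize);
        if ($thisBuff === false || $thisBuff === '') {
            break; // timeout, error, or nothing left to read
        }
        echo $thisBuff; // raw bytes, exactly as received
        $buffer .= $thisBuff;
    }
    return $buffer;
}

// Demo: a php://memory stream stands in for the socket.
$stream = fopen('php://memory', 'r+');
fwrite($stream, "HTTP/1.1 200 OK\r\nContent-Type: text/xml\r\n\r\n<rss></rss>");
rewind($stream);
$received = readAndEcho($stream, 16);
```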
nutballs
« Reply #10 on: February 15, 2008, 11:09:33 AM »

Nada, zippo, zilch.

Nothing returns. I figured that as well, because I used the new feature to test for "</rss>", and still no joy.
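A stop-on-marker loop lets a client finish without waiting for an EOF the server never sends, but it can only match bytes that actually arrive; if the first read comes back empty, as in the debug log, there is nothing for "</rss>" to match. A minimal sketch, where readUntil() is a hypothetical name and not the WebRequest2 feature mentioned in this post:

```php
<?php
// Hypothetical loop: stop as soon as $marker has arrived, without waiting for EOF.
// Returns null if the stream runs dry (timeout, error or EOF) before the marker.
function readUntil($stream, string $marker, int $chunkSize = 1024): ?string
{
    $buffer = '';
    while (true) {
        $chunk = fread($stream, $chunkSize);
        if ($chunk === false || $chunk === '') {
            return null; // nothing (more) arrived before the marker
        }
        $buffer .= $chunk;
        // Search the whole buffer so a marker split across chunks is still found.
        $end = strpos($buffer, $marker);
        if ($end !== false) {
            return substr($buffer, 0, $end + strlen($marker));
        }
    }
}

$stream = fopen('php://memory', 'r+');
fwrite($stream, "<rss><item>coffee</item></rss>\r\n(connection left open)");
rewind($stream);
var_dump(readUntil($stream, '</rss>', 5));
```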
