nutballs

My goal in life is to fill this forum with an enormous number of "why doesn't this shit work" type questions...

So...

This gets a result:

$url= "http://api.search.yahoo.com/WebSearchService/rss/webSearch.xml?appid=yahoosearchwebrss&results=100&adult_ok=1&query=coffee";
$xml = new SimpleXMLElement(file_get_contents($url));
print_r($xml);


This gets no result:

$url= "http://api.search.yahoo.com/WebSearchService/rss/webSearch.xml?appid=yahoosearchwebrss&results=100&adult_ok=1&query=coffee";
$req = new webrequest();
print_r($req->simpleGet($url));


seriously, this makes my brain hurt. every other site i hit works, but yahoo returns nothing. status of 0 (zero).

nutballs

to add to this. this is the response I get back from debug:

simpleGet: Starts with [http://api.search.yahoo.com/WebSearchService/rss/webSearch.xml?appid=yahoosearchwebrss&results=100&adult_ok=1&query=coffee]
Outbound Header:
GET /WebSearchService/rss/webSearch.xml?appid=yahoosearchwebrss&results=100&adult_ok=1&query=coffee HTTP/1.1
Host: api.search.yahoo.com
User-Agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en) AppleWebKit/417.9 (KHTML, like Gecko) Safari/417.8
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding:
Accept-Charset: ISO-8859-1,utf-8:q=0.7,*;q=0.7
Connection: close

Content-Type: text/html
Content-Length: 0

Default beforeExecute()
Execute: Starts
Execute: HostStr=[api.search.yahoo.com] Port:80
Execute: PHP 5 or greater, setting timeout of 30
Execute: Sending request
Execute: Request Sent
GetChunk: Starts
getChunk: Remote Timeout
handleProxyRetry()
Execute: Failed - Did not receive anything from remote host
handleFailure()
Default afterExecute()
simpleGet Failed
Array
(
)

perkiset

... so you're saying that the same code, but different URL works just fine?

If that's the case, then there is something in the header that Yahoo is expecting... if you can, sniff the outbound request done both ways. Clearly, since it works with other URLs then the webRequest works... but Yahoo must be expecting something in the header, and having not seen it just stops on you.

Also, you create a webrequest() rather than a webRequest2() ... is that a typo or did you rename it? webrequest() is the OLD class, webRequest2 is the newest.

nutballs

Webrequest works fine with other URLs.
The yahoo URL is the same for both methods, xml/file_get_contents or webrequest/simpleget.

I was thinking, how the hell do I sniff the file_get_contents request, but then realized the browser can fetch the yahoo URL too, so i can sniff that and compare.

yea, i renamed it webrequest.

i will check the headers and post back.
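
For comparing the two approaches, one option is to make file_get_contents send explicit headers through a stream context, so both sides can be made to send the same thing. A minimal sketch follows; makeHttpContext() is a made-up helper name, not something from the webrequest class, and the header values are just examples.

```php
<?php
// Hypothetical helper (not part of webrequest): builds a stream context
// that makes file_get_contents() send the given header lines.
function makeHttpContext(array $headers, $timeout = 30)
{
    return stream_context_create(array(
        'http' => array(
            'method'  => 'GET',
            'header'  => implode("\r\n", $headers) . "\r\n",
            'timeout' => $timeout, // read timeout in seconds
        ),
    ));
}

$ctx = makeHttpContext(array(
    'Accept-Language: en-us,en;q=0.5',
    'Connection: close',
));

// Usage against a live URL (not run here):
// $body = file_get_contents($url, false, $ctx);
// print_r($http_response_header); // response headers set by the HTTP wrapper
```

This only controls what is sent; seeing the bytes on the wire still needs a sniffer or a proxy.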

nutballs

ok there is no difference, other than keepalive stuff.
there is a difference with the charset as well, but i tested that, using the string you see below, which came from firefox.

is there a way to send the keepalive? or is there a way to add custom header lines to the webrequest?


(Request-Line):GET /WebSearchService/rss/webSearch.xml?appid=yahoosearchwebrss&results=100&adult_ok=1&query=coffee HTTP/1.1
Host:api.search.yahoo.com
User-Agent:Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.8.1.12) Gecko/20080201 Firefox/2.0.0.12
Accept:text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language:en-us,en;q=0.5
Accept-Encoding:gzip,deflate
Accept-Charset:ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive:300
Connection:keep-alive

perkiset

If you look over the code, the buildHeader() function is where everything is constructed - you can put anything there you want. The header as it stands is the result of my research into headers that I like. It should be pretty self-explanatory.
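
To illustrate the idea of adding custom header lines, here is a rough sketch of a header builder that accepts overrides. It is NOT the actual buildHeader() from the class; buildRequestHeader() and its defaults are assumptions made up for the example.

```php
<?php
// Hypothetical stand-in for a header builder; not the real WebRequest2 code.
function buildRequestHeader($host, $path, array $extra = array())
{
    // Assumed defaults, loosely modelled on the debug output in this thread.
    $headers = array(
        'Host'            => $host,
        'Accept'          => '*/*',
        'Accept-Encoding' => '',
        'Connection'      => 'close',
    );
    // Caller-supplied lines override a default or add a new header.
    foreach ($extra as $name => $value) {
        $headers[$name] = $value;
    }
    $out = "GET $path HTTP/1.1\r\n";
    foreach ($headers as $name => $value) {
        $out .= "$name: $value\r\n";
    }
    return $out . "\r\n"; // the blank line ends the header block
}

$header = buildRequestHeader(
    'api.search.yahoo.com',
    '/WebSearchService/rss/webSearch.xml?appid=yahoosearchwebrss&results=100&adult_ok=1&query=coffee',
    array('Keep-Alive' => '300', 'Connection' => 'keep-alive')
);
```

Keying the array by header name means an override replaces the default instead of sending the same header twice.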

But beyond that, is it possible that you need to float up a cookie to them or something? Perhaps I'll ping you in the AM if I have a moment and do some testing on it myself...

DangerMouse

This is a bit of a blind guess, but could it be something to do with the 'accept-encoding'?

From the Firefox header, the content seems to be sent in gzip format for the browser to decompress and display. You'd have thought it would only send content that way when the request header says the client can cope with it, though, and from what I can see WebRequest2 leaves the "Accept-Encoding" header blank.

DM
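
If the request does advertise gzip, the client has to unpack the body itself. A small sketch of that step, assuming the zlib extension is loaded; decodeBody() is a hypothetical helper, not something in WebRequest2.

```php
<?php
// Hypothetical helper: unpack a response body based on its
// Content-Encoding header value. Only gzip is handled here.
function decodeBody($body, $contentEncoding)
{
    if (strtolower(trim($contentEncoding)) === 'gzip') {
        return gzdecode($body); // needs ext/zlib, PHP 5.4 or later
    }
    return $body; // anything else is passed through untouched
}

// Round trip on local data, so no network is involved.
$packed = gzencode('<rss></rss>');
$plain  = decodeBody($packed, 'gzip');
```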

nutballs

bah, didn't notice that one was blank. i will try that...

nope still no joy.

and i just tried tacking the keep alive stuff on the end, and still no worky.

DangerMouse

Just had a quick bash at this with curl, and forced the 'different' headers to the values that WebRequest2 is sending (Content-Encoding: {blank} and Connection: Closed) and still managed to receive the content, so maybe the problem is not with the headers?

Haven't got any good suggestions I'm afraid. It says something in the WebRequest2 debug log about 'Remote Timeout' (whatever that may be), so it could be that the problem is with translating the chunked encoding?

DM
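
For reference, a chunked body is a series of "hex size, CRLF, data, CRLF" blocks ending with a zero-size chunk. Below is a minimal decoding sketch under that assumption; decodeChunked() is a made-up name and is not how WebRequest2 does it (I haven't checked its code). Trailer headers after the last chunk are ignored.

```php
<?php
// Hypothetical helper: decode a "Transfer-Encoding: chunked" body string.
function decodeChunked($raw)
{
    $body = '';
    $pos  = 0;
    $len  = strlen($raw);
    while ($pos < $len) {
        $eol = strpos($raw, "\r\n", $pos);
        if ($eol === false) {
            break; // truncated input: return what was decoded so far
        }
        // The size line is hex, optionally followed by ";extensions".
        $parts = explode(';', substr($raw, $pos, $eol - $pos), 2);
        $size  = hexdec(trim($parts[0]));
        if ($size == 0) {
            break; // zero-length chunk marks the end of the body
        }
        $body .= substr($raw, $eol + 2, $size);
        $pos   = $eol + 2 + $size + 2; // skip chunk data and its trailing CRLF
    }
    return $body;
}

$decoded = decodeChunked("4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n");
```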

perkiset

Nuts - I had one server I was talking to that never sent an "eof" when the page was complete and I got a zero back. The way that I figured out what was going on was to write the app for the console, then added an echo line right after the fread in the getChunk function - essentially, the moment that I received ANYTHING from the remote server it is echoed back out to the console.

Why it is important to debug it this way, is that you take the unknowns of the server and your browser out of it - you'll see exactly what is being received at the moment you get it. I can't tell you how many times the details of the server and the browser have thwarted my efforts to see exactly what was the deal.

So if you add "echo $thisBuff;" right after the fread() line, you'll be able to debug the connection at packet level - or at least at the *contents* of a packet level, not a true sniff...
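
Roughly what that looks like as a standalone console script. This is a sketch only: readAndEcho() is a hypothetical function, not the real getChunk(), and the variable name $thisBuff is just borrowed from the line above.

```php
<?php
// Hypothetical reader: pulls data from any stream resource and echoes
// each buffer the moment it arrives, then returns everything it read.
function readAndEcho($stream, $chunkSize = 4096)
{
    $all = '';
    while (!feof($stream)) {
        $thisBuff = fread($stream, $chunkSize);
        if ($thisBuff === false || $thisBuff === '') {
            break; // read error, timeout, or nothing left to read
        }
        echo $thisBuff; // show the raw data as soon as it is received
        $all .= $thisBuff;
    }
    return $all;
}

// Usage against a live host (not run here); $header is your request string:
// $fp = fsockopen('api.search.yahoo.com', 80, $errno, $errstr, 30);
// stream_set_timeout($fp, 30);
// fwrite($fp, $header);
// readAndEcho($fp);
// print_r(stream_get_meta_data($fp)); // 'timed_out' => true means the read timed out
// fclose($fp);
```

If nothing at all gets echoed and 'timed_out' comes back true, the remote end never sent a byte, which points at the request itself rather than at how the response is parsed.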

nutballs

nadda, zippo, zilch.

nothing returns. I figured that as well, because i used the new feature to test for "</rss>", and still no joy.

