Have you ever actually seen pipelining work this way? I tried my butt off for a while and got nowhere, because the HTTP spec is a strict request/response mechanism (I tried all sorts of clever hacks and such). Apache, for example, does not maintain a notion of an open request line that lets you continually re-request over the same pipe. Consider, for example, just the way the headers portion of an HTTP request works: once you send them and start sending content, you cannot send headers anymore. And there is no notion of a "page tail" after which a new set of headers can be sent down the same pipe. This can be tested simply by telnetting - if you open a connection and attempt to get more than one page, it will not work - unless there is magic here that I am unaware of (a huge possibility, BTW - I am no expert here).
My personal thing was to pipeline AJAX responses - I wanted to build a data flooder and I just couldn't get it to work. My eventual solution was to use concurrent requests rather than trying to speed up serialized requests by leaving the pipe open.
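A minimal sketch of the concurrent-requests approach, for illustration only. The original was browser-side AJAX; this is the same idea in Python, and the names `fetch` and `fetch_all` are made up here, not taken from any code in this thread. Each URL gets its own connection, and the requests simply run in parallel instead of sharing one pipe.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch(url, timeout=10.0):
    """Fetch one URL over its own connection and return the body bytes."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_all(urls, fetcher=fetch, max_workers=8):
    """Run fetcher on every URL concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the order of `urls`, not completion order.
        return list(pool.map(fetcher, urls))
```

Calling `fetch_all(list_of_urls)` returns the bodies in the same order as the input list. The `fetcher` parameter is only there so the function can be tried without touching the network.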
Sorry, this was certainly not the answer you were looking for - but perhaps if you want to talk about why you need it to go so much faster we could work through some options. The truth is that the connection/handshake/header/content/close cycle is not all THAT burdened by the connection/handshake/close components, so if you need to go that much faster perhaps we should look at another solution.
Thanks for the well-thought-out response, perks.
I actually got this code to work, and it sped my Google SERP scraper up by something like a thousand percent. It also kept me from getting blocked as fast. I think the big G monitors per connection instead of per request.
Pipelining only works with HTTP/1.1, not 1.0, so that was probably your problem when you tried it. Basically you just issue all your GET requests in a row, and put 'Connection: close' only on the last one. I just can't figure out how to make it work through an HTTP proxy for some reason, which kind of makes it pointless for SERP scraping.
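A minimal Python sketch of that idea, for anyone following along. It is not the script discussed in this thread; `build_pipelined_requests` and `fetch_pipelined` are hypothetical names, and it assumes the server accepts pipelined HTTP/1.1 requests on a plain (non-TLS) socket. All GET requests are written back to back, and only the last one carries 'Connection: close'.

```python
import socket


def build_pipelined_requests(host, paths):
    """Return one byte string holding a GET request per path, back to back."""
    requests = []
    for i, path in enumerate(paths):
        lines = [f"GET {path} HTTP/1.1", f"Host: {host}"]
        # Only the final request asks the server to close the connection.
        if i == len(paths) - 1:
            lines.append("Connection: close")
        requests.append("\r\n".join(lines) + "\r\n\r\n")
    return "".join(requests).encode("ascii")


def fetch_pipelined(host, port, paths, timeout=10.0):
    """Send every request at once, then read until the server closes."""
    payload = build_pipelined_requests(host, paths)
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)
        while True:
            data = sock.recv(65536)
            if not data:  # empty read means the server closed the socket
                break
            chunks.append(data)
    return b"".join(chunks)
```

The responses come back concatenated in request order, so the caller still has to split them apart using each response's Content-Length or chunked encoding.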
If you want an example of it working, then just change $proxy and $port to 'www.google.com' and '80', and pass it a list of request URLs. It'll scrape all ten pages of a Google query as fast as it'll scrape one page with my normal curl 'connect -> request -> close' style code.
I haven't experimented to see how many requests they'll let you pipeline at once, but it seems to be a lot, and it doesn't seem to trigger the captcha at all.