The Cache: Technology Expert's Forum
 
Author Topic: HTTP Pipelining and proxy servers...  (Read 2788 times)
emonk
« on: November 08, 2007, 02:44:38 PM »

 This got ignored over at the syndk8, so I thought I'd ask for some help here. Seems like a smarter crowd in general. Advice, please?

 I need to be able to pipeline my HTTP GET requests. I need to be able to do this through an HTTP Proxy server. Making a single request at a time is SLOW. Pipelining speeds up my app like a thousand percent.

 I can't figure out how. I've tried using HTTP CONNECT then making the HTTP/1.1 requests. I've tried smuggling them in via an HTTP/1.0 POST. Nothing works.

 Does anyone know of a way to do this?

 Here is what I have so far, but it doesn't work. :(

Code:
function fastget($domain, $pages) {
    $proxy  = "10.1.1.154";
    $port   = 81;
    $output = '';

    $fp = fsockopen($proxy, $port, $errno, $errstr, 30);
    if (!$fp) {
        echo "$errstr ($errno)\n";
        return $output;
    }

    // Set up our header stuff... no trailing blank line here -- each
    // request adds its own terminator below.
    $header = "Host: $domain\r\n"
            . "User-Agent: " . random_useragent() . "\r\n"
            . "Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5\r\n"
            . "Accept-Language: en-us,en;q=0.5\r\n"
            . "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7\r\n"
            . "Keep-Alive: 300\r\n"
            . "Proxy-Connection: keep-alive\r\n";

    // Queue every GET back-to-back -- this is the pipelining part.
    // Only the last request carries "Connection: close" so the server
    // shuts down our http connection after the final response.
    $out  = '';
    $last = count($pages) - 1;
    foreach ($pages as $i => $page) {
        $out .= "GET $page HTTP/1.1\r\n$header";
        $out .= ($i == $last) ? "Connection: close\r\n\r\n" : "\r\n";
    }

    // Ask the proxy to open a tunnel to the target, then send the
    // pipelined requests through it...
    $out = "CONNECT $domain:80 HTTP/1.1\r\nHost: $domain:80\r\n\r\n" . $out;

    fwrite($fp, $out);
    while (!feof($fp)) {
        $output .= fgets($fp, 128);
    }
    fclose($fp);

    return $output;
}
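
 One variation I probably haven't tried properly (just a sketch, untested): the proxy is supposed to answer CONNECT with its own status line ("HTTP/1.0 200 Connection established") and a blank line, so maybe the pipelined requests need to wait until that reply has been read instead of being sent in the same write:

Code:
// Sketch: inside fastget(), instead of prepending CONNECT to $out,
// send it alone and wait for the proxy's reply first.
fwrite($fp, "CONNECT $domain:80 HTTP/1.1\r\nHost: $domain:80\r\n\r\n");
// Read the proxy's status line and headers up to the blank line...
while (($line = fgets($fp, 128)) !== false) {
    if (rtrim($line) === '') break;
}
// ...and only now push the pipelined GETs down the tunnel.
fwrite($fp, $out);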
perkiset
« Reply #1 on: November 10, 2007, 11:04:25 AM »

Have you ever seen pipelining in this way work? I tried my butt off for a while and got nowhere, because the HTTP spec is a strict request/response mechanism (I tried all sorts of clever hacks and such). Apache, for example, does not maintain a notion of an open request line so that you can continually re-request over the same pipe. Consider just the way the headers portion of an HTTP request works: once you send them and start sending content, you cannot send headers anymore. And there is no notion of a "page tail" so that a new set of headers can be sent down the same pipe. You can test this simply by telnetting - if you open a connection and attempt to get more than one page it will not work - unless there is magic here that I am unaware of (a huge possibility, BTW - I am no expert here).

My personal thing was pipelining ajax responses - I wanted to build a data flooder and I just couldn't get it to work. My eventual approach was to use concurrent requests rather than trying to speed up serialized requests by leaving the pipe open.
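
Something along these lines, using PHP's curl_multi interface, is one way to do the concurrent-request approach (a minimal sketch, not my exact code - the URL list is whatever you feed it):

Code:
// Concurrent fetch with curl_multi: fire every request at once
// instead of serializing them down one pipe.
function multiget($urls) {
    $mh = curl_multi_init();
    $handles = array();
    foreach ($urls as $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_multi_add_handle($mh, $ch);
        $handles[] = $ch;
    }
    // Pump the multi handle until every transfer finishes...
    do {
        curl_multi_exec($mh, $running);
        if ($running > 0) {
            curl_multi_select($mh);
        }
    } while ($running > 0);
    // Collect each page body and clean up...
    $results = array();
    foreach ($handles as $ch) {
        $results[] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);
    return $results;
}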

Sorry, this was certainly not the answer you were looking for - but perhaps if you want to talk about why you need it to go so much faster we could work through some options. The truth is that the connection/handshake/header/content/close cycle is not all THAT burdened by the connection/handshake/close components, so if you need to go that much faster perhaps we should look at another solution.

/p
emonk
« Reply #2 on: November 10, 2007, 03:03:27 PM »

Quote from: perkiset on November 10, 2007, 11:04:25 AM

 Thanks for the well-thought-out response, perks.

 I actually got this code to work, and it sped my google SERP scraper up by something like a thousand percent. It also kept me from getting blocked as fast - I think the big G monitors per connection instead of per request.

 Pipelining only works with HTTP/1.1, not 1.0, so that was probably your problem when you tried it. Basically you just issue all your GET requests in a row, with the 'Connection: close' on the last one. I just can't figure out how to make it work through an HTTP proxy for some reason, which kind of makes it pointless for SERP scraping.

 If you want an example of it working, just change $proxy and $port to 'www.google.com' and '80', skip the CONNECT line, and pass it a list of request URLs. It'll scrape all ten pages of a google query as fast as it'll scrape one page with my normal curl 'connect -> request -> close' style code.
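
 For instance, something like this (hypothetical - it assumes fastget() is tweaked to take the target host and port as arguments and to skip the CONNECT line; the query is just a placeholder):

Code:
// Hypothetical direct-to-host usage: pipeline all ten SERPs of one
// google query over a single socket, no proxy involved.
$pages = array();
for ($start = 0; $start <= 90; $start += 10) {
    $pages[] = "/search?q=blue+widgets&start=$start";
}
$html = fastget("www.google.com", $pages);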

 I haven't experimented to see how many requests they'll let you pipeline at once, but it seems to be a lot, and it doesn't seem to trigger the captcha at all.
perkiset
« Reply #3 on: November 10, 2007, 03:21:14 PM »

Could be the 1.0 thang - I had almost all of my stuff back-versioned to make sure I was ultimately compatible - there was also some discussion that Ajax worked better with that rev than with 1.1.

Pretty cool discovery if you're right about G and this action... and I'm gonna have to try it again from a pure HTTP perspective rather than an AJAX one... my goals were to stream out telemetry really fast rather than pages.

Side note - I'm wondering if the server itself has something to do with the pipe - I never tried hitting Google, and Apache did not like my efforts at all. I wonder if I had something set or not set to make this happen... or what. So in that respect, I'm wondering if the proxies specifically don't want you doing that... dunno. In any case, we've clearly gone beyond my knowledge LOL...

Good luck, please post if you get more results worthy of note... this is good research.

/p