perkiset

I've recently become involved in a project where I want to fire off multiple tasks to work concurrently like forking or threading from a web request. Some of the issues surrounding my project:


  • If I use exec() to fire off my processes, I incur the cost of a new shell and PHP instance for every thread.

  • I want access to the APC cache while in the "thread" - which is not available to shell-executed scripts.

  • I want to take advantage of Apache's natural threading by calling multiple pages.

  • I cannot wait around for Apache to finish each "thread" before firing off another, so I need a way to fire multiple requests in rapid succession, then return a final result when ALL the threads are complete.

  • I need scalability, and if I locally fire the requests (ala exec()) then I bind the capabilities of <this master process> to resources on <this machine>.


First and foremost, I use MySQL for the inter-process communication. My personal style is to have each "thread" have a row in a table and update it when it is complete. The details of this, however, are beyond the scope of this article.
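As a rough sketch of what that row-per-thread bookkeeping might look like (the table and column names here are hypothetical, and I'm using PDO for illustration - not necessarily what perkiset's actual setup uses):

```php
<?php
// Hypothetical status table: threads(id INT PRIMARY KEY, status VARCHAR(16))
$db = new PDO('mysql:host=localhost;dbname=work', 'user', 'pass');

// Master: create one 'running' row per thread before firing the requests
$ins = $db->prepare("INSERT INTO threads (id, status) VALUES (?, 'running')");
$ins->execute(array($threadId));

// Each thread script, as its last act, marks its own row done
$upd = $db->prepare("UPDATE threads SET status = 'done' WHERE id = ?");
$upd->execute(array($threadId));

// Master: a zero here means every thread has reported in
$left = $db->query("SELECT COUNT(*) FROM threads WHERE status <> 'done'")
           ->fetchColumn();
?>
```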

The problem with firing off these requests is that normally, webRequest2 (and virtually every other web-pull class) will hang around until the "page" that Apache is serving is completed. I needed a way to short-circuit that so that I can fire and forget.

The solution involves a couple items.

First, I modified the webRequest2 class (available in the PHP repository - the latest version of the code is here: http://www.perkiset.org/forum/php/perks_new_webrequest_class-t616.0.html;msg5330#msg5330) so that I can supply an "early termination string." Essentially, as content comes back from the server I'll look at the entire content body and see if the string appears in it - if so, I shut down the socket and call it a day.
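The check itself is simple enough to sketch in isolation. This is a hypothetical stand-in for the class's read loop (the function name and the chunk array are mine, purely for illustration): accumulate each chunk as it arrives and bail out as soon as the sentinel appears anywhere in the body so far.

```php
<?php
// Illustrative early-termination check: $chunks stands in for
// successive fread() results off the socket.
function readUntilTerm($chunks, $termStr) {
    $body = '';
    foreach ($chunks as $chunk) {
        $body .= $chunk;
        // Check the whole body, not just this chunk - the sentinel
        // can be split across two packets.
        if (strpos($body, $termStr) !== false) {
            // In the real class, this is where the socket gets closed.
            return $body;
        }
    }
    return $body; // server closed the connection first
}

echo readUntilTerm(array('all_', 'done', 'rest of page'), 'all_done');
?>
```

Note the whole-body check: a naive per-chunk strpos() would miss a sentinel that straddles a packet boundary.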

Second, I need to write my "thread script" to make use of this handy feature.

For demonstration purposes, my "master script" will call just one instance of the "thread script" and echo that it is done. My "thread script" will sleep for 10 seconds, then update a little file to let me know that it is complete.

The master script looks like this:

<?php

// Requestor

require('/www/sites/lib/classes/class.webrequest2.php');

$req = new webRequest2();
$req->earlyTermStr = 'all_done';
$buff = $req->simpleGet('http://mydomain/testresponse.php');

echo "Received: $buff";

?>


Obviously you need to change mydomain to ... well, your domain. I'll assume you save this file as "testpull.php."

The testresponse.php script looks like this:

<?php

// Answering routine

echo "all_done";
ob_flush();
flush();

sleep(10);
file_put_contents('/www/sites/mydir/output.txt', 'Process complete');

?>


Touch a little file called output.txt and give it 777 permissions so that this will work - obviously in a directory you have access to.

If you call for testpull.php from a browser, you'll see "all_done" almost immediately. Watching the directory where you have output.txt, 10 seconds later you'll see the file updated.

Using this tiny method, you could have a loop that fired off as many processes as you needed (based on your Apache capabilities, of course) from a single web request. If you then busy-wait for a signal - i.e., perhaps a database row is updated or something - then you'd know when all of the processes were done.
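Putting those two pieces together, the master loop might look something like this (a sketch using the webRequest2 API shown above; allThreadsDone() is a hypothetical helper standing in for whatever completion signal you use, e.g. a database row check):

```php
<?php
// Sketch: fire N thread scripts, then busy-wait for completion.
require('/www/sites/lib/classes/class.webrequest2.php');

$nThreads = 5;
for ($i = 0; $i < $nThreads; $i++) {
    $req = new webRequest2();
    $req->earlyTermStr = 'all_done';
    // Each call returns almost immediately thanks to the early
    // termination string, so all N fire in rapid succession.
    $req->simpleGet("http://mydomain/testresponse.php?thread=$i");
}

// Busy-wait for the signal that every thread has finished.
while (!allThreadsDone()) {   // hypothetical completion check
    usleep(250000);           // don't hammer the database while polling
}
echo "All $nThreads threads complete";
?>
```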

The last item in my requirements above is scalability. Why this is most interesting to me is that if I have a tiny database of machine IPs that can do this little request, then I could pull all the available machines from a database, randomize which one I'll hit and then send the requests to that machine... more precisely, I could round-robin through ALL of the available processors and send my "threads" to each one - effectively distributing my workload to any number of potential back-end processors.

(As soon as I get into this notion, there will be whole bunches of questions on how I load balance this, which I do - I have great little scripts to help me handle my back-end process load balancing, but that is also beyond the scope of this article).

/p

DangerMouse

Interesting stuff Perk, nice post.

Would something like this help even further, do you think? http://en.wikipedia.org/wiki/HTTP_pipelining and http://syndk8.net/forum/index.php/topic,13602.0.html


DM

perkiset

I am thinking through how pipelining would be implemented in my webRequest2 class... just haven't put it together yet because I haven't had a need. Have you looked at the class and considered doing a mod and sharing it? That'd be vurrah cool...

thanks BTW,
/p

DangerMouse

I based a curl webrequest class I built largely on inspiration from your original, so I'm sure modifying it shouldn't be too tricky - more a case of working out suitable error checking and use cases.

Not really sure which parts of the header get repeated when pipelining, nor do I have experience of working at socket level, but I'll certainly look into it.

DM

perkiset

Update: Using this methodology for the aforementioned projects and it *sings*.
I am extremely pleased with its stability and speed - and since it does not involve *real* multithreading but rather the same sort of "paging" that Apache normally employs, logging is easy and clean.

vsloathe

Bah.

Wish I'd been around last week to tell you I've been working on something similar the past 2 weeks and I could've been sharing what I've done.

perkiset

Great minds and all... glad to hear that it's worked out for you as well.

I'm finding that I love how stable it is under load. This, of course, makes perfect sense because it's Apache and all... but it still is delightful to see.

