The Cache: Technology Expert's Forum
 
Topic: Using PHP, the webRequest2 class and Apache for multitasking (Read 3663 times)
perkiset
« on: February 13, 2008, 02:38:01 PM »

I've recently become involved in a project where I want to fire off multiple tasks from a single web request and have them run concurrently, much like forking or threading. Some of the constraints on my project:

  • If I use exec() to fire off my processes, I incur the cost of a new shell and PHP instance for every thread.
  • I want access to the APC cache while in the "thread", which is not available to scripts executed from the shell.
  • I want to take advantage of Apache's natural concurrency by requesting multiple pages.
  • I cannot wait around for Apache to finish each "thread" before firing off another, so I need a way to fire multiple requests in rapid succession and then return a final result when ALL the threads are complete.
  • I need scalability: if I fire the requests locally (à la exec()), I bind the capabilities of <this master process> to the resources of <this machine>.

First and foremost, I use MySQL for the inter-process communication. My personal style is to have each "thread" own a row in a table and update that row when it is complete. The details of this are beyond the scope of this article, however.

The problem with firing off these requests is that, normally, webRequest2 (and virtually every other web pull class) will hang around until the "page" that Apache is serving has completed. I needed a way to short-circuit that so I can fire and forget.

The solution involves a couple of items.

First, I modified the webRequest2 class (available in the PHP repository - the latest version of the code is here: http://www.perkiset.org/forum/php/perks_new_webrequest_class-t616.0.html;msg5330#msg5330) so that I can supply an "early termination string". Essentially, as content comes back from the server, I look at the entire content body received so far and check whether the string appears in it; if so, I shut down the socket and call it a day.
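To make the early-termination idea concrete, here is a minimal sketch of the reading loop, assuming a plain PHP stream (for example one opened with fsockopen()). The function name readUntilMarker() and its chunk-size parameter are hypothetical; this is not the code inside webRequest2, just the same idea of re-checking the whole body received so far after every read.

```php
<?php
// Hypothetical helper, not webRequest2's actual code: read from $stream until
// $marker shows up anywhere in what has been received so far, or until EOF.
// The whole accumulated buffer is searched after every read, so a marker that
// straddles two reads is still found.
function readUntilMarker($stream, string $marker, int $chunkSize = 8192): string
{
    $buffer = '';
    while (!feof($stream)) {
        $chunk = fread($stream, $chunkSize);
        if ($chunk === false || $chunk === '') {
            break; // nothing more to read (EOF, timeout or read error)
        }
        $buffer .= $chunk;
        if (strpos($buffer, $marker) !== false) {
            break; // early termination: the caller can hang up now
        }
    }
    return $buffer;
}

// Usage sketch against a live server (host and path are placeholders):
// $fp = fsockopen('mydomain', 80, $errno, $errstr, 5);
// fwrite($fp, "GET /testresponse.php HTTP/1.0\r\nHost: mydomain\r\n\r\n");
// $response = readUntilMarker($fp, 'all_done');
// fclose($fp); // close the socket without waiting for the script to finish
```

Scanning the whole buffer rather than only the newest chunk is what catches a marker split across two reads; the cost is re-searching the buffer each time, which is negligible when the marker arrives in the first few hundred bytes.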

Second, I need to write my "thread script" to make use of this handy feature.

For demonstration purposes, my "master script" will call just one instance of the "thread script" and echo that it is done. My "thread script" will sleep for 10 seconds, then update a little file to let me know that it is complete.

The master script looks like this:
Code:
<?php

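// Requestor (testpull.php): fires one request and returns as soon as the
// early-termination string arrives

require('/www/sites/lib/classes/class.webrequest2.php');

$req = new webRequest2();
// Stop reading (and close the socket) once this string appears in the response
$req->earlyTermStr = 'all_done';
$buff = $req->simpleGet('http://mydomain/testresponse.php');

echo "Received: $buff\n\n";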

?>


Obviously you need to change mydomain to ... well, your domain. I'll assume you save this file as "testpull.php".

The testresponse.php script looks like this:
Code:
<?php

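// Answering routine (testresponse.php)

// Keep running even after the requestor hangs up on us
ignore_user_abort(true);

echo "all_done";
// Push the marker out to the client right away; ob_flush() only applies when
// output buffering is active, so guard it to avoid a notice
if (ob_get_level() > 0) {
    ob_flush();
}
flush();

// Simulated long-running work, done after the requestor has already moved on
sleep(10);
file_put_contents('/www/sites/mydir/output.txt', 'Process complete');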

?>


Touch a little file called output.txt in a directory you have access to, and give it 777 permissions (or otherwise make it writable by the Apache user) so that this will work.

If you request testpull.php from a browser, you'll see "all_done" almost immediately. If you watch the directory containing output.txt, you'll see the file updated about 10 seconds later.

With this tiny method, you could write a loop that fires off as many processes as you need (within the limits of your Apache configuration, of course) from a single web request. If you then poll for a completion signal (for example, a database row being updated), you'll know when all of the processes are done.
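A rough sketch of that master loop, assuming you supply the fire-and-forget request and the completion check yourself: fireAndWait(), $fire and $pendingCount are hypothetical names, and in practice $pendingCount might run a COUNT(*) over the per-"thread" status rows in MySQL. It polls with a short sleep rather than spinning flat out, and gives up after a timeout so a crashed worker can't hang the master forever.

```php
<?php
// Hypothetical sketch: fire one request per task, then poll until every task
// reports completion or the timeout expires.
//   $fire(string $taskId)  sends a fire-and-forget request (for example
//                          webRequest2 with an early-termination string)
//   $pendingCount(): int   returns how many tasks are still unfinished
// Returns true if everything finished, false on timeout.
function fireAndWait(array $taskIds, callable $fire, callable $pendingCount,
                     float $timeoutSec = 60.0, int $pollMicros = 250000): bool
{
    foreach ($taskIds as $id) {
        $fire($id); // returns as soon as the worker has sent its marker
    }

    $deadline = microtime(true) + $timeoutSec;
    while ($pendingCount() > 0) {
        if (microtime(true) >= $deadline) {
            return false; // some "thread" never reported back
        }
        usleep($pollMicros); // sleep between polls instead of burning CPU
    }
    return true;
}
```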

The last item in my requirements above is scalability, and it's the part that interests me most. If I keep a tiny database of machine IPs that can answer this little request, I can pull all the available machines from it, randomize which one I'll hit, and send the requests to that machine. More precisely, I can round-robin through ALL of the available processors and send my "threads" to each one, effectively distributing my workload across any number of back-end processors.

(As soon as I bring this up, there will be whole bunches of questions about how I load balance this. I do: I have great little scripts that handle my back-end load balancing, but that is also beyond the scope of this article.)
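The round-robin distribution can be sketched as below, assuming the list of available hosts has already been pulled from the machine table. roundRobinUrls(), the /testresponse.php path and the task query parameter are placeholders, not part of the original setup. Shuffling the host list first randomizes which machine gets the first request, while the modulo still spreads the tasks evenly.

```php
<?php
// Hypothetical sketch: assign one URL per task, cycling through the hosts.
// Path and query parameter are placeholders.
function roundRobinUrls(array $hosts, array $taskIds, bool $shuffle = false): array
{
    if (count($hosts) === 0) {
        throw new InvalidArgumentException('No back-end hosts available');
    }
    $hosts = array_values($hosts);
    if ($shuffle) {
        shuffle($hosts); // randomize which host goes first
    }
    $urls = [];
    foreach (array_values($taskIds) as $i => $taskId) {
        $host = $hosts[$i % count($hosts)]; // wrap around the host list
        $urls[] = 'http://' . $host . '/testresponse.php?task=' . rawurlencode((string) $taskId);
    }
    return $urls;
}
```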

/p

It is now believed, that after having lived in one compound with 3 wives and never leaving the house for 5 years, Bin Laden called the U.S. Navy Seals himself.

DangerMouse
« Reply #1 on: February 13, 2008, 05:49:48 PM »

Interesting stuff Perk, nice post :)

Would something like this help even further, do you think? http://en.wikipedia.org/wiki/HTTP_pipelining and http://syndk8.net/forum/index.php/topic,13602.0.html

DM

perkiset
« Reply #2 on: February 13, 2008, 06:06:44 PM »

I've been thinking through how pipelining would be implemented in my webRequest2 class... I just haven't put it together yet because I haven't had a need. Have you looked at the class and considered doing a mod and sharing it? That'd be vurrah cool...

thanks BTW,
/p

DangerMouse
« Reply #3 on: February 14, 2008, 03:55:42 PM »

I built a cURL webrequest class largely inspired by your original, so modifying it shouldn't be too tricky; it's more a case of working out suitable error checking and use cases.

I'm not really sure which parts of the header get repeated when pipelining, nor do I have experience working at the socket level, but I'll certainly look into it :)
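On the header question: in HTTP/1.1 pipelining nothing carries over from one request to the next, so each pipelined request is complete on its own, with its own request line and headers (Host is required on every HTTP/1.1 request). A minimal sketch of building such a batch, with buildPipelinedRequests() as a hypothetical helper name and "Connection: close" on the last request as one reasonable choice so the server hangs up after answering it:

```php
<?php
// Hypothetical sketch: build the bytes for several pipelined HTTP/1.1 GETs to
// one host. Each request repeats its full request line and headers.
function buildPipelinedRequests(string $host, array $paths): string
{
    $paths = array_values($paths);
    $last = count($paths) - 1;
    $out = '';
    foreach ($paths as $i => $path) {
        $out .= "GET {$path} HTTP/1.1\r\n"
              . "Host: {$host}\r\n"
              . ($i === $last ? "Connection: close\r\n" : '') // only the final request
              . "\r\n"; // blank line ends this request's headers
    }
    return $out;
}

// Usage sketch (placeholder host): write every request before reading anything.
// $fp = fsockopen('mydomain', 80, $errno, $errstr, 5);
// fwrite($fp, buildPipelinedRequests('mydomain', ['/a.php', '/b.php']));
```

The responses come back in the same order as the requests. Reading them is the harder part: the client has to use each response's Content-Length (or chunked encoding) to find where one response ends and the next begins.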

DM

perkiset
« Reply #4 on: February 14, 2008, 06:48:04 PM »

Update: I'm using this methodology for the aforementioned projects and it *sings*.
I am extremely pleased with its stability and speed, and since it does not involve *real* multithreading but rather the same sort of "paging" that Apache normally does, logging is easy and clean.

vsloathe
« Reply #5 on: February 19, 2008, 07:45:03 AM »

Bah.

Wish I'd been around last week to tell you: I've been working on something similar for the past 2 weeks and could have shared what I've done.

hai

perkiset
« Reply #6 on: February 19, 2008, 10:13:20 AM »

Great minds and all... glad to hear that it's worked out for you as well.

I'm finding that I love how stable it is under load. This, of course, makes perfect sense because it's Apache and all... but it still is delightful to see.