The Cache: Technology Expert's Forum
 
Author Topic: OK Time to pcntl_fork! - Questions.  (Read 8182 times)
vsloathe
« on: January 17, 2008, 08:39:26 AM »

I've decided to give forking a go. I know it will take some work, but after this is over I hope to be able to fork all night long.

I need a method though as far as the actual execution goes, so here is the basic gist of what I want to do:

Fork the process, grab a web page, give the captcha from the web page back out to the browser, and wait for the user to enter the answer. Once the user has entered the answer, I will continue with the child, submit the POST and then die. I was talking with TD earlier and he mentioned using semaphores or mutexes to accomplish this. I understand the concept, but I need someone to explain the details or perhaps provide examples of this to my dumb ass.

Anyone?

Thanks.

hai
jammaster82
« Reply #1 on: January 17, 2008, 08:51:20 AM »

I would, but I don't have the 'tiene'?

ROFLMAO. I am interested in learning this as well.

I know very little about it at all, but I'm guessing:
have a page generate a unique id specific to this
session and user and store it in a table like this:

insert into semaphoretable (uniqueidfield, status) values ('000001','waiting');

then transfer to a page that just does a 1-second refresh and runs:

select status from semaphoretable where uniqueidfield='000001';

If it's still waiting, refresh to the same page; if it has finally been updated
to captcha_ok by the captcha_check page and its
subsidiary pages, then you can continue...

Just don't forget to clean up the semaphore table later..

The forked-off process is the captcha thread
and, if successful, it does an:

update semaphoretable set status='captcha_ok' where uniqueidfield='000001';
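A minimal sketch of the status-row idea above, assuming PDO with the SQLite driver and an in-memory database purely for illustration. The table and column names follow the post; the function names (markWaiting, markSolved, isSolved, cleanUp) are hypothetical. It changes the status with an UPDATE so there is only ever one row per id.

```php
<?php
// Sketch only: a database row used as a flag between two pages/processes.
// Assumes the pdo_sqlite extension; an in-memory DB stands in for a real one.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE semaphoretable (uniqueidfield TEXT PRIMARY KEY, status TEXT)');

// Called by the page that starts the captcha fetch.
function markWaiting(PDO $db, string $id): void
{
    $stmt = $db->prepare('INSERT INTO semaphoretable (uniqueidfield, status) VALUES (?, ?)');
    $stmt->execute([$id, 'waiting']);
}

// Called by the captcha process once it has succeeded.
function markSolved(PDO $db, string $id): void
{
    $stmt = $db->prepare('UPDATE semaphoretable SET status = ? WHERE uniqueidfield = ?');
    $stmt->execute(['captcha_ok', $id]);
}

// Called by the refreshing page on every reload.
function isSolved(PDO $db, string $id): bool
{
    $stmt = $db->prepare('SELECT status FROM semaphoretable WHERE uniqueidfield = ?');
    $stmt->execute([$id]);
    return $stmt->fetchColumn() === 'captcha_ok';
}

// Remove the row once the flag is no longer needed.
function cleanUp(PDO $db, string $id): void
{
    $stmt = $db->prepare('DELETE FROM semaphoretable WHERE uniqueidfield = ?');
    $stmt->execute([$id]);
}

markWaiting($db, '000001');
$before = isSolved($db, '000001');  // still waiting
markSolved($db, '000001');
$after = isSolved($db, '000001');   // flag has flipped
cleanUp($db, '000001');
```

The refreshing page would call isSolved() on each reload and only continue once it returns true.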

probably the wrong way .. interested in what follows
jm
« Last Edit: January 17, 2008, 09:09:04 AM by jammaster82 »

The watched pot, never boils... But if you walk away from it , the soup burns.  What gives?
perkiset
« Reply #2 on: January 17, 2008, 09:34:25 AM »

A quick wrapper:

When you compile PHP with System V semaphore support (the --enable-sysvsem configure switch) you have access to the kernel's semaphore and mutex handles.

In case you are light here, a mutex is a system-wide handle that is either owned or not at any given time. You request ownership and are granted it if no other process has it. If it is already owned, then you can hang and wait (you'll be placed in a queue for ownership) and when your function returns the handle you proceed. In this way, you know that only one process in the system at any one time is doing <something critical>. Mutexes are named, so there can be many mutexes in the system that all represent single-access critical components or code. Code using a mutex is most often called a critical section, and you enter the section requesting the mutex with a timeout of <nMS or nSecs>; if you don't get it you can either loop and wait again or fail out and handle things another way.

A semaphore is similar - it is another named resource that is system-wide and requested via a function like a mutex. A semaphore can have up to <n> owners at any given time, defined by the original creator of the semaphore. So for example, if you want up to 5 processes to have access to something BUT NO MORE then you'd use a semaphore rather than a mutex.

Sems and mutexes do not, by themselves, really provide communication between processes - they provide process "manners." In the old world (mainframe days), you'd acquire a mutex, modify a file putting some kind of string message in it, then let go of the mutex. In that way, you knew that no two processes would dick with the message file at the same time.
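As a rough illustration of the description above, here is a sketch using PHP's sem_get / sem_acquire / sem_release. It assumes the sysvsem extension on a Unix-like system; the helper name withSemaphore and the key value are made up for the example. A semaphore created with a maximum of one owner behaves like the mutex described.

```php
<?php
// Sketch only: a mutex is a semaphore that allows a single owner.
// Needs the sysvsem extension; the integer key is an arbitrary example value.
function withSemaphore(int $key, int $maxOwners, callable $critical)
{
    // Create the semaphore, or open it if another process already created it.
    $sem = sem_get($key, $maxOwners, 0600);
    if ($sem === false) {
        throw new RuntimeException('could not get semaphore');
    }
    // Blocks until one of the $maxOwners slots is free.
    if (!sem_acquire($sem)) {
        throw new RuntimeException('could not acquire semaphore');
    }
    try {
        return $critical();   // the critical section
    } finally {
        sem_release($sem);    // always give the slot back
    }
}

$result = null;
if (function_exists('sem_get')) {
    // maxOwners = 1 gives mutex behaviour; 5 would admit up to five processes.
    $result = withSemaphore(0x0CAC4E01, 1, function () {
        return 'did the critical work';
    });
}
```

Every process that calls withSemaphore() with the same key queues for the same kernel object, so only the permitted number run the callback at once.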

I used sems and mutexes a lot in my DOS development days and then in about '91/2 with Borland C++ 3.1 - it was the cleanest way to organize some problematic programming things. I barely ever use them now, preferring a queueing methodology that is supported nicely by databases. I also prefer using the OS kernel rather than forking for apps like you are describing, because the speed is less critical - you're going to wait LOADS of time for the user (relative to processor time), so why implement the complexity of threads when you'll have that kind of wait? IMO, better to use poor man's threading (shell_exec another process) with a database queue as your messaging mechanism on that kind of app.
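A small sketch of the "poor man's threading" approach, assuming a Unix-like shell and a php binary on the PATH. The script name worker.php, the jobid payload and the helper buildWorkerCommand are hypothetical; the idea is that the worker receives only a job id and reads the real data from the database queue.

```php
<?php
// Sketch only: launch a separate PHP process instead of forking.
// worker.php is a hypothetical script that would pull its job from a queue table.
function buildWorkerCommand(string $script, string $payload): string
{
    // escapeshellarg() stops the shell from interpreting the arguments.
    // Redirecting output and appending & lets exec() return immediately.
    return sprintf(
        'php %s %s > /dev/null 2>&1 &',
        escapeshellarg($script),
        escapeshellarg($payload)
    );
}

$cmd = buildWorkerCommand('worker.php', 'jobid=42');
// exec($cmd);   // uncomment to actually start the background worker
```

Because the command ends in &, the calling page does not wait for the worker; any results come back through the queue table rather than through the process itself.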

However, it is an excellent skill to have and interestingly, I may have the need to do something similar in the near future - I am currently analysing the strength of writing a TCP/IP based service written entirely in PHP which will require threading. Here is a complete example from the PHP code repository - I think it should spur you on a bit.

Good luck, please post your results!

http://www.perkiset.org/forum/php/forked_php_daemon_example-t474.0.html
« Last Edit: January 17, 2008, 09:36:58 AM by perkiset »

It is now believed, that after having lived in one compound with 3 wives and never leaving the house for 5 years, Bin Laden called the U.S. Navy Seals himself.
vsloathe
« Reply #3 on: January 17, 2008, 09:56:51 AM »

Thanks as always Perk.

I might give it a shot with curl_multi so I don't have to get my hands as dirty, but the only way I've used curl_multi to date is to spawn a bunch of curl children and then just wait until they are all done. I want to take the returns asynchronously, so that I can display captchas as they are gotten. E.g. from the user's perspective I want this to *blaze*.
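One way to take curl_multi results as they arrive, rather than waiting for the whole batch, is to drain curl_multi_info_read() inside the exec loop. This is only a sketch under that assumption; the function name fetchAsCompleted and the callback signature are invented for the example.

```php
<?php
// Sketch only: hand each response to a callback as soon as it completes,
// instead of waiting for every transfer to finish. Needs the curl extension.
function fetchAsCompleted(array $urls, callable $onDone): int
{
    $mh = curl_multi_init();
    foreach ($urls as $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, 15);
        curl_setopt($ch, CURLOPT_PRIVATE, $url);   // tag the handle with its URL
        curl_multi_add_handle($mh, $ch);
    }

    $finished = 0;
    do {
        $status = curl_multi_exec($mh, $running);
        // Drain every transfer that has completed so far.
        while ($info = curl_multi_info_read($mh)) {
            $ch = $info['handle'];
            $body = ($info['result'] === CURLE_OK) ? curl_multi_getcontent($ch) : null;
            $onDone(curl_getinfo($ch, CURLINFO_PRIVATE), $body);
            curl_multi_remove_handle($mh, $ch);
            $finished++;
        }
        if ($running > 0) {
            // Wait for network activity (at most 1 second) before looping again.
            if (curl_multi_select($mh, 1.0) === -1) {
                usleep(10000);
            }
        }
    } while ($running > 0 && $status === CURLM_OK);

    curl_multi_close($mh);
    return $finished;
}

// Usage (placeholder URLs):
// fetchAsCompleted(['http://example.com/a', 'http://example.com/b'],
//     function ($url, $body) { echo $url, ': ', strlen((string) $body), " bytes\n"; });
```

The callback receives null as the body when a transfer failed, so the caller can decide whether to retry.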

We shall see what I come up with.

perkiset
« Reply #4 on: January 17, 2008, 10:18:08 AM »

"Blaze" is relative - remember, the WAY WAY hugest bottleneck will be the user, waiting for the captcha and bandwidth in and out of your machine. You'd be better to focus on how to shrink those times than the trivially small bump (by comparison) you'll get by forking as opposed to execing (if you are interested in that technique). Again, forking is an outstanding tool to have in the box, but it can also be an order of magnitude more complicated when it comes to memory management, understanding "who's doing what" and particularly debugging, which, in a multithreaded app, is sometimes *monstrously* difficult - believe me on that one.

Not to throw water on the notion man, just a quick analysis of bang-for-buck on doing this. If it's about learning the technique then I am all for it. If it's a needed-soon mission critical app then I'd use another technology until I understood the dynamics of it better.

vsloathe
« Reply #5 on: January 17, 2008, 10:20:30 AM »

Yeah. As I said, I'll probably curl_multi it. There will be an initial delay as the threads are created and launched, but once the captchas start coming back to the user, he'll see nothing but rapid fire awesomeness, which is the goal.

nop_90
« Reply #6 on: January 17, 2008, 05:43:08 PM »

Forks are not threads.
A fork is a separate process, so it does not share memory etc.
You need some sort of IPC if you want to do that, which is a big pain in the ass.
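To make the point concrete, here is a sketch, assuming the pcntl extension on a Unix-like CLI; the function name forkAndReport is made up. The child changes its own copy of a variable, the parent's copy stays untouched, and the only way the parent learns anything is through a socket pair used as a simple IPC channel.

```php
<?php
// Sketch only: needs the pcntl extension (CLI, Unix-like systems).
function forkAndReport(): array
{
    $counter = 1;
    // A connected pair of sockets: one end for each process.
    $pair = stream_socket_pair(STREAM_PF_UNIX, STREAM_SOCK_STREAM, STREAM_IPPROTO_IP);

    $pid = pcntl_fork();
    if ($pid === -1) {
        throw new RuntimeException('fork failed');
    }
    if ($pid === 0) {
        // Child: this assignment only touches the child's copy of $counter.
        fclose($pair[0]);
        $counter = 99;
        fwrite($pair[1], "child sees counter=$counter\n");
        fclose($pair[1]);
        exit(0);
    }

    // Parent: read the child's message, then reap the child.
    fclose($pair[1]);
    $message = trim((string) fgets($pair[0]));
    fclose($pair[0]);
    pcntl_waitpid($pid, $status);

    return ['parentCounter' => $counter, 'message' => $message];
}

$report = function_exists('pcntl_fork') ? forkAndReport() : null;
```

After the call, the parent's counter is still 1 even though the child set its own copy to 99, which is why data has to travel over the socket.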

DangerMouse
« Reply #7 on: January 18, 2008, 02:37:42 AM »

I wonder if an AJAX/COMET solution may be appropriate for what you're trying to achieve, vsloathe? It could be used to submit completed captchas, but also as a trigger to grab fresh ones. A simple approach could be 10 on a page and, on tab out of the data entry point for number 5, request a new batch. This could possibly help with captcha time-out issues as well.

A COMET approach could mean the server is constantly requesting images, passing them to the browser as soon as they are ready.

Just a thought.

DM
vsloathe
« Reply #8 on: January 18, 2008, 07:47:55 AM »

Quote from: DangerMouse
I wonder if an AJAX/COMET solution may be appropriate for what you're trying to achieve, vsloathe? It could be used to submit completed captchas, but also as a trigger to grab fresh ones. A simple approach could be 10 on a page and, on tab out of the data entry point for number 5, request a new batch. This could possibly help with captcha time-out issues as well.

A COMET approach could mean the server is constantly requesting images, passing them to the browser as soon as they are ready.

Just a thought.

DM

A good thought. It had occurred to me as well, but my AJAX kung fu is weak.

Any pseudocode or examples to help me get started?

DangerMouse
« Reply #9 on: January 18, 2008, 09:03:59 AM »

Quote from: vsloathe
A good thought. It had occurred to me as well, but my AJAX kung fu is weak.

Any pseudocode or examples to help me get started?

Afraid not. I've never even attempted AJAX, let alone its inverse, COMET; I'm sure Perk and others will have plenty of better tips than me there.

I kind of envisaged using JavaScript's event model to catch an "on tab out" or "on submit / click" style event to launch the XMLHttpRequest. I think there are equivalent mechanisms but I've no clue how they work. Perk makes a good case for using another technique over XMLHttpRequest here: http://www.perkiset.org/forum/ajax/it%E2%80%99s_time_to_dump_xmlhttprequest-t336.0.html, although I suspect cross-platform/domain issues might not be that important to you here (although it could be an awesome extension of the project).

Anyway, the XMLHttpRequest would pass the solved captcha value the user entered to a PHP handling script; the response from this script, in whatever format, could then be parsed by the JavaScript and the browser display changed accordingly - appending any fresh available captchas to the bottom of the list in this case (probably a DOM insert child/sibling type command - again I'm not too sure).

The power would be in the PHP handling script, which could use some kind of persistence to track exactly what's going on, launch requests to grab new captcha images, and send captcha responses passed up from the browser accordingly. Thinking about it, you could do some quite interesting things here, dynamically calculating how many unsolved captchas need to be kept in a buffer depending on solving rate, for example.
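The buffer calculation mentioned above could be as simple as the following sketch. The formula is an assumption, not something taken from the thread: keep enough captchas queued to cover what the user will solve while one fetch is in flight, plus a small margin. The name captchaBufferSize and its parameters are hypothetical.

```php
<?php
// Sketch only: how many unsolved captchas to keep queued.
// $solvesPerMinute and $fetchSeconds would be measured by the handling script.
function captchaBufferSize(float $solvesPerMinute, float $fetchSeconds, int $safetyMargin = 1): int
{
    // Captchas the user is expected to finish while one fetch is in flight.
    $usedDuringFetch = ($solvesPerMinute * $fetchSeconds) / 60.0;
    // Always keep at least one captcha ready.
    return max(1, (int) ceil($usedDuringFetch) + $safetyMargin);
}

// 12 solves/min with a 10 s fetch: 2 used during the fetch, plus margin 1.
echo captchaBufferSize(12, 10), "\n"; // prints 3
```

A slow solver shrinks the buffer toward 1, which also acts as the "choke" on fetching new captchas when solving is slow.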

I know even less about how a COMET approach would work. I've yet to see many implementations of this technology outside of full-blown web applications (Google Spreadsheets etc.), but I think the gist is that the server holds a connection open and pushes data to the client, causing it to update the page when specified conditions are met; in this instance, when new captchas have been obtained. This could be useful to reduce the complexity of the PHP handler for the AJAX call described above: it would simply need to receive completed captchas and forward them on to their required destination. Any fresh captchas would be posted to the browser using a separate process. Whilst I think this is a neater solution, and definitely cooler on the geek/techy scale, it may introduce more problems than it's worth, as I suspect there would need to be a way to choke the obtaining of new captchas where solving is slow, so some form of cross-script communication is still needed. Plus, IIRC the server implementation is quite tricky due to system resource hogging.

It would also be cool to use XSLT to render the page; the data could then all be sent and received in XML with ease. Again, I don't know how to do this really, I just know the theory.

I'm sure others will weigh in here with more practical suggestions lol, but that's my rambling's worth.

DM
jammaster82
« Reply #10 on: January 18, 2008, 09:24:32 AM »

Isn't AJAX/COMET/XMLHttpRequest all gonna use
the same amount of bandwidth to pretty much
do the same thing? Only AJAX/COMET/XMLHttpRequest
has soooo many new technicalities with it, but
at the bottom of all these we are all still stuck with
the internet, and on a 33.6 dialup a simple 400-byte
HTTP GET header request takes about a second including
the TCP/IP handling, and it still takes about a second even if
you're T1 to T1, for all the handshaking and blah zay blah that
goes on in between along the route ('like when you do a tracert from MS
DOS..'), right? The time you MIGHT save is a few milliseconds on
the client side with AJAX, and the page won't have to be refreshed,
but it's still a TCP/IP transmission, and getting that request out
sooner won't matter timewise when you have to wait that second anyway... right?

I would just lean toward what's simple; meta refresh has been working
for soooo many years now. I would hate to have my site crash at 3 am
and have to wake up with one eye open and analyze the shit for four hours
till the client woke up, only to tell him I found the error in some new
untested technology I Anakin Skywalker Episode Two'ed my way right into and lost
my freaking arm over..... (nailed Padme though, sweet!)

ROFLMAO

But that could also be because I am too lazy to learn new things until someone else
breaks them and fixes that bleeding edge...
« Last Edit: January 18, 2008, 09:42:19 AM by jammaster82 »

DangerMouse
« Reply #11 on: January 18, 2008, 09:42:11 AM »

Yeah, I don't think it will help in terms of how much bandwidth is used or how long the data will take to arrive at the browser, but I got the impression that the idea was to create a fast interface that consistently bombarded a user with captchas to enter?

I just mentioned the approach above as it avoids browser refreshes and allows a background process to obtain captchas while others are being entered. It was totally theoretical though, just something I dreamed up; I don't really have any strong evidence to prove it would work.

DM
jammaster82
« Reply #12 on: January 18, 2008, 09:48:09 AM »

Hmm... How about loading like 5 (or 'x') of them with the first page
and handling it all client side, so that by the time they get to 3 you
can ask for four more, seeing as how four would take just as long as one,
economizing the motion that takes the longest (the TCP/IP transmission,
however it's done)?

just interested in this discussion, maintaining my position of not knowing anything..

vsloathe
« Reply #13 on: January 18, 2008, 11:04:31 AM »

I decided to try out the xajax framework, because I'd been kicking around the idea for a while and as I mentioned earlier, my JS/AJAX skills are weak. Here's what I've got, with a possible request from my fellow geeks: I'm requesting a new captcha on each onkeyup event in the answer entry box, then storing the result in the HTML of the page. I like using the page's HTML as storage space, but I'm wondering if there's a better way to do it - currently it's not the best because the speed increase is only marginal, as it will go out on the last onkeyup and fetch a captcha. If the user hits enter immediately after the last onkeyup, he's not going to see much difference speed-wise. I was wondering if any of you geniuses could think up a way to make the onkeyup *only* fire for the first onkeyup event?

Code:
<?php

/**
 * @author vsloathe
 * @copyright 2008
 */

require_once("xajax_core/xajaxAIO.inc.php");
require_once('class.gmailCreator.php');

$xajax = new xajax();
$xajax->registerFunction("prepCap");
$xajax->registerFunction("getCap");
$xajax->registerFunction("doPost");
$xajax->processRequest();
$xajax->printJavascript();

// The inputs carry ids as well as names because xajax's assign() targets
// elements by id. The submit button relies on the form's onsubmit handler,
// so doPost runs once per submission.
echo('
<html>
<body onload="xajax_getCap();">
<form onsubmit="xajax_doPost(escape(postStr.value), capAnswer.value, escape(qPostStr.value), escape(capUrl.value)); return false;">
<div id="capImg"></div>
<input type="text" id="capAnswer" name="capAnswer" onkeyup="xajax_prepCap()" /><br />
<input type="submit" value="Go" />
<input type="hidden" id="postStr" name="postStr" />
<input type="hidden" id="capUrl" name="capUrl" value="0" />
<input type="hidden" id="qPostStr" name="qPostStr" value="0" />
</form>
</body>
</html>
');

// Prefetch the next captcha while the user is typing and park it in the
// hidden capUrl/qPostStr fields.
function prepCap()
{
    $objResponse = new xajaxResponse();
    $numThreads = 1;
    $GC = new gmailCreator;
    $GC->numThreads = $numThreads;
    $GC->getAccountPage();
    $capurls = $GC->buildPostStr();
    $postStr = $GC->postStrings[0];
    $objResponse->assign("capUrl", "value", $capurls[0]);
    $objResponse->assign("qPostStr", "value", $postStr);
    return $objResponse;
}

// Fetch the first captcha on page load and display it.
function getCap()
{
    $objResponse = new xajaxResponse();
    $numThreads = 1;
    $GC = new gmailCreator;
    $GC->numThreads = $numThreads;
    $GC->getAccountPage();
    $capurls = $GC->buildPostStr();
    $postStr = $GC->postStrings[0];
    $objResponse->assign("capImg", "innerHTML", '<img src="'.$capurls[0].'"></img>');
    $objResponse->assign("postStr", "value", $postStr);
    return $objResponse;
}

// Show the next captcha, then hand the answered one to a background process.
function doPost($postStr, $capAnswer, $qPostStr, $capUrl)
{
    $objResponse = new xajaxResponse();
    if ($qPostStr)
    {
        // A prefetched captcha is queued: show it right away.
        $objResponse->assign("capImg", "innerHTML", '<img src="'.urldecode($capUrl).'"></img>');
        $objResponse->assign("capAnswer", "value", '');
        // PHP variable names are case-sensitive: $qPostStr, not $qpostStr.
        $objResponse->assign("postStr", "value", $qPostStr);
    }
    else
    {
        // Nothing queued: fetch a fresh captcha now.
        $numThreads = 1;
        $GC = new gmailCreator;
        $GC->numThreads = $numThreads;
        $GC->getAccountPage();
        $capurls = $GC->buildPostStr();
        // Kept in its own variable so the submitted $postStr is not overwritten.
        $nextPostStr = $GC->postStrings[0];
        $objResponse->assign("capImg", "innerHTML", '<img src="'.$capurls[0].'"></img>');
        $objResponse->assign("capAnswer", "value", '');
        $objResponse->assign("postStr", "value", $nextPostStr);
    }
    $postStr .= '&newaccountcaptcha='.$capAnswer;
    // escapeshellarg() keeps the post string from being interpreted by the shell.
    $ph = popen('php dopost.php '.escapeshellarg($postStr), 'r');
    return $objResponse;
}
?>


Nothing too special in there as far as my intellectual property goes, so play with it all you want, but of course the real meat is in that class file I require_once in the beginning.

thedarkness
« Reply #14 on: January 18, 2008, 09:10:01 PM »

Quote from: vsloathe
I like using the page's HTML as storage space, but I'm wondering if there's a better way to do it - currently it's not the best because the speed increase is only marginal, as it will go out on the last onkeyup and fetch a captcha.

Use another async AJAX call to store the captcha image somewhere else (database, flat file) then grab it back when you need it?

Cheers,
td

"I want to be the guy my dog thinks I am."
 - Unknown