vsloathe
I've decided to give forking a go. I know it will take some work, but after this is over I hope to be able to fork all night long. I need a method though as far as the actual execution goes, so here is the basic gist of what I want to do: Fork the process, grab a web page, give the captcha from the web page back out to the browser, and wait for the user to enter the answer. Once the user has entered the answer, I will continue with the child, submit the POST and then die. I was talking with TD earlier and he mentioned using semaphores or mutexes to accomplish this. I understand the concept, but I need someone to explain the details or perhaps provide examples of this to my dumb ass. Anyone? Thanks.
jammaster82
I would, but i dont have the 'tiene'? I am interested in learning this as well. I know very little about it at all, but I'm guessing: have a page generate a unique id specific to this session and user and store it in a table like this:

insert into semaphoretable (uniqueidfield, status) values ('000001','waiting');

then transfer to a page that just does a 1-second refresh:

select status from semaphoretable where uniqueidfield='000001'

If it's still waiting, refresh to the same page; if it has finally been updated to captcha_ok by the captcha_check page and its subsidiary pages, then you can continue... just dont forget to clean up the semaphore table later. The forked-off process is the captcha thread, and if successful it does:

update semaphoretable set status='captcha_ok' where uniqueidfield='000001';

probably the wrong way.. interested in what follows jm
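The scheme above can be sketched in a few lines of PHP. This is a minimal illustration, not production code: it uses an in-memory SQLite database through PDO so it runs standalone, and the table/column names are taken straight from the post.

```php
<?php
// Sketch of the semaphore-table idea, using SQLite via PDO for illustration.
// Table and column names follow the post; a real app would use MySQL etc.
$db = new PDO('sqlite::memory:');
$db->exec("CREATE TABLE semaphoretable (uniqueidfield TEXT PRIMARY KEY, status TEXT)");

// The launching page registers the session as waiting...
$id = '000001';
$db->prepare("INSERT INTO semaphoretable (uniqueidfield, status) VALUES (?, ?)")
   ->execute(array($id, 'waiting'));

// ...the captcha_check page flips the row when the user solves the captcha...
$db->prepare("UPDATE semaphoretable SET status = ? WHERE uniqueidfield = ?")
   ->execute(array('captcha_ok', $id));

// ...and the 1-second-refresh page polls until it sees captcha_ok.
$poll = $db->prepare("SELECT status FROM semaphoretable WHERE uniqueidfield = ?");
$poll->execute(array($id));
$status = $poll->fetchColumn();
if ($status === 'captcha_ok') {
    // continue with the POST, then clean up the semaphore row
    $db->prepare("DELETE FROM semaphoretable WHERE uniqueidfield = ?")->execute(array($id));
}
```

In real use the INSERT, UPDATE, and SELECT would live in three separate pages sharing the same database; here they run in sequence just to show the state machine.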
perkiset
A quick wrapper: when you compile PHP with the SYSVSEM library (a switch at compile time) you have access to the kernel's semaphore and mutex handles.

In case you are light here: a mutex is a system-wide handle that is either owned or not at any given time. You request ownership and are granted it if no other process has it. If it is already owned, then you can hang and wait (you'll be placed in a queue for ownership) and when your function returns the handle you proceed. In this way, you know that only one process in the system at any one time is doing <something critical>. Mutexes are named, so there can be many mutexes in the system that each represent a single-access critical component or piece of code. Code using a mutex is most often called a critical section: you enter the section by requesting the mutex with a timeout of <n ms or n secs>, and if you don't get it you can either loop and wait again or fail out and handle things another way.

A semaphore is similar - it is another named, system-wide resource requested via a function, like a mutex. A semaphore can have up to <n> owners at any given time, defined by the original creator of the semaphore. So for example, if you want up to 5 processes to have access to something BUT NO MORE, then you'd use a semaphore rather than a mutex.

Sems and mutexes do not, by themselves, really provide communication between processes - they provide process "manners." In the old world (mainframe days), you'd acquire a mutex, modify a file putting some kind of string message in it, then let go of the mutex. That way, you knew that no two processes would dick with the message file at the same time. I used sems and mutexes a lot in my DOS development days and then in about '91/92 with Borland C++ 3.1 - it was the cleanest way to organize some problematic programming things. I barely ever use them now, preferring a queueing methodology that is supported nicely by databases.
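For the PHP side of what perkiset describes, the SYSVSEM extension exposes exactly this: a semaphore with max_acquire of 1 behaves like a mutex. A minimal sketch of a critical section, guarded so it degrades gracefully on builds without the extension:

```php
<?php
// Critical section guarded by a SysV semaphore used as a mutex.
// Requires PHP compiled with --enable-sysvsem (sem_get & co.).
$owned = null; // null => sysvsem not available on this build
if (function_exists('sem_get')) {
    $key = ftok(__FILE__, 'c');    // derive a system-wide key from this file
    $mutex = sem_get($key, 1);     // max_acquire = 1 => behaves like a mutex
    $owned = sem_acquire($mutex);  // blocks until we own it
    if ($owned) {
        // ... only one process in the system runs this at a time ...
        sem_release($mutex);
    }
    sem_remove($mutex);            // clean up the kernel object when done
}
```

For a true semaphore (say, 5 concurrent owners) the second argument to sem_get would be 5 instead of 1; everything else stays the same.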
I also prefer using the OS kernel rather than forking for apps like you are describing, because the speed is less critical - you're going to wait LOADS of time for the user (relative to processor time), so why implement the complexity of threads when you'll have that kind of wait? IMO, better to use poor man's threading (shell_exec another process) with a database queue as your messaging mechanism on that kind of app. However, it is an excellent skill to have and interestingly, I may have the need to do something similar in the near future - I am currently analysing the strength of writing a TCP/IP-based service entirely in PHP, which will require threading. Here is a complete example from the PHP code repository - I think it should spur you on a bit. Good luck, please post your results! http://www.perkiset.org/forum/php/forked_php_daemon_example-t474.0.html
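The "poor man's threading" idea can be sketched briefly. This is only an illustration of the shape of it: the queue is a flat file here rather than a database table, and worker.php is a hypothetical stand-in for whatever script consumes the jobs.

```php
<?php
// "Poor man's threading": launch a detached worker with shell_exec and talk
// to it through a queue (a flat file here; a database table in practice).
function enqueueJob($queueFile, array $job)
{
    // one JSON-encoded job per line; the worker consumes lines in order
    file_put_contents($queueFile, json_encode($job) . "\n", FILE_APPEND | LOCK_EX);
}

function launchWorker($script)
{
    // the trailing & detaches the child, so this call returns immediately
    shell_exec('php ' . escapeshellarg($script) . ' > /dev/null 2>&1 &');
}

$queueFile = tempnam(sys_get_temp_dir(), 'capq');
enqueueJob($queueFile, array('id' => '000001', 'action' => 'fetch_captcha'));
// launchWorker('worker.php');  // worker.php is not defined in this sketch
```

Because the queue is the only shared state, there is no memory sharing to manage and no IPC beyond reads and writes, which is the whole appeal of the approach.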
vsloathe
Thanks as always, Perk. I might give it a shot with curl_multi so I don't have to get my hands as dirty, but the only way I've used curl_multi to date is to spawn a bunch of curl children and then just wait until they are all done. I want to take the returns asynchronously, so that I can display captchas as they come in. I.e. from the user's perspective I want this to *blaze*. We shall see what I come up with.
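Taking curl_multi returns as they complete, rather than waiting for the whole batch, comes down to draining curl_multi_info_read inside the exec loop. A minimal sketch; file:// URLs pointing at a temp file stand in for the real captcha pages so it runs without a network:

```php
<?php
// Consume curl_multi results as each transfer completes, instead of
// waiting until all handles are done.
$tmp = tempnam(sys_get_temp_dir(), 'cap');
file_put_contents($tmp, 'fake captcha payload');
$urls = array('file://' . $tmp, 'file://' . $tmp);

$mh = curl_multi_init();
foreach ($urls as $u) {
    $ch = curl_init($u);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_multi_add_handle($mh, $ch);
}

$done = array();
do {
    curl_multi_exec($mh, $running);
    // drain completions immediately - this is the point where each captcha
    // could be pushed out to the browser, ahead of the rest of the batch
    while ($info = curl_multi_info_read($mh)) {
        $done[] = curl_multi_getcontent($info['handle']);
        curl_multi_remove_handle($mh, $info['handle']);
    }
    if ($running) {
        curl_multi_select($mh, 0.1); // wait for activity; avoids a busy loop
    }
} while ($running > 0);
curl_multi_close($mh);
```

The key difference from the "spawn and wait" pattern is that the inner while loop fires per-transfer, so the first captcha is available as soon as its own request finishes.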
perkiset
"Blaze" is relative - remember, the WAY WAY hugest bottleneck will be the user waiting for the captcha, and bandwidth in and out of your machine. You'd be better off focusing on how to shrink those times than on the trivially small bump (by comparison) you'll get by forking as opposed to execing (if you are interested in that technique). Again, forking is an outstanding tool to have in the box, but it can also be an order of magnitude more complicated when it comes to memory management, understanding "who's doing what" and particularly debugging, which in a multithreaded app is sometimes *monstrously* difficult - believe me on that one. Not to throw water on the notion man, just a quick analysis of bang-for-buck on doing this. If it's about learning the technique then I am all for it. If it's a needed-soon mission-critical app then I'd use another technology until I understood the dynamics of it better.
vsloathe
Yeah. As I said, I'll probably curl_multi it. There will be an initial delay as the threads are created and launched, but once the captchas start coming back to the user, he'll see nothing but rapid fire awesomeness, which is the goal.
nop_90
Forks are not threads. A fork is a separate process and therefore does not share memory etc. You need some sort of IPC if you want to do that, which is a big pain in the ass.
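nop_90's point can be shown concretely: a forked child gets a copy of the parent's memory, so a result has to travel back over an explicit IPC channel. A sketch using pcntl_fork with a socket pair, guarded because pcntl and stream_socket_pair are CLI/Unix-only:

```php
<?php
// A forked child cannot just set a variable the parent sees; the result
// must come back over IPC - here, one end of a Unix socket pair.
$msg = null; // stays null where pcntl/sockets are unavailable
if (function_exists('pcntl_fork') && function_exists('stream_socket_pair')) {
    list($parentSock, $childSock) = stream_socket_pair(
        STREAM_PF_UNIX, STREAM_SOCK_STREAM, STREAM_IPPROTO_IP);
    $pid = pcntl_fork();
    if ($pid === 0) {                      // child process
        fclose($parentSock);
        fwrite($childSock, "captcha-url-from-child");
        fclose($childSock);
        exit(0);                           // child dies after reporting back
    }
    fclose($childSock);                    // parent process
    $msg = fread($parentSock, 1024);       // blocks until the child writes
    fclose($parentSock);
    pcntl_waitpid($pid, $status);          // reap the child, avoid a zombie
}
```

Everything past the fwrite/fread pair (framing, multiple children, partial reads) is where the "big pain in ass" lives; this sketch shows only the happy path.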
DangerMouse
I wonder if an AJAX/COMET solution may be appropriate for what you're trying to achieve, vsloathe? It could be used to submit completed captchas, but also as a trigger to grab fresh ones. A simple approach could be 10 on a page; on tab out of the data entry point for number 5, request a new batch. This could possibly help with captcha timeout issues as well. A COMET approach could mean the server is constantly requesting images, passing them to the browser as soon as they are ready. Just a thought. DM
vsloathe
quote author=DangerMouse link=topic=714.msg4967#msg4967 date=1200649062
I wonder if an AJAX/COMET solution may be appropriate for what you're trying to achieve vsloathe? [...]
A good thought. It had occurred to me as well, but my AJAX kung fu is weak. Any pseudocode or examples to help me get started?
DangerMouse
quote author=vsloathe link=topic=714.msg4970#msg4970 date=1200667675
A good thought. It had occurred to me as well, but my AJAX kung fu is weak. Any pseudocode or examples to help me get started?
Afraid not, I've never even attempted AJAX, let alone its inverse COMET - sure Perk and others will have plenty of better tips than me there. I kind of envisaged using javascript's event model to catch an "on tab out" or "on submit/click" style event to launch the xmlhttpRequest - I think there are equivalent mechanisms but I've no clue how they work. Perk makes a good case for using another technique over xmlhttp here: http://www.perkiset.org/forum/ajax/it%E2%80%99s_time_to_dump_xmlhttprequest-t336.0.html, although I suspect cross-platform/domain issues might not be that important to you here (although it could be an awesome extension of the project :mob .

Anyways, the xmlhttpRequest would pass the solved captcha value the user entered to a PHP handling script; the response from this script, in whatever format, could then be parsed by the javascript and the browser display changed accordingly - appending any fresh available captchas to the bottom of the list in this case (probably a DOM insert child/sibling type command - again I'm not too sure). The power would be in the PHP handling script, which could use some kind of persistence to track exactly what's going on, launch requests to grab new captcha images, and pass along captcha responses sent up from the browser. Thinking about it, you could do some quite interesting things here - dynamically calculating how many unsolved captchas need to be kept in a buffer depending on solving rate, for example.

I know even less about how a COMET approach would work; I've yet to see many implementations of this technology outside of full-blown web applications (google spreadsheets etc.) - but I think the gist is that the server pushes to the client, causing it to update the page when specified conditions are met - in this instance, when new captchas have been obtained. This could be useful to reduce the complexity of the PHP handler for the AJAX call described above: it would simply need to receive completed captchas and forward them on to their required destination. Any fresh captchas would be posted to the browser using a separate process. Whilst I think this is a neater solution, and definitely cooler on the geek/techy scale, it may introduce more problems than it's worth, as I suspect there would need to be a way to choke the obtaining of new captchas where solving is slow, so some form of cross-script communication is still needed. Plus, iirc, the server implementation is quite tricky due to system resource hogging.

It would also be cool to use XSLT to render the page - the data could then all be sent and received in XML with ease - again, I don't really know how to do this, just the theory :  I'm sure others will weigh in here with more practical suggestions lol, but thats my rambling  worth  DM
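The PHP handling script DangerMouse describes has a simple shape: take a solved captcha in, forward it, and answer with a topped-up buffer of fresh ones. A hedged sketch; fetchFreshCaptchas() and forwardSolved() are hypothetical stand-ins for the real fetch/submit work, and the buffer size of 3 is arbitrary.

```php
<?php
// Minimal shape of an AJAX handler for the captcha pipeline described above.
function handleAjax(array $post, array &$buffer)
{
    if (isset($post['solved_id'], $post['answer'])) {
        forwardSolved($post['solved_id'], $post['answer']);
        unset($buffer[$post['solved_id']]);   // that captcha is finished
    }
    // keep the unsolved buffer topped up; solving-rate logic would go here
    while (count($buffer) < 3) {
        $fresh = fetchFreshCaptchas(1);
        $buffer[$fresh[0]['id']] = $fresh[0]['url'];
    }
    return json_encode($buffer);   // the JS side appends these to the page
}

// --- stubs so the sketch is self-contained ---
function forwardSolved($id, $answer) { /* POST the answer to the target site */ }
function fetchFreshCaptchas($n)
{
    static $seq = 0;
    $out = array();
    for ($i = 0; $i < $n; $i++) {
        $seq++;
        $out[] = array('id' => "cap$seq", 'url' => "/img/cap$seq.png");
    }
    return $out;
}
```

The persistence DangerMouse mentions is the $buffer array; in a real handler it would live in the session or a database between requests.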
jammaster82
Isnt AJAX/COMET/XMLHttpRequest all gonna use the same amount of bandwidth to pretty much do the same thing? Only ajax/comet/xmlhttprequest has soooo many new technicalities to it, but at the bottom of all these we are all stuck with the internet, and on a 33.6 dialup a simple 400-byte http GET request takes about a second including the tcp/ip handling - and it still takes about a second even if you're t1 to t1, for all the handshaking and blah zay blah that goes on between the hops along the route, like when you do a tracert from ms-dos... right? The time you MIGHT save is a few milliseconds on the client side with AJAX, and the page wont have to be refreshed, but it's still a tcp/ip transmission, and getting that request out sooner wont matter time-wise when you have to wait that second anyway... right?

I would just lean with whats simple - meta refresh has been working for soooo many years now. I would hate to have my site crash at 3 am and have to wake up with one eye open and analyze the shit for four hours till the client woke up, only to tell him i found the error in some new untested technology i anakin skywalker episode two'ed my way right into and lost my freaking arm over... (nailed padme though, sweet!)  but that could also be cause i am lazy to learn new things until someone else breaks them and fixes that bleeding edge...
DangerMouse
Yeah, I don't think it will help in terms of how much bandwidth is used or how long the data will take to arrive at the browser, but I got the impression that the idea was to create a fast interface that bombards a user with captchas to enter consistently? I just mentioned the approach above as it avoids browser refreshes and allows a background process to obtain captchas while others are being entered. It was totally theoretical though, just something I dreamed up - don't really have any strong evidence to prove it would work. DM
jammaster82
Hmm... How about loading like 5 (or 'x') of them with the first page and handling it all client side, so that by the time they get to 3 you can ask for four more, seeing as how four would take just as long as one - economizing the motion that takes the longest (the tcp/ip transmission, however it's done)? Just interested in this discussion, maintaining my position of not knowing anything..
vsloathe
I decided to try out the xajax framework, because I'd been kicking around the idea for a while and, as I mentioned earlier, my JS/AJAX skills are weak. Here's what I've got, with a possible request from my fellow geeks: I'm requesting a new captcha on each onkeyup event in the answer entry box, then storing the result in the HTML of the page. I like using the page's HTML as storage space, but I'm wondering if there's a better way to do it - currently it's not the best, because the speed increase is only marginal, as it will go out on the last onkeyup and fetch a captcha. If the user hits enter immediately after the last onkeyup, he's not going to see much difference speed-wise. I was wondering if any of you geniuses could think up a way to make the onkeyup handler *only* fire for the first onkeyup event?

<?php
/**
 * @author vsloathe
 * @copyright 2008
 */
require_once("xajax_core/xajaxAIO.inc.php");
require_once('class.gmailCreator.php');

$xajax = new xajax();
$xajax->registerFunction("prepCap");
$xajax->registerFunction("getCap");
$xajax->registerFunction("doPost");
$xajax->processRequest();
$xajax->printJavascript();

echo('
<html>
<body onload="xajax_getCap();">
<form onsubmit="xajax_doPost(escape(postStr.value), capAnswer.value, escape(qPostStr.value), escape(capUrl.value)); return false;">
<div id="capImg"></div>
<input type="text" name="capAnswer" onkeyup="xajax_prepCap()" /><br />
<input type="submit" value="Go" onclick="xajax_doPost(escape(postStr.value), capAnswer.value, escape(qPostStr.value), escape(capUrl.value));" />
<input type="hidden" name="postStr" />
<input type="hidden" name="capUrl" value="0" />
<input type="hidden" name="qPostStr" value="0" />
</form>
</body>
</html>
');

function prepCap()
{
    $objResponse = new xajaxResponse();
    $GC = new gmailCreator;
    $GC->numThreads = 1;
    $GC->getAccountPage();
    $capurls = $GC->buildPostStr();
    $postStr = $GC->postStrings[0];
    $objResponse->assign("capUrl", "value", $capurls[0]);
    $objResponse->assign("qPostStr", "value", $postStr);
    return $objResponse;
}

function getCap()
{
    $objResponse = new xajaxResponse();
    $GC = new gmailCreator;
    $GC->numThreads = 1;
    $GC->getAccountPage();
    $capurls = $GC->buildPostStr();
    $postStr = $GC->postStrings[0];
    $objResponse->assign("capImg", "innerHTML", '<img src="'.$capurls[0].'" />');
    $objResponse->assign("postStr", "value", $postStr);
    return $objResponse;
}

function doPost($postStr, $capAnswer, $qPostStr, $capUrl)
{
    $objResponse = new xajaxResponse();
    if ($qPostStr) {
        $objResponse->assign("capImg", "innerHTML", '<img src="'.urldecode($capUrl).'" />');
        $objResponse->assign("capAnswer", "value", '');
        $objResponse->assign("postStr", "value", $qPostStr); // was $qpostStr - case typo
    } else {
        $GC = new gmailCreator;
        $GC->numThreads = 1;
        $GC->getAccountPage();
        $capurls = $GC->buildPostStr();
        $postStr = $GC->postStrings[0];
        $objResponse->assign("capImg", "innerHTML", '<img src="'.$capurls[0].'" />');
        $objResponse->assign("capAnswer", "value", '');
        $objResponse->assign("postStr", "value", $postStr);
    }
    $postStr .= '&newaccountcaptcha='.$capAnswer;
    $ph = popen('php dopost.php "'.$postStr.'"', 'r');
    return $objResponse;
}
?>
Nothing too special in there as far as my intellectual property goes, so play with it all you want, but of course the real meat is in that class file that I require_once at the beginning.
thedarkness
quote author=vsloathe link=topic=714.msg4977#msg4977 date=1200679471 I like using the page's HTML as storage space, but I'm wondering if there's a better way to do it - currently it's not the best because the speed increase is only marginal, as it will go out on the last onkeyup and fetch a captcha.
Use another async AJAX call to store the captcha image somewhere else (database, flat file) then grab it back when you need it? Cheers, td
perkiset
quote author=jammaster82 link=topic=714.msg4972#msg4972 date=1200673472
Isnt AJAX/COMET/XMLHttpRequest all gonna use the same amount of bandwidth to pretty much do the same thing
Absolutely not. The packet you throw with an XMLHTTPRequest or an XRPC is tiny. It is a fraction of the overhead of a full page pull, unless your pages are simply "hello world." A pull for a whole page is quite "heavy" by comparison. VS - both my Ajax Requestor class and XRPC are in the Javascript repository - they are a lot lighter and should be really easy to understand. Writing to the server side is also almost trivial. With what you are doing, I think an Ajax MO is a good plan and will work quite nicely.
nop_90
Too lazy to read the entire thread so i just browsed. You do not set the image using ajax etc. (or not directly). On the server side: avoid threads, no need for them. Use multicurl - have multi fetch the captchas and store them inside a queue, and attach an id token to each for the client side. Have the submit button (or whatever event u want) attached to a js event. Upon pushing the button, send the inputted captcha to the server, with the captcha token id. When ajax replies, it will send the new token id of the new captcha. So if id=1233121231 then set the image value to image?value=1233121231 - that way it will automatically grab the image from the server and refresh. (Or u can do other schemes; the only thing is that the image url has to be unique.)
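nop_90's token scheme can be sketched in a few functions. An assumption-laden illustration: the queue is a plain array here (a real app would persist it in the session or a database), and the function names are made up for the sketch.

```php
<?php
// Sketch of the token scheme: every queued captcha is keyed by a unique id,
// the <img> src embeds that id, and each reply hands back the next id.
$queue = array();                          // token => captcha image data

function enqueueCaptcha(&$queue, $imageData)
{
    $token = uniqid('', true);             // unique id => unique image URL
    $queue[$token] = $imageData;
    return $token;
}

// image.php?value=<token> would stream this back to the browser
function serveImage($queue, $token)
{
    return isset($queue[$token]) ? $queue[$token] : null;
}

// The submit handler: record the answer, reply with the next token so the
// client sets <img src="image?value=NEXT"> and the browser auto-refreshes it.
function submitAnswer(&$queue, $token, $answer, $nextImageData)
{
    // forward $answer to the target site here
    unset($queue[$token]);                 // this captcha is done
    return enqueueCaptcha($queue, $nextImageData);
}
```

Because every token is unique, the browser never serves a stale cached image, which is the point of keying the URL on the id.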
nop_90
As a trivial exercise for the reader: you should be able to use the same server code (as in, you do not change a line of code), but if you use proper rpc you can make the client in whatever language/platform you want  so u could make the client, let's say, in wxPython so it looks very nice, but most of the grunt work is done on the server  If you use the right rpc, with the correct libraries, the entire client (with gui) should be doable in 50-100 lines of code.