
Indica
let me jump into my flame suit and come out saying i'm a .net addict, though i use php for web work. i'm rebuilding my entire seo system and would like to keep things in one language. that said, i'm considering making the jump and going 100% php for the new system. one thing holding me back is my idea of php performance: can it compete with a multithreaded application? i'm trying to wrap my head around how to get the same performance in php as i do with my .net applications. i suppose you could execute the php script multiple times and achieve a multithreaded-like result? what do you do to gain the best performance?

also, things like zend optimizer: what's your take on it, worth using, noticeable benefits? i can already envision a sleek OO design for this thang, if i can get performance issues squared away. thanks in advance php gurus

nutballs
ah multithreading. perk will have a lot to say about it. you have to think a bit outside your normal, client-app line of thinking. I use it a lot now, so does perk. I will elaborate if i get back before perk answers, but right now I gotta run out and beg for work... lol
Indica
yeah i know perk and multithreading are the epitome of a loving relationship
looking forward to your response
perkiset
Performance issues...
OK, question: is it that you need lots of crunching, where you have C++ pounding the processor, or is it just that you need lots of processes running concurrently against others' websites and such? (Important note: I am/have been a strong C++ and Object Pascal programmer and have written many, many multithreaded apps and daemons. I am no stranger to this discussion from all sides...) The reason I ask is that there is a myth that multithreading really helps. The largest bottleneck in traditional setups (which I'd assume yours to be, particularly since you say "SEO") is network access and, in fact, the target site, rather than your local processing.

Consider my personal spider: it is two scripts, a dispatcher and a spiderlet. I make heavy use of MySQL as well. The only manual thing that I need to do is put the starting URL into MySQL. The dispatcher, which runs either as often as I want or manually, looks and sees that there is a job to do - grabs the URL from a table and then dispatches a spiderlet to grab it. Spiderlets have one job only: grab the URL they are assigned and put any new links they find into the todo stack in the database (they do other stuff as well).

When the spiderlet is done, it dies. The dispatcher is configured to watch ps aux to see how many spiderlets are currently running, and keep (n) spiderlets running at any given time. In this way, I am using the OS for multiprocessing, instead of compiling something complicated as a threaded app. Note here that I define "speed" also as how long it takes me to develop and maintain my warez. This little spider rig took me about 4 hours to go from concept to production, and it is still in production today. The same methodology works for my email blaster as well as ... erm ... well, other stuff.

It is true that head-to-head, a pure C++/C# app will kick ass on a PHP app. But the question is, at what cost? Now you have two languages, you are bound to a compiled model, and you cannot deploy it easily on lots and lots of other machines (note that my spider et al. will run anywhere PHP does).

Note that with proper MySQL usage and watching the process list, I have eliminated the need for semaphores and mutexes, file locking, you name it - my systems are now so simple that most folks in love with complexity would laugh at me as a rookie.
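A minimal sketch of the dispatcher loop described above - the `todo` table, its columns, the credentials and the `spiderlet.php` path are all assumptions, since the thread never shows the actual schema:

```php
<?php
// dispatcher.php - run as often as you like (cron or manually).
// Keeps (n) spiderlets running by checking the process list and
// handing out unclaimed URLs from the todo stack.

$maxSpiderlets = 5;

// Count spiderlets currently running. The [s] trick keeps grep
// from matching its own process entry.
$running = (int) trim(shell_exec("ps aux | grep '[s]piderlet.php' | wc -l"));

$db = new mysqli('localhost', 'user', 'pass', 'spider');

while ($running < $maxSpiderlets) {
    // Grab the next unclaimed URL from the todo stack.
    $res = $db->query("SELECT id, url FROM todo WHERE status = 'new' LIMIT 1");
    if (!$res || !($row = $res->fetch_assoc())) {
        break;  // nothing left to do
    }

    // Claim it so the next dispatcher run doesn't grab it too.
    $db->query("UPDATE todo SET status = 'working' WHERE id = " . (int)$row['id']);

    // Spawn a spiderlet in the background; the dispatcher returns immediately.
    shell_exec('php spiderlet.php ' . escapeshellarg($row['url']) . ' > /dev/null 2>&1 &');
    $running++;
}
```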
Indica
you've hit many of the issues there perk
> perkiset: OK, question: is it that you need lots of crunching, where you have C++ pounding the processor, or is it just that you need lots of processes running concurrently against others' websites and such?

mostly the latter, for scraping and other similar activities. i think i may still use .net (via mono) for processor-heavy tasks like page generation, keyword generation, stuff like that.

> perkiset: When the spiderlet is done, it dies. The dispatcher is configured to watch ps aux to see how many spiderlets are currently running, and keep (n) spiderlets running at any given time. In this way, I am using the OS for multiprocessing, instead of compiling something complicated as a threaded app.

this is how i envisioned doing it (after seeing you describe this setup here). are you able to watch ps aux (whatever that is, i've got to brush up on *nix, i feel like a fish out of water in it :roflmao)?

> perkiset: It is true that head-to-head, a pure C++/C# app will kick ass on a PHP app. But the question is, at what cost? Now you have two languages, you are bound to a compiled model, and you cannot deploy it easily on lots and lots of other machines.

this is *the* reason why i'm choosing php - scalability, specifically for cheap hosting. that simply cannot be done with .net applications, but php, no problem: just upload the script, have them interface with the base, and do the bidding. the aim is for the system to be extremely modular and portable, so portions of it can be distributed to a plethora of hosts and they will all work in unison to complete work. the php.. it's winning me over
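A sketch of what one of those uploadable worker scripts might look like - the spiderlet's single job, using the same hypothetical schema and credentials as the dispatcher sketch above:

```php
<?php
// spiderlet.php - fetch the assigned URL, push any new links onto
// the todo stack, then die. Memory cleanup is automatic on exit.

$url = $argv[1];
$db  = new mysqli('localhost', 'user', 'pass', 'spider');

// Grab the page; @ suppresses warnings on dead or slow hosts.
$html = @file_get_contents($url);

if ($html !== false) {
    // Crude href extraction - fine for a sketch.
    preg_match_all('/href=["\']([^"\']+)["\']/i', $html, $m);
    foreach (array_unique($m[1]) as $link) {
        $safe = $db->real_escape_string($link);
        // INSERT IGNORE plus a unique index on url keeps the stack duplicate-free.
        $db->query("INSERT IGNORE INTO todo (url, status) VALUES ('$safe', 'new')");
    }
}

// Mark this job done and exit.
$db->query("UPDATE todo SET status = 'done' WHERE url = '" . $db->real_escape_string($url) . "'");
```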
nutballs
Perk hit the problems so I won't cover those. the pipe is the biggest.
How I multithread my php stuff is to use perk's webrequest class to hit a processing page that replies back with an expected string, at which point the webrequest class terminates early. it could all be done without his class, it just made it easy for me to deal with (inventing wheels and all). As a result I have a loop in my requester that hits the processor 10 times per run, triggering 10 separate instances of the process. I don't have to wait around for anything to finish, and if you think a little backwards, you can have the processor hit an update page when each instance is done, however long it takes - so you don't have to wait around, and yet still know, or trigger a next step in your process.

It's just like having 10 people surfing your site. It doesn't wait for the first guy to be done before it replies to the next 9 guys.
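A sketch of that trigger loop without the webrequest class - here the caller hangs up immediately instead of waiting for an expected string; the host and path are made up:

```php
<?php
// Fire a request at the processing page and hang up without
// reading the reply, so 10 instances run concurrently under Apache.

function fireAndForget($host, $path)
{
    $fp = @fsockopen($host, 80, $errno, $errstr, 5);
    if (!$fp) {
        return false;
    }
    fwrite($fp, "GET $path HTTP/1.1\r\nHost: $host\r\nConnection: Close\r\n\r\n");
    fclose($fp);   // don't wait for the processor to finish
    return true;
}

// Trigger 10 concurrent instances of the processing page.
for ($i = 0; $i < 10; $i++) {
    fireAndForget('www.example.com', '/processor.php?job=' . $i);
}
```

One wrinkle with hanging up immediately: the processing page would likely need ignore_user_abort(true) at the top so PHP keeps running after the caller disconnects. Waiting for an expected reply string, as nutballs' class does, sidesteps that.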
perkiset

> Indica: mostly the latter, for scraping and other similar activities. i think i may still use .net (via mono) for processor-heavy tasks like page generation, keyword generation, stuff like that.

Oh man... if page generation is taking too long then you are doing it wrong. I don't have a site that is not 100% dynamic and just flat out kick ass. In fact, my PHP sites are, in some cases, over 10x the speed of my compiled apps because of the load that the whole framework has on it. With my PHP apps, they load and execute precisely, and only, what they absolutely must. But all that said, if you are still in a bind, then make page generation a distributed task like spidering and presto!

> Indica: this is how i envisioned doing it (after seeing you describe this setup here). are you able to watch ps aux (whatever that is, i've got to brush up on *nix, i feel like a fish out of water in it)?

@ ps aux - this is a *nix command that tells you each of the processes that is currently running on your box. From php, you'd do $buff = shell_exec('ps aux') and $buff will contain the text that *nix popped out, then you'd regex it to extract what you want. So if you spawned 10 spiderlets, you'd have 10 lines describing the spiderlet processes running.

@ fire & forget - this works fine on a dedicated box, but you may have problems on shared hosting. The reason is that your hosts may well be (probably are) watching your processes and may bag you for taking too much time. Short little bursty processes will not get their ire up, nor will you pop up on their radar. Also, if you have a multi-function box (web sites, eblaster etc etc) then you can easily throttle various processes for your current priorities. This is a biggie for me, as I often go from bored to maxed out in a very short amount of time... so I can tell my spiders that they are only allowed 3 processes, my blaster can do no more than 10/minute, and I've got almost my entire box at my disposal (the processor anyway).

In general (opinion here) I do not like fire and forget anymore. I wrote a lot more of that sort of thing in the windoz environment... when I fully embraced a *nix way of thinking, it changed my programmatic style dramatically. I now find that I am drawn to coding tiny little processes that can start and stop on a dime, because the start and stop overhead does not exist like in a windows environment. Another issue: if you write in a very webbish PHP way, then you are not thinking about garbage collection and freeing your objects. So writing daemons with this attitude, you may be more likely to write leaky apps - or you might make use of a lib that is leaky and you wouldn't even know it, because it was never designed to run for 8 straight hours. If you spin off processes (don't even think of them as apps anymore) and they do a little something and quit, your memory management is automatic and absolute.

> Indica: this is *the* reason why i'm choosing php - scalability, specifically for cheap hosting. that simply cannot be done with .net applications, but php, no problem: just upload the script, have them interface with the base, and do the bidding. the aim is for the system to be extremely modular and portable, so portions of it can be distributed to a plethora of hosts and they will all work in unison to complete work.

exactly ...
and there's another reason for me: I got tired of rebooting windows. Little teeny processes running on a *nix box are so stable and boringly predictable it's not even funny. Coding like this will make you the creator of appliances, rather than applications. Write apps for the iPhone... write appliance processes for those things that you want to fire and forget.
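The ps aux check from php, as perkiset describes it a couple of posts up - a short sketch; the process name spiderlet.php is a placeholder:

```php
<?php
// Pull the process list into PHP, then regex out the lines you
// care about - one match per running spiderlet.

$buff = shell_exec('ps aux');

$running = preg_match_all('/spiderlet\.php/', $buff, $matches);

echo "$running spiderlets currently running\n";

if ($running < 5) {
    // room to dispatch more...
}
```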
perkiset

> nutballs: It's just like having 10 people surfing your site. It doesn't wait for the first guy to be done before it replies to the next 9 guys.

Spot on NBs... Apache-based multiprocessing is just shithot. The biggest benefit here is that Apache is, in fact, a multithreaded app, so you can claim that you're threading - while just using PHP to do it. So you can (A) have WAY more processes running concurrently with almost no overhead and (B) make use of APC, which is where the real speed comes in. I use APC to huge effect, but obviously, only on my own boxes and not shared hosting deployments.
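For the curious, a hypothetical sketch of APC's user cache (the opcode cache needs no code changes at all, just the extension) - expensiveKeywordCrunch() is a stand-in for real work:

```php
<?php
// APC's user cache lives in shared memory, so every Apache child
// sees the same cached result instead of recomputing it.

function getKeywordList($seed)
{
    $key = 'kw_' . md5($seed);

    $cached = apc_fetch($key);
    if ($cached !== false) {
        return $cached;   // served from shared memory - no recrunch
    }

    $list = expensiveKeywordCrunch($seed);  // placeholder for real work
    apc_store($key, $list, 300);            // cache for 5 minutes
    return $list;
}
```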
Indica
hmm i see perk, good points about run time length, i hadn't thought of that. could you be arsed to make a simple example of how you do your spiderlet/watcher setup? or is it around here already? i must confess i haven't read everything here, i've got like 50 pages of unread shit

about page generation: my sites are generally dynamic also, that was more of an example than anything. i think i can get used to firing off scripts in an on-demand fashion, though this entire thing is a complete shift in coding philosophy and style - back onto the training wheels

surprisingly i never have to reboot my windows boxes much. i just built a new xp pro server and it's been running for weeks while scraping away. i told myself i wasn't going to put *nix on it, but i suppose now i will have to. thank jesus for vmware, i should be able to vm xp pro and still use it as normal. excuse me while i go pirate it

ubuntu on my laptop, i suppose it's time i start using it full time so i can force myself to learn the way of the *nix. lord help me with that one, i have a hard enough time installing shit. and i've never been able to compile shit without 500 errors and *nix telling me to run back to windows

unrelated sidenote: i've been toying with extjs's web desktop @ http://extjs.com/deploy/dev/examples/desktop/desktop.html - it will make one sexy interface for the system. seems to have a strong set of built-in libraries which will make presentation quick and easy.

perkiset
> Indica: hmm i see perk, good points about run time length, i hadn't thought of that. could you be arsed to make a simple example of how you do your spiderlet/watcher setup? or is it around here already? i must confess i haven't read everything here, i've got like 50 pages of unread shit

Check in the PHP Code repository for a scaled back version of my spider... I'll have a look in a bit if you can't find it.

> Indica: i think i can get used to firing off scripts in an on-demand fashion, though this entire thing is a complete shift in coding philosophy and style - back onto the training wheels

It is, you're right on the dot. But it is also hugely worth it. I made that journey quite a while ago and have never looked back.
