The Cache: Technology Expert's Forum

Author Topic: optimal php performance  (Read 3883 times)
Indica
« on: March 27, 2008, 08:42:43 AM »

let me jump into my flame suit and come out saying i'm a .net addict using it for most of my 'beefy' systems and applications, while using php for web work. i'm rebuilding my entire seo system and would like to keep things in one language.

that said, i'm considering making the jump and going 100% php for the new system. one thing holding me back is my idea of php performance: can it compete with a multithreaded application? i'm trying to wrap my head around how to get the same performance in php as i do with my .net applications. i suppose you could execute the php script multiple times and achieve a multithreaded-like result? what do you do to gain the best performance?

also things like zend optimizer, what's your take on it: worth using, noticeable benefits?

i can already envision a sleek OO design for this thang, if i can get performance issues squared away. thanks in advance php gurus  Praise
nutballs
« Reply #1 on: March 27, 2008, 09:07:00 AM »

ah, multithreading. perk will have a lot to say about it. you have to think a bit outside your normal client-app line of thinking. I use it a lot now, and so does perk. I will elaborate if I get back before perk answers, but right now I gotta run out and beg for work... lol
Indica
« Reply #2 on: March 27, 2008, 09:34:18 AM »

yeah i know perk and multithreading are the epitome of a loving relationship  ROFLMAO

looking forward to your response
perkiset
« Reply #3 on: March 27, 2008, 11:09:48 AM »

Performance issues...

OK, question: is it that you need lots of crunching, where you have C++ pounding the processor, or is it just that you need lots of processes running concurrently against others' websites and such?

(Important note: I am/have been a strong C++ and object pascal programmer and have written many, many multithreaded apps and daemons. I am no stranger to this discussion from all sides...)

The reason is that there is a myth that multithreading really helps. The largest bottleneck in traditional setups (which I'd assume yours to be, particularly since you say "SEO") is network access and, in fact, the target site, rather than your local processing.

Consider my personal spider: it is two scripts, a dispatcher and a spiderlet. I make heavy use of MySQL as well. The only manual thing that I need to do is put the starting URL into MySQL. The dispatcher, which runs either as often as I want or manually, looks and sees that there is a job to do - grabs the URL from a table and then dispatches a spiderlet to grab it. Spiderlets have one job only: grab the URL they are assigned and put any new links they find into the todo stack in the database (they do other stuff as well, but essentially work a single target URL).

When the spiderlet is done, it dies. The dispatcher is configured to watch ps aux to see how many spiderlets are currently running, and keep (n) spiderlets running at any given time. In this way, I am using the OS for multiprocessing, instead of compiling something complicated as a threaded app.
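
In rough PHP, the dispatcher loop looks something like this (a bare-bones sketch; the table and script names are made up for illustration, not the production code):

Code:
<?php
// dispatcher.php - run from cron (or by hand); keeps (n) spiderlets alive
$maxSpiderlets = 5;
$db = new mysqli('localhost', 'user', 'pass', 'spider');

// ask the OS how many spiderlets are already running
$running = substr_count(shell_exec('ps aux'), 'spiderlet.php');

while ($running < $maxSpiderlets) {
    // grab the next waiting URL off the todo stack
    $res = $db->query("SELECT id, url FROM todo WHERE status = 'new' LIMIT 1");
    if (!$res || $res->num_rows == 0) break; // nothing left to do
    $row = $res->fetch_assoc();
    $db->query("UPDATE todo SET status = 'working' WHERE id = " . (int)$row['id']);

    // dispatch a spiderlet into the background and move on
    shell_exec('php spiderlet.php ' . escapeshellarg($row['url']) . ' > /dev/null 2>&1 &');
    $running++;
}

Cron it every minute or so and the pool of spiderlets refills itself.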

Note here that I define "speed" also as how long it takes me to develop and maintain my warez. This little spider rig took me about 4 hours to go from concept to production, and it is still in production today. The same methodology works for my email blaster as well as ... erm ... well, other stuff.

It is true that head-to-head, a pure C++/C# app will kick ass on a PHP app. But the question is, at what cost? Now you have two languages, you are bound to a compiled model, you cannot deploy it easily on lots and lots of other machines (note that my spider et al can run on cheap cheap hosts, effectively making my scalability limitless...) ... the complexity and maintenance factor, combined with the increased difficulty in scaling, make compiled frameworks a non-starter for me anymore.

Note that with proper MySQL usage and watching the process list, I have eliminated the need for semaphores and mutexes, file locking, you name it - my systems are now so simple that most folks in love with complexity would laugh at me as a rookie.  ROFLMAO Fat lot they know... Wink
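
The bit that makes the locking go away is that a single UPDATE in MySQL is atomic, so two processes can never claim the same row. A bare-bones illustration of that claim step (same made-up table names as above):

Code:
<?php
// each spiderlet claims exactly one job by stamping it with its pid;
// the single UPDATE is atomic, so no two processes can ever win the same row
$db  = new mysqli('localhost', 'user', 'pass', 'spider');
$pid = getmypid();

$db->query("UPDATE todo SET status = 'working', worker = $pid WHERE status = 'new' LIMIT 1");

// read back whatever we actually won (zero rows means nothing was waiting)
$res = $db->query("SELECT id, url FROM todo WHERE worker = $pid AND status = 'working'");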
Indica
« Reply #4 on: March 27, 2008, 11:51:45 AM »

you've hit many of the issues there perk Smiley

Quote from: perkiset
OK, question: is it that you need lots of crunching, where you have C++ pounding the processor, or is it just that you need lots of processes running concurrently against others' websites and such?

mostly the latter, for scraping and other similar activities. i think i may still use .net (via mono) for processor-heavy tasks like page generation, keyword generation, stuff like that.

Quote from: perkiset
When the spiderlet is done, it dies. The dispatcher is configured to watch ps aux to see how many spiderlets are currently running, and keep (n) spiderlets running at any given time. In this way, I am using the OS for multiprocessing, instead of compiling something complicated as a threaded app.

this is how i envisioned doing it (after seeing you describe this setup here). are you able to watch ps aux (whatever that is, i've got to brush up on *nix, i feel like a fish out of water in it  ROFLMAO) from shared hosts and such? if not, i think i can code my scripts in such a way that they run until there is no more work to be done. so using your spider example, each script would continue to run until the queue of urls is empty, thus eliminating the need for a watcher. this way i'd launch n scripts and they'd run until the work's done. what do you think about this versus your method? what pros and cons of such a method can you think of, in comparison to yours?
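
roughly what i'm picturing for each script, just to make it concrete (table names invented for the example):

Code:
<?php
// worker.php - launch n copies; each chews on the queue until it is empty, then exits
$db  = new mysqli('localhost', 'user', 'pass', 'seo');
$pid = getmypid();

while (true) {
    // claim one url atomically so two workers never grab the same row
    $db->query("UPDATE urls SET status = 'working', worker = $pid WHERE status = 'new' LIMIT 1");
    $res = $db->query("SELECT id, url FROM urls WHERE worker = $pid AND status = 'working'");
    if ($res->num_rows == 0) break;          // queue is empty, we're done

    $row  = $res->fetch_assoc();
    $html = file_get_contents($row['url']);  // scrape it
    // ... parse, store results, push any new urls back onto the queue ...
    $db->query("UPDATE urls SET status = 'done' WHERE id = " . (int)$row['id']);
}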

Quote from: perkiset
It is true that head-to-head, a pure C++/C# app will kick ass on a PHP app. But the question is, at what cost? Now you have two languages, you are bound to a compiled model, you cannot deploy it easily on lots and lots of other machines (note that my spider et al can run on cheap cheap hosts, effectively making my scalability limitless...) ... the complexity and maintenance factor, combined with the increased difficulty in scaling, make compiled frameworks a non-starter for me anymore.

this is *the* reason why i'm choosing PHP - scalability, specifically for cheap hosting. that simply cannot be done with .net applications, but with php it's no problem: just upload the script, have them interface with the base, and do the bidding.

the aim is for the system to be extremely modular and portable, so portions of it can be distributed to a plethora of hosts and they will all work in unison to complete work.

the PHP... it's winning me over  Shocked
nutballs
« Reply #5 on: March 27, 2008, 12:13:23 PM »

Perk hit the problems so I won't cover those. The pipe is the biggest one.

How I multithread my php stuff is to use perk's webrequest class to hit a processing page that replies back with an expected string, at which point the webrequest class terminates early. It could all be done without his class; it just made it easy for me to deal with (inventing wheels and all). As a result I have a loop in my requester that hits the processor 10 times per run, triggering 10 separate instances of the process. I don't have to wait around for anything to finish, and if you think a little backwards, you can have the processor hit an update page when each instance is done, however long that takes, so you don't have to wait around and yet still know, or trigger, the next step in your process.

It's just like having 10 people surfing your site. It doesn't wait for the first guy to be done before it replies to the next 9 guys.
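
Without perk's class, the bare-sockets version of that trigger looks roughly like this (an illustration only, not his actual class):

Code:
<?php
// fire a request at the processing page and bail as soon as it answers;
// ten of these can be kicked off back to back without waiting on the actual work
function trigger($host, $path) {
    $fp = fsockopen($host, 80, $errno, $errstr, 5);
    if (!$fp) return false;
    fwrite($fp, "GET $path HTTP/1.1\r\nHost: $host\r\nConnection: Close\r\n\r\n");
    $ack = fgets($fp, 128);   // just the ack line, enough to know it woke up
    fclose($fp);
    return $ack;
}

// processor.php should call ignore_user_abort(true), print its ack string and
// flush() early, so it keeps working after we hang up on it
for ($i = 0; $i < 10; $i++) {
    trigger('www.example.com', '/processor.php?job=' . $i);
}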
perkiset
« Reply #6 on: March 27, 2008, 12:25:50 PM »

Quote from: Indica
mostly the latter, for scraping and other similar activities. i think i may still use .net (via mono) for processor-heavy tasks like page generation, keyword generation, stuff like that.
Oh man... if page generation is taking too long then you are doing it wrong. I don't have a site that is not 100% dynamic, and they just flat out kick ass. In fact, my PHP sites are, in some cases, over 10x the speed of my compiled apps because of the load that the whole framework puts on them. My PHP apps load and execute precisely, and only, what they absolutely must. But all that said, if you are still in a bind, then make page generation a distributed task like spidering and presto!

Quote from: Indica
this is how i envisioned doing it (after seeing you describe this setup here). are you able to watch ps aux (whatever that is, i've got to brush up on *nix, i feel like a fish out of water in it  ROFLMAO) from shared hosts and such? if not, i think i can code my scripts in such a way that they run until there is no more work to be done. so using your spider example, each script would continue to run until the queue of urls is empty, thus eliminating the need for a watcher. this way i'd launch n scripts and they'd run until the work's done. what do you think about this versus your method? what pros and cons of such a method can you think of, in comparison to yours?
@ ps aux - this is a *nix command that tells you each of the processes that are currently running on your box. From php, you'd do $buff = shell_exec('ps aux'); and $buff will contain the text that *nix popped out, then you'd regex it to extract what you want. So if you spawned 10 spiderlets, you'd have 10 lines describing the spiderlet processes running.
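For example, to count the spiderlets (illustrative process name):

Code:
<?php
$buff = shell_exec('ps aux');
preg_match_all('/spiderlet\.php/', $buff, $hits);
$running = count($hits[0]);   // how many spiderlet processes are alive right now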
@ fire & forget - This works fine on a dedicated box, but you may have problems on shared hosting. The reason is that your hosts may well be (probably are) watching your processes and may bag you for taking too much time. Short little bursty processes will not get their ire up, nor will you pop up on their radar. Also, if you have a multi-function box (web sites, eblaster etc etc) then you can easily throttle various processes for your current priorities. This is a biggie for me, as I often go from bored to maxed out in a very short amount of time... so I can tell my spiders that they are only allowed 3 processes, my blaster can do no more than 10/minute, and I've got almost my entire box at my disposal (the processor anyway).

In general (opinion here) I do not like fire and forget anymore. I wrote a lot more of that sort of thing in the windoz environment... when I fully embraced a *nix way of thinking, it changed my programmatic style dramatically. I now find that I am drawn to coding tiny little processes that can start and stop on a dime, because the start and stop overhead does not exist like it does in a windows environment.

Another issue: if you write in a very webbish PHP way, then you are not thinking about garbage collection and freeing your objects. So writing daemons with this attitude, you may be more likely to write leaky apps - or you might make use of a lib that is leaky and you wouldn't even know it, because it was never designed to run for 8 straight hours. If you spin off processes (don't even think of them as apps anymore) and they do a little something and quit, your memory management is automatic and absolute.

Quote from: Indica
this is *the* reason why i'm choosing PHP - scalability, specifically for cheap hosting. that simply cannot be done with .net applications, but with php it's no problem: just upload the script, have them interface with the base, and do the bidding.

Quote from: Indica
the aim is for the system to be extremely modular and portable, so portions of it can be distributed to a plethora of hosts and they will all work in unison to complete work.
exactly ... and there's another reason for me: I got tired of rebooting windows. Little teeny processes running on a *nix box are so stable and boringly predictable it's not even funny. Coding like this will make you the creator of appliances, rather than applications. Write apps for the iPhone... write appliance processes for those things that you want to fire and forget Wink
perkiset
« Reply #7 on: March 27, 2008, 12:29:48 PM »

Quote from: nutballs
It's just like having 10 people surfing your site. It doesn't wait for the first guy to be done before it replies to the next 9 guys.

Spot on NBs... Apache-based multiprocessing is just shithot. The biggest benefit here is that Apache is, in fact, a multithreaded app, so you can claim that you're threading  ROFLMAO but more importantly, you are using a single instance of PHP to do it, so you can (A) have WAY more processes running concurrently with almost no overhead and (B) make use of APC, which is where the real speed comes in. I use APC to huge effect, but obviously only on my own boxes and not on shared hosting deployments.
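
Beyond the opcode caching you get just by loading the extension, APC also gives you a shared in-memory user cache. The usual fetch-or-build pattern looks like this (a sketch; the function names are just for the example):

Code:
<?php
// serve an expensive result out of APC shared memory when we can,
// rebuild and stash it when we can't
function cachedKeywords($seed) {
    $key  = 'kw_' . md5($seed);
    $list = apc_fetch($key, $hit);
    if ($hit) return $list;            // straight from shared memory

    $list = buildKeywordList($seed);   // stand-in for the expensive step
    apc_store($key, $list, 3600);      // keep it around for an hour
    return $list;
}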
Indica
« Reply #8 on: March 27, 2008, 01:05:33 PM »

hmm i see perk, good points about run time length i hadn't thought of that. could you be arsed to make a simple example of how you do your spiderlet/watcher setup? or is it around here already? i must confess i haven't read everything here, i've got like 50 pages of unread shit  D'oh!

about page generation: my sites are generally dynamic also, that was more of an example than anything.

i think i can get used to firing off scripts in an on-demand fashion, though this entire thing is a complete shift in coding philosophy and style - back onto the training wheels Grin

surprisingly i never have to reboot my windows boxes much. i just built a new xp pro server and it's been running for weeks while scraping away. i told myself i wasn't going to put *nix on it, but i suppose now i will have to. thank jesus for vmware, i should be able to vm xp pro and still use it as normal. excuse me while i go pirate  ROFLMAO i've got ubuntu on my laptop, i suppose it's time i start using it full time so i can force myself to learn the way of the *nix. lord help me with that one, i have a hard enough time installing shit. and i've never been able to compile shit without 500 errors and *nix telling me to run back to windows  ROFLMAO

unrelated sidenote: i've been toying with extjs's web desktop @ http://extjs.com/deploy/dev/examples/desktop/desktop.html it will make one sexy interface for the system. seems to have a strong set of built in libraries which will make presentation quick and easy.
perkiset
« Reply #9 on: March 27, 2008, 01:32:12 PM »

Quote from: Indica
hmm i see perk, good points about run time length i hadn't thought of that. could you be arsed to make a simple example of how you do your spiderlet/watcher setup? or is it around here already? i must confess i haven't read everything here, i've got like 50 pages of unread shit  D'oh!
Check in the PHP Code repository for a scaled back version of my spider... I'll have a look in a bit if you can't find it.

Quote from: Indica
i think i can get used to firing off scripts in an on-demand fashion, though this entire thing is a complete shift in coding philosophy and style - back onto the training wheels Grin
It is, you're right on the dot. But it is also hugely worth it. I made that journey quite a while ago and have never looked back.
