The Cache: Technology Expert's Forum
 
Author Topic: How many processes are too many? (Read 1772 times)
nop_90
Global Moderator
« on: April 19, 2009, 05:17:47 PM »

Python 2.6 rocks because of the new http://docs.python.org/library/multiprocessing.html
Basically it runs subprocesses and then communicates with them over a pipe.
The cool part is that it proxies objects between processes.
It is also better than threads, since if a subprocess crashes it will not bring down the entire system.
It also sidesteps the Global Interpreter Lock problem that interpreted languages like Python have.
Anyway, in simple terms it is a fancy fork.
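
Something like this is the basic pattern (just a toy sketch - the "fetch" is faked, and the proxied objects come from multiprocessing.Manager(), e.g. Manager().list()):

Code:
from multiprocessing import Process, Pipe

def worker(conn):
    # Child process: pull a URL off the pipe, pretend to fetch it,
    # and send a result back to the parent.
    url = conn.recv()
    conn.send("fetched %s" % url)
    conn.close()

if __name__ == "__main__":
    parent_end, child_end = Pipe()
    p = Process(target=worker, args=(child_end,))
    p.start()
    parent_end.send("http://example.com/")
    print(parent_end.recv())    # -> fetched http://example.com/
    p.join()                    # if the child crashes, only the child dies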

Anyway, I have a dual-CPU server.
I have a spider.
How many subprocesses are too many?
Obviously I want the maximum number of subprocesses, but without each subprocess being starved for CPU time.
Is there a scientific way to calculate this?



perkiset
Administrator
« Reply #1 on: April 19, 2009, 05:25:15 PM »

I think this is fundamentally an incorrect way to ask the question.

If you have things to do that require full-time processing (i.e., no sleep states), then on a single core the best each of 2 processes can possibly get is 50% of the total CPU, less the overhead for context switching - meaning that 2 processes will never finish CPU-bound work faster than 1. Likewise, 4 processes each get less than 25%, and so on ad infinitum.

If, on the other hand, the processes have lots of sleepy time (like your spider), then Erlang math comes into play and it's more of a chaotic resource-usage map than a simple equation. We used this sort of math to calculate how many telephone agents we needed to answer calls in our call centers - the parameters were: how long our callers were willing to stay on hold, how many calls arrived per minute, the average call length, the total possible agents, and how much idle time we were willing to pay for. This is more the math that you want to look at.

There is really no hard limit to give you; it's more a question of how many can run before there is a real chance that all of them (or a critical number) must answer a request at the same time, and how willing you are to let the processor bog down for that period. You may be able to get away with many hundreds running simultaneously if your between-dispatch time is high and variable. The only way I know to test this is to run a bunch and watch machine usage. Quite simply, if I get (on average) above about 60% continual utilization, then I know that a spike will kill me, so I peak out right about there - however, YMMV.
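
If anyone wants to play with the Erlang math, here is a rough sketch of the Erlang-C waiting probability in Python (the traffic numbers below are made up for illustration):

Code:
import math

def erlang_c(load, n):
    # load = arrival rate * average service time (in Erlangs); must be < n
    # n    = number of agents/workers
    if load >= n:
        return 1.0   # more work arrives than can be served: queue grows forever
    top = (load ** n / math.factorial(n)) * (n / float(n - load))
    bottom = sum(load ** k / math.factorial(k) for k in range(n)) + top
    return top / bottom

# e.g. 120 calls/min at 0.5 min average handling time = 60 Erlangs offered
load = 120 * 0.5
for agents in (65, 70, 80):
    print("%d agents -> P(caller waits) = %.3f" % (agents, erlang_c(load, agents)))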

You certainly know all of this already, but I thought I'd outline the thinking behind it for n00b readers who want to know the same.
« Last Edit: April 19, 2009, 05:26:49 PM by perkiset »

nop_90
Global Moderator
« Reply #2 on: April 19, 2009, 08:52:41 PM »

Thanx
Yep, it is a spider :D
With a spider, stuff like a good DNS server can make a world of difference.
I was getting a lot of timeouts; I changed the DNS server and it worked a lot better.

The problem with a spider is that resource usage is spiky.
It is scanning multiple hosts at once.
If a host is fast it pushes the data back quickly, which means the HTML parser has to work extra hard.
But if a host is slow ......

Just have to play with it.

The multiprocessing module is pretty cool.
Theoretically you could have a manager on one machine, connected by SSL tunnels to children on other machines.
So basically you could have your own cloud :)
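
A sketch of what that could look like with multiprocessing.managers (the host, port, and authkey are made up; multiprocessing does not do SSL itself, so the tunnels would have to come from stunnel or ssh):

Code:
# Manager box: serve a shared job queue to remote workers.
from multiprocessing.managers import BaseManager
import Queue                       # 'queue' on Python 3

jobs = Queue.Queue()

class SpiderManager(BaseManager):
    pass

SpiderManager.register('get_jobs', callable=lambda: jobs)

if __name__ == '__main__':
    # authkey must be bytes (b'change-me') on Python 3
    m = SpiderManager(address=('', 50000), authkey='change-me')
    m.get_server().serve_forever()

# Worker box (a separate script) connects and pulls work:
#   class SpiderManager(BaseManager): pass
#   SpiderManager.register('get_jobs')
#   m = SpiderManager(address=('manager.example.com', 50000),
#                     authkey='change-me')
#   m.connect()
#   url = m.get_jobs().get()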

nop_90
Global Moderator
« Reply #3 on: April 19, 2009, 09:03:07 PM »

Not sure if this is impressive or not.
The server is a cheap dual Pentium MMX.
Right now I am scanning approx. 200-300 pages a minute.
Each page is being fed into a web browser's HTML parser, and the JS on the page is executed.
I am running at like 25-30% CPU usage.
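
If anyone wants to watch the usage over time while the spider runs, a quick sketch (psutil is a third-party package, not part of the spider itself):

Code:
import psutil   # third-party: pip install psutil

# Sample overall CPU usage every 5 seconds while the spider runs.
while True:
    pct = psutil.cpu_percent(interval=5)   # blocks 5s, returns average %
    print("cpu: %5.1f%%" % pct)
    if pct > 60:
        print("  sustained spikes up here = time to back off")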

perkiset
Administrator
« Reply #4 on: April 19, 2009, 09:16:43 PM »

Actually, that sounds just about right. You could probably double the load without ever maxing out the box, although there will be peaks that slow it down a bit. The more processes running concurrently, the more room (believe it or not) there is to run ... but the converse is that the odds of a spike are higher. I think you're right in the pocket.
