Thanks.
Yep, it is a spider.

For a spider, something like a good DNS server can make a world of difference.
I was getting a lot of timeouts; after I changed the DNS server it worked a lot better.
The problem with a spider is that resource usage is spiky.
It is scanning multiple hosts at once.
If a host is good it will push the data back fast, which means the HTML parser has to work extra hard.
But if a host is slow ......
You just have to play with it.
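
One thing to play with for the spikes is a bounded queue between the fetchers and the parser, so fast hosts get held back once the parser falls behind. This is only a rough sketch, not what my spider actually does: `crawl`, `fetch`, `parse`, `workers` and `max_pending` are names made up for the example, and `fetch`/`parse` are whatever download and parse functions you already have.

```python
import queue
import threading


def crawl(urls, fetch, parse, workers=4, max_pending=8):
    """Fetch urls in `workers` threads and parse in the calling thread.

    `fetch(url)` and `parse(body)` are supplied by the caller (hypothetical
    hooks for this sketch). Returns {url: parsed result, or None on failure}.
    """
    todo = queue.Queue()
    for url in urls:
        todo.put(url)

    # Bounded queue: put() blocks once max_pending pages are waiting,
    # so fast hosts cannot flood the parser.
    pages = queue.Queue(maxsize=max_pending)
    done = object()  # sentinel each fetcher sends when it runs out of work

    def fetcher():
        while True:
            try:
                url = todo.get_nowait()
            except queue.Empty:
                break
            try:
                body = fetch(url)
            except Exception as exc:  # slow or dead host: record the failure
                body = exc
            pages.put((url, body))
        pages.put(done)

    threads = [threading.Thread(target=fetcher, daemon=True)
               for _ in range(workers)]
    for t in threads:
        t.start()

    results = {}
    finished = 0
    while finished < workers:
        item = pages.get()
        if item is done:
            finished += 1
            continue
        url, body = item
        results[url] = None if isinstance(body, Exception) else parse(body)

    for t in threads:
        t.join()
    return results
```

A smaller `max_pending` keeps memory flat but leaves fetchers idle more often; a bigger one soaks up bursts from fast hosts. Threads are fine for the download side since it is mostly waiting on the network; the parsing could be moved to processes if it becomes the bottleneck.
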
The multiprocessing module is pretty cool.
Theoretically you could have a manager on one machine, connected by SSL tunnels to children on other machines.
So basically you could have your own cloud.
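
A rough sketch of the manager idea, using `multiprocessing.managers.BaseManager` to share two queues over TCP. The names `QueueServer`, `QueueClient`, `serve_queues`, `connect_queues`, `get_jobs` and `get_results` are made up for the example, and so is the `authkey` value. As far as I know the manager connection is only authenticated with the authkey and not encrypted, so the tunnel (ssh, stunnel or similar) would have to be set up separately; here the manager just listens on localhost, which is where a tunnel endpoint would connect.

```python
import queue
import threading
from multiprocessing.managers import BaseManager


class QueueServer(BaseManager):
    """Manager side: owns the real queues."""


class QueueClient(BaseManager):
    """Child side: only receives proxies to the queues."""


def serve_queues(address=("127.0.0.1", 0), authkey=b"change-me"):
    """Serve a jobs queue and a results queue in a background thread.

    Port 0 lets the OS pick a free port; the real address is returned.
    """
    jobs, results = queue.Queue(), queue.Queue()
    QueueServer.register("get_jobs", callable=lambda: jobs)
    QueueServer.register("get_results", callable=lambda: results)
    server = QueueServer(address=address, authkey=authkey).get_server()
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server.address, jobs, results


def connect_queues(address, authkey=b"change-me"):
    """Connect to a running manager and return (jobs, results) proxies."""
    QueueClient.register("get_jobs")
    QueueClient.register("get_results")
    manager = QueueClient(address=address, authkey=authkey)
    manager.connect()
    return manager.get_jobs(), manager.get_results()
```

On the manager machine you would call `serve_queues()` and push URLs into `jobs`; each child calls `connect_queues()` with the address of its end of the tunnel, pulls URLs with `get()` and pushes whatever it scraped into `results`.
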
