
arms
in my linkspamming app i'm using a thread pool to handle many simultaneous connections. it's pretty fast, almost maxing out my high-speed dsl line, but it hogs almost all the cpu and memory it can get (partly because it's java, i guess) and basically needs its own dedicated machine. i tried basically the same thing in python, and although it used much less memory and cpu, i could never get the same speeds.

i don't care much about the cpu and memory, but i'm always trying to push the speed, so i played around with non-blocking sockets in java and python. java has the nio package, and it's really not pretty to implement an http client with. i can get amazing speeds though, a bit better than the multi-threaded version, but i'm missing something and some connections are going into some kind of limbo. in python i tried the asyncore library and found it almost as ugly to use as the java one, with worse performance. i also tried the twisted.web package, which was very easy to use, but again the speed was not acceptable.

i'm wondering what experience people have had with either method. am i chasing my tail with non-blocking io or what? what are you guys doing for your linkspam/spidering/scraping, whatever?

perkiset
I use PHP and poor man's threading, i.e., exec('./perksSpider.php') for example. Of course I pay a little extra for the additional shell space, but since the OS is handling my threading and my processes are all asleep while they wait for HTTP responses, it is very low impact on the box. I can run lots of spiders without any degradation on my machine (with the exception of RAM usage, of course).

I wager that NOP will argue for a true threaded solution, and in a perfect world I'd agree with him... but in my world anyway, things change so damn often that having little applets managed by a dispatching process works really well for me.

Note that in answer to your question I use neither of your listed options: I use PHP blocking sockets and no threading because the OS handles it for me.

arms
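For anyone who wants to try perkiset's one-process-per-job idea outside PHP, here is a minimal Python sketch, not his actual code. It launches one short-lived OS process per job and lets the OS do the scheduling while each child blocks. `WORKER_CODE`, `dispatch`, and `max_workers` are made-up names for illustration, and the worker is a stand-in that just sleeps and prints.

```python
import subprocess
import sys

# Hypothetical worker: a stand-in for a script that handles ONE job and exits.
WORKER_CODE = "import sys, time; time.sleep(0.1); print('done', sys.argv[1])"


def dispatch(jobs, max_workers=4):
    """Run one OS process per job, with at most max_workers alive at once."""
    pending = list(jobs)
    running = []
    results = []
    while pending or running:
        # Top up the set of child processes.
        while pending and len(running) < max_workers:
            job = pending.pop(0)
            proc = subprocess.Popen(
                [sys.executable, "-c", WORKER_CODE, job],
                stdout=subprocess.PIPE,
                text=True,
            )
            running.append(proc)
        # Block on the oldest child; the OS keeps scheduling the others.
        proc = running.pop(0)
        out, _ = proc.communicate()
        results.append(out.strip())
    return results


if __name__ == "__main__":
    print(dispatch(["a", "b", "c"], max_workers=2))
```

The dispatcher itself stays trivial because every worker is an independent program, so changing one worker never touches the others. The price is the per-process memory that perkiset mentions.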
i was just looking through your spider source - your php is clean, man! all oo and everything. i only really use php for some simple cloaking or logging, but i manage to make a mess of it in short order.

nop_90
non-blocking IO will knock the stuffing out of threads.
The reason is that every time you switch threads, the CPU has to store the old thread's context and restore the new thread's context.

I use the multi interface in libcurl: http://curl.haxx.se/libcurl/c/libcurl-multi.html

Basically the multi interface just uses the select socket function. For each socket I make a state machine. When the socket is "ready" it moves along its state machine. Then when it needs to do more requests, I add it back to the multi. That way it is easy to keep track of what is happening. Also, since each socket is in a state machine, it appears to be single threaded.

In the past I worked with Twisted (python): http://twistedmatrix.com/trac/
It is single-threaded and event driven. Very powerful, very quick; the downside is that it uses a complex system of callbacks. Also, with a single thread you do not have to worry about shared memory and shit like that, which can cause horrid bugs/lockups.

Using the multi interface on libcurl, my son, as part of his science project on social networks, could send over 60K messages in 4 hours on a crappy vps. This is logging into a new account every 200 messages, getting the message interface for each message, and sending. I think he used 20 sockets.

As for complexity, using a state machine for each socket actually makes things easier. Once you have it set up, it is simple to use. I think I saw some stuff on the net about state machines for java. (I know fuk all about java.)

arms
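A rough sketch of the select-plus-state-machine loop nop_90 describes, written with Python's standard `selectors` module rather than libcurl's multi interface. One thread waits on all sockets, and each socket carries its own small state (SEND, RECV, DONE) that advances only when the socket is ready. `Conn`, `run`, and the state names are assumptions for illustration, and `socket.socketpair()` stands in for a remote server so the example runs without a network. It assumes small, non-empty payloads.

```python
import selectors
import socket


class Conn:
    """Per-socket state: pending output, collected input, current state."""

    def __init__(self, sock, request):
        self.sock = sock
        self.outbuf = request
        self.inbuf = b""
        self.state = "SEND"


def run(requests):
    sel = selectors.DefaultSelector()
    conns = []
    for req in requests:
        client, peer = socket.socketpair()
        client.setblocking(False)
        peer.setblocking(False)
        conn = Conn(client, req)
        conns.append(conn)
        sel.register(client, selectors.EVENT_WRITE, conn)
        # data=None marks the stand-in "remote" end of the pair.
        sel.register(peer, selectors.EVENT_READ, None)

    active = len(conns)
    while active:
        for key, _events in sel.select(timeout=1):
            sock, conn = key.fileobj, key.data
            if conn is None:
                # Fake server: reply with the upper-cased request, then close.
                data = sock.recv(4096)
                if data:
                    sock.send(data.upper())
                sel.unregister(sock)
                sock.close()
            elif conn.state == "SEND":
                sent = sock.send(conn.outbuf)
                conn.outbuf = conn.outbuf[sent:]
                if not conn.outbuf:
                    conn.state = "RECV"
                    sel.modify(sock, selectors.EVENT_READ, conn)
            elif conn.state == "RECV":
                data = sock.recv(4096)
                if data:
                    conn.inbuf += data
                else:
                    # Peer closed the connection: the response is complete.
                    conn.state = "DONE"
                    sel.unregister(sock)
                    sock.close()
                    active -= 1
    sel.close()
    return [c.inbuf for c in conns]


if __name__ == "__main__":
    print(run([b"ping", b"hello"]))
```

Because every socket's progress lives in its `Conn` object, there is no shared mutable state between connections and no locking, which is the point made above about single-threaded designs.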
quote non-blocking IO will knock the stuffing out of threads.

that's all i wanted to hear. now i have a reason to do some coding. i'll continue on my non-blocking quest and maybe rewrite my code from scratch. i just did a search on state machines and was reminded how little i know.

m0nkeymafia
Nop dude, I agree.
Firstly, state machines are the only way to do proper socket programming.
And blocking rocks. We originally used non-blocking sockets but could only manage about 1mb/s out of them. After some tweaks and a move to blocking sockets we maxed out the adapter at 100mb/s. Blocking rules!

nop_90
Good state machine book: http://www.quantum-leaps.com/writings/book.htm
Or at least I thought it was a good book. It is C++ oriented, but the same principles apply in other languages. Without a state machine u end up with an if/else sack of shit which is impossible to debug. Also, when it comes to modifying things, it is a lot easier.

perkiset
Totally agree that state machines rock, and when building something truly complex they are the only way to go.

But this is a link spammer! FFS - I think that spending time on the mechanical structure of an efficient and elegant multi-threaded or single-threaded multi-state mechanism is contrary to the real goal: cash quick, made against a very nimble adversary. I say forget that level of complexity and write a little PHP (or whatever you choose) script that does *one* from a database of targets, then fire off a bunch of them. Certainly not as elegant - but what is elegance, if not a fatter wallet?

nop_90
You have seen what my state machine looks like. I think I posted it in scheme resources. But you can do similar things in other languages. Also, I can plug state machines inside other state machines. So I might have a SM for logging in, one for sending a PM, etc.

From arms' post here, and from the posts on other boards, s/he is going after a social network. For a social network you need reliability, and the ability to easily change your program, debug it, etc. Because my SM toolbox is already set up, when it comes time to attack a new target I examine the target. I then just make the states and fill in the blanks. If there are network errors etc., my SM handles them automatically.

For example, I outline the states:
-> get login page
-> get login page response
-> get message page
-> get message page response
-> send message
-> send message response

Then just plug in the pieces. Also, each state should be "independent": it should not be accessing globals etc. Anyway, that's the way I see it.

arms
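The request/response state chain in nop_90's post could be sketched as a small reusable runner like the one below. This is only an illustration under assumptions, not his Scheme toolbox: `StateMachine`, the handler names, and the `fetch` callable in the context dict are all hypothetical, and the handlers are generic stubs. Each handler reads and writes only the context it is given (no globals) and returns the name of the next state; an `IOError` retries the same state a limited number of times.

```python
class StateMachine:
    """Runs named states; each handler returns the next state's name or None."""

    def __init__(self, states, start, max_retries=2):
        self.states = states
        self.start = start
        self.max_retries = max_retries

    def run(self, ctx):
        state, retries = self.start, 0
        trace = []  # every state entered, including retries
        while state is not None:
            trace.append(state)
            try:
                state = self.states[state](ctx)
                retries = 0
            except IOError:
                # Simulated network trouble: retry the same state a few times.
                retries += 1
                if retries > self.max_retries:
                    raise
        return trace


# Hypothetical handlers: all data flows through ctx, never through globals.
def get_page(ctx):
    ctx["page"] = ctx["fetch"]("/form")
    return "read_page"


def read_page(ctx):
    ctx["ok"] = "form" in ctx["page"]
    return "submit" if ctx["ok"] else None


def submit(ctx):
    ctx["result"] = ctx["fetch"]("/submit")
    return None


if __name__ == "__main__":
    calls = []

    def flaky_fetch(path):
        # Fails once, then answers from canned data (no real network).
        calls.append(path)
        if len(calls) == 1:
            raise IOError("simulated network error")
        return "<form>" if path == "/form" else "ok"

    sm = StateMachine(
        {"get_page": get_page, "read_page": read_page, "submit": submit},
        start="get_page",
    )
    print(sm.run({"fetch": flaky_fetch}))
```

Nesting works the same way: a handler can build another `StateMachine` and call its `run` with the same context before returning the next state name.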
perkiset is completely right. it's just a link spammer. and it already works well. but if i can make it work better then i can't resist.
nop_90 the mad scientist just gave me some ideas and some shit to learn. i'm going to make this a side project, something to unwind with after a day of boring monkey work.

perkiset
The best part about your post (Nop) is the gear spinning... the notion of a state machine class that is yours and ready to go for any application like that is a handy tool to have available to you. As you already have this set up, it is completely uncomplicated for you - that's hot.

m0nkeymafia
paris hilton === state machine sockets lol "that's hot"

perkiset
quote author=m0nkeymafia: paris hilton === state machine sockets

Machine Socket!

nop_90
My SM evolved. I got fed up with (too lazy for) writing the same crappy checking code etc. over and over again. So I thought, why not use a SM? It worked pretty well, so I fixed it up and started using it in other projects. Pretty soon u end up with all sorts of tools.

m0nkeymafia
I unfortunately am an extremely lazy php programmer. I never make reusable code.

perkiset
How can you claim to be truly lazy if you rewrite your code every single time?!?!
Seems to me you're hard working but a doofus.

m0nkeymafia
lol perk, i just never invest the time required to make a nice reusable bit of code
so maybe 50% lazy 50% plonker

nop_90
a 100% plonker
My son, in his science fair project, noticed that many of the sites he frequents have hidden form inputs to prevent XSS. When you post the form you have to find these hidden inputs. So my son made a nice little function that finds these hidden inputs for a form. A very simple regex, actually. It returns them all in a nice little hash. Time spent was probably the same as parsing 2 forms by hand, and it can be used over and over again. These things do not need to be fancy, but they save much time.

m0nkeymafia
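A small sketch of the kind of reusable helper nop_90 describes: collect a form's hidden inputs into a dict (a "hash"). Hidden fields like these are typically anti-CSRF tokens. The original was a simple regex; this version swaps in Python's standard `html.parser`, which copes better with attribute order and quoting. `HiddenInputs` and `hidden_inputs` are hypothetical names.

```python
from html.parser import HTMLParser


class HiddenInputs(HTMLParser):
    """Collects name/value pairs from <input type="hidden"> tags."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us.
        attr = dict(attrs)
        is_hidden = (attr.get("type") or "").lower() == "hidden"
        if tag == "input" and is_hidden and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def hidden_inputs(html):
    """Return a dict of hidden input names to values found in html."""
    parser = HiddenInputs()
    parser.feed(html)
    parser.close()
    return parser.fields


if __name__ == "__main__":
    sample = (
        '<form><input type="hidden" name="token" value="abc123">'
        '<input type="text" name="q"></form>'
    )
    print(hidden_inputs(sample))
```

Visible fields such as the text input are skipped, so the result can be merged straight into the data posted back with the form.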
LOL nice one nop

esrun
But have you considered strongbow super