arms

in my linkspamming app im using a thread pool to handle many simultaneous connections. it's pretty fast, almost maxing out my high speed dsl line. but it hogs almost all the cpu and memory it can get (partly because it's java i guess) and basically needs its own dedicated machine. i tried basically the same thing in python and although it used much less memory and cpu, i could never get the same speeds.

i don't care much about the cpu and memory but i'm always trying to push the speed so i played around with non-blocking sockets in java and python. java has the nio package and it's really not pretty to implement an http client with. i can get amazing speeds though, a bit better than the multi threaded version but im missing something and some connections are kinda going into some kind of limbo. in python i tried the asyncore library and found it almost as ugly to use as the java one, with lesser performance. i also tried the twisted.web package which was very easy to use but again the speed was not acceptable.

im wondering what experience people have had with either method. am i chasing my tail with non blocking io or what? what are you guys doing for your linkspam/spidering/scraping whatever.

perkiset

I use PHP and poor man's threading, i.e., exec('./perksSpider.php') for example. Of course I pay a little extra for the additional shell space, but since the OS is handling my threading and my processes are all asleep while they wait for HTTP responses, it is very low impact on the box. I can run lots of spiders without any degradation on my machine (exception of RAM usage of course).

I wager that NOP will argue for a true threaded solution, and in a perfect world I'd agree with him... but in my world anyway, things change so damn often that having little applets managed by a dispatching process works really well for me.

Note that in answer to your question I use neither of your listed options - I use PHP blocking sockets and no threading because the OS handles it for me.
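For anyone who wants to see the "let the OS do the threading" idea in code, here is a minimal sketch in Python rather than PHP. It is not perkiset's dispatcher: `WORKER_CODE` and `dispatch` are made-up names, and the worker is a stand-in that just sleeps (as if blocked on a socket) and prints its argument.

```python
import subprocess
import sys

# Hypothetical stand-in for a worker script: it sleeps briefly (as if it
# were waiting on a blocking socket) and then prints the job it was given.
WORKER_CODE = "import sys, time; time.sleep(0.1); print('done', sys.argv[1])"


def dispatch(jobs):
    """Start one OS process per job, then collect each worker's output."""
    procs = [
        subprocess.Popen(
            [sys.executable, "-c", WORKER_CODE, job],
            stdout=subprocess.PIPE,
            text=True,
        )
        for job in jobs
    ]
    # Every worker is already running at this point; communicate() only
    # waits for each one to finish and returns what it printed.
    return [proc.communicate()[0].strip() for proc in procs]


if __name__ == "__main__":
    print(dispatch(["a", "b", "c"]))
```

The workers sleep at the same time, so the whole batch takes about as long as one worker. The price is one full interpreter per job, which is the RAM cost mentioned above.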

/p

arms

i was just looking through your spider source - your php is clean man!
all oo and everything. i only really use php for some simple cloaking or logging but i manage to make a mess of it in short work.

nop_90

non-blocking IO will knock the stuffing out of threads.
the reason is that every time you switch threads the CPU has to store the context of the old thread, and restore the context of the new one.

I use the multi interface in libcurl.
http://curl.haxx.se/libcurl/c/libcurl-multi.html
Basically the multi interface just uses the select socket function.

for each socket i make a state machine.
when the socket is "ready" it moves along its state machine.
then when it needs to do more requests i add it back to the multi.
that way it is easy to keep track of what is happening.
Also if you have it this way, since each socket is in a state machine, it appears to be single threaded.
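A rough sketch of the "one state machine per socket, driven by select" idea, using Python's standard `selectors` module instead of the libcurl multi interface. All the names here (`Machine`, `Client`, `UpperEcho`, `run`, `demo`) are invented for illustration, and the peer is a local `socket.socketpair()` end that upper-cases whatever it receives, so nothing touches the network.

```python
import selectors
import socket


class Machine:
    """One non-blocking socket plus the state it is currently in."""

    def __init__(self, sock, state, outbuf=b""):
        self.sock = sock
        self.state = state  # "send", "recv" or "done"
        self.outbuf = outbuf
        self.inbuf = b""

    def events(self):
        # Which readiness event the current state is waiting for.
        if self.state == "send":
            return selectors.EVENT_WRITE
        return selectors.EVENT_READ

    def advance(self):
        """Called when the socket is ready: do one step, maybe change state."""
        if self.state == "send":
            sent = self.sock.send(self.outbuf)
            self.outbuf = self.outbuf[sent:]
            if not self.outbuf:
                self.after_send()
        elif self.state == "recv":
            chunk = self.sock.recv(4096)
            if chunk:
                self.inbuf += chunk
            else:  # empty read: the peer has finished sending
                self.after_recv()


class Client(Machine):
    """send request -> recv reply -> done"""

    def after_send(self):
        self.sock.shutdown(socket.SHUT_WR)  # tell the peer the request is complete
        self.state = "recv"

    def after_recv(self):
        self.state = "done"


class UpperEcho(Machine):
    """Hypothetical local peer: recv request -> send it back upper-cased -> done"""

    def after_recv(self):
        self.outbuf = self.inbuf.upper()
        self.state = "send"

    def after_send(self):
        self.state = "done"


def run(machines):
    """Single-threaded loop: wake whichever machine's socket is ready."""
    sel = selectors.DefaultSelector()
    for m in machines:
        m.sock.setblocking(False)
        sel.register(m.sock, m.events(), m)
    active = len(machines)
    while active:
        for key, _ in sel.select():
            m = key.data
            m.advance()
            if m.state == "done":
                sel.unregister(m.sock)
                m.sock.close()
                active -= 1
            else:
                sel.modify(m.sock, m.events(), m)
    sel.close()


def demo(messages):
    clients, machines = [], []
    for msg in messages:
        a, b = socket.socketpair()
        client = Client(a, "send", msg)
        clients.append(client)
        machines += [client, UpperEcho(b, "recv")]
    run(machines)
    return [c.inbuf for c in clients]


if __name__ == "__main__":
    print(demo([b"hello", b"world"]))
```

Each machine only ever looks at its own socket and buffers, so the loop stays single threaded and there is no shared state to lock. A real client would also need timeouts and error states, which this sketch leaves out.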

In the past i work with twisted python http://twistedmatrix.com/trac/
It is single thread event driven.
Very powerful, very quick, downside it uses a complex system of callbacks.
Also with single thread you do not have to worry about shared memory and shit like that which can cause horrid bugs/lockups.

Using the multi interface on libcurl my son, as part of his science project on social networks, could send over 60K messages in 4 hours on a crappy vps.
this is with logging into a new account every 200 messages, getting the message interface for each message, and sending.
I think he used 20 sockets.

complexity: using a state machine for each socket actually makes things easier.
once you have it set up it is simple to use.
i think i see on the net some stuff about state machines for java.
(I know fuk all about java, when i used java non-blocking sockets had not been implemented yet)

arms

quote
non-blocking IO will knock the stuffing out of threads.

thats all i wanted to hear. now i have a reason to do some coding. i'll continue on my non blocking quest and maybe rewrite my code from scratch.

i just did a search on state machine and was reminded how little i know.

m0nkeymafia

Nop dude I agree
Firstly state machines are the only way to do proper socket programming
And blocking rocks

We originally used non blocking sockets but could only manage about 1mb/s out of it
After some tweaks and a move to blocking sockets we maxed out the adapter at 100mb/s

Blocking rules

nop_90

good state machine book
http://www.quantum-leaps.com/writings/book.htm
or i thought it good book
C++ oriented but same principles apply in other language.

without state machine u end up with a if/else sack of shit which is impossible to debug.
also when it comes to modifying a lot easier.

perkiset

Totally agree that state machines rock, and when building something truly complex they are the only way to go.

But this is a link spammer!

FFS - I think that spending time on the mechanical structure of an efficient and elegant multi-threaded or single threaded multi-state mechanism is contrary to the real goal: cash quick, made against a very nimble adversary.

I say forget that level of complexity and write a little PHP (or whatever you choose) script that does *one* from a database of targets then fire off a bunch of them. Certainly not as elegant - but what is elegance, if not a fatter wallet?

/p

nop_90

You have seen how my state machine looks like.
I think i post it in scheme resources.
But you can do similar things in other languages.
Also i can plug state machines inside other state machines.
So i might have a SM for logging in, one for sending a PM etc.

From arms' post here, and from the posts on other boards, s/he is going after a social network.
For a social network you need reliability.
Ability to easily change your program, debug it etc.

Because my SM toolbox is already set up, when it comes time to attack a new target i examine the target.
I then just make the states and fill in the blanks. If there are network errors etc, my SM handles them automatically.

Example i outline the states
-> get login page
-> get login page response
-> get message page
-> get message page response
-> send message
-> send message response
Then just plug in the pieces.
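The outline above maps onto a table-driven state machine fairly directly. This is only a skeleton in Python, not nop_90's Scheme toolbox: `run_machine`, `build_states` and `make_stub` are hypothetical names, and every handler is a stub that does nothing except count itself and name the next state. The one real feature is that any exception inside a state gets routed to a shared `error` state, which is one way to read "my SM handles them automatically".

```python
# State names taken from the outline above; the handlers are stubs only.
STEPS = [
    "get_login_page",
    "get_login_page_response",
    "get_message_page",
    "get_message_page_response",
    "send_message",
    "send_message_response",
]


def make_stub(next_state):
    """Build a placeholder handler that only counts itself and moves on."""
    def handler(ctx):
        ctx["steps_run"] += 1
        return next_state
    return handler


def build_states(steps):
    """Map each state name to a handler that returns the next state's name."""
    states = {}
    for name, nxt in zip(steps, steps[1:] + [None]):
        states[name] = make_stub(nxt)
    # Shared error state: here it just stops the machine.
    states["error"] = lambda ctx: None
    return states


def run_machine(states, start, ctx):
    """Run handlers until one returns None; exceptions go to 'error'."""
    ctx.setdefault("trace", [])
    ctx.setdefault("steps_run", 0)
    state = start
    while state is not None:
        ctx["trace"].append(state)
        try:
            state = states[state](ctx)
        except Exception as exc:
            if state == "error":
                raise  # the error handler itself failed, give up
            ctx["error"] = exc
            state = "error"
    return ctx


if __name__ == "__main__":
    result = run_machine(build_states(STEPS), STEPS[0], {})
    print(result["trace"])
```

Handlers only read and write the `ctx` dict they are handed, never globals, so a state can be swapped out or the whole table nested inside a bigger machine without touching the rest.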

Also each state should be "independent", it should not be accessing globals etc.
anyway that the way i see it.


arms

perkiset is completely right. it's just a link spammer. and it already works well. but if i can make it work better then i can't resist.
nop_90 the mad scientist just gave me some ideas and some shit to learn.
i'm going to make this a side project, something to unwind with after a day of boring monkey work.

perkiset

The best part about your post (Nop) is the gear spinning... the notion of a state machine class that is yours and ready to go for any application like that is a handy tool to have available to you. As you already have this set up, it is completely uncomplicated for you - that's hot.

m0nkeymafia

paris hilton === state machine sockets

lol
"thats hot"

perkiset

quote (m0nkeymafia)
paris hilton === state machine sockets

WELL done ... that one's deep enough that it'll have me giggling for the day. a State Machine Socket!

nop_90

My SM evolved
I got fed up/too lazy writing the same crappy checking code etc over and over again.
So i thought why not use a SM.
It worked pretty good so i fixed it up and started using it in other projects.

Pretty soon u end up with all sorts of tools

m0nkeymafia

I unfortunately am an extremely lazy php programmer
I never make reusable code

perkiset

How can you claim to be truly lazy if you rewrite your code every single time?!?!
Seems to me you're hard working but a doofus

m0nkeymafia

lol perk, i just never invest the time required to make a nice reusable bit of code
so maybe 50% lazy 50% plonker

nop_90

a 100% plonker
My son in his science fair project noticed that many of the sites he frequents have hidden form inputs to prevent cross-site request forgery.
When you post the form you have to find these hidden inputs.

So my son make nice little function that finds these hidden input for a form.
A very simple regex actually.
It returns them all in a nice little hash

time spent probably same as to parse 2 forms,
can be used over and over again.
These things do not need to be fancy but they save much time.
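A small reusable helper along those lines could look like the sketch below. Note the swap: the post describes a simple regex, while this version uses Python's standard `html.parser`, which copes better with attribute order and quoting. `HiddenInputs` and `hidden_inputs` are made-up names, and the sample form in the demo is invented.

```python
from html.parser import HTMLParser


class HiddenInputs(HTMLParser):
    """Collect name/value pairs from <input type="hidden"> tags."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <input ... />.
        if tag != "input":
            return
        attr = dict(attrs)
        is_hidden = (attr.get("type") or "").lower() == "hidden"
        if is_hidden and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def hidden_inputs(html):
    """Return every hidden input in the markup as a {name: value} dict."""
    parser = HiddenInputs()
    parser.feed(html)
    parser.close()
    return parser.fields


if __name__ == "__main__":
    sample = (
        '<form action="/post">'
        '<input type="hidden" name="token" value="abc123">'
        '<input type="text" name="subject">'
        '<input name="sid" type="HIDDEN" value="42" />'
        "</form>"
    )
    print(hidden_inputs(sample))
```

It collects hidden inputs from the whole page, not per form, which is enough for pages with a single form; a per-form version would need to track `<form>` start and end tags as well.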

m0nkeymafia

LOL nice one nop

esrun

But have you considered strongbow super

