The Cache: Technology Expert's Forum
 
Topic: Let's talk open proxies..  (Read 6667 times)
crypt
Rookie
« on: June 05, 2007, 06:48:51 PM »

I've been considering using open proxies with certain parts of my code for a long time. However, I run into lots of timeouts and refused connections when testing my proxies in Firefox/Charon. I'm guessing the refused connections are caused by too many people using that particular proxy simultaneously, because they can be alleviated with a bit of hammering. Timeouts just happen sometimes, even with good proxies, and the same can be said for refused connections. Bleh, pain in the ass...

I guess I'm just overthinking it and need to start off by coding a proxy tester in PHP, so I can at least have an up-to-date database of proxies. Anonymity doesn't matter, but country of origin does, so I would use a geoIP database with it. I guess what I really need is moral support: just a few suggestions as to how you would do it, or how you HAVE done it, in whatever language, would help me out a lot...

For example:
Where would you obtain proxies? (I currently scrape mine from proxy sites, but I do know that you can scan for them, but haven't ever had success with that)

How would you test in PHP/cURL? (I would assume setting a timeout (not sure which option, but there are a couple of timeout options) and the proxy with HTTP tunnel and follow-redirect, then hitting my target site and checking for a footprint. We'll say google.com for scraping purposes, even though it's not.)

How do you handle timeouts/failed connections encountered once you have "good" proxies? (I assume setting a maximum number of retries, and just hammering it - failing that, mark the proxy as bad in the database)


That's all I can think of right now. I'm sure I'll think of more after I've gotten to the point where I can actually USE the proxies I've scraped. Please hold my hand!

Thanks

perkiset
Olde World Hacker
Administrator
Lifer
« Reply #1 on: June 05, 2007, 07:17:36 PM »

Quote
Where would you obtain proxies? (I currently scrape mine from proxy sites, but I do know that you can scan for them, but haven't ever had success with that)
There was a list posted @ syndk8 a bit ago... perhaps a good starting place, but I didn't test any of the addresses listed.

Quote
How would you test in PHP/Curl? (I would assume setting a timeout - not sure which option, but there are a couple timeout options - and the proxy with http tunnel and follow redirect, then hitting my target site and checking for a footprint - we'll say google.com for scraping purposes - even though it's not)
If you're just looking for timing, perhaps you should proxy right back to the machine that is testing so that the timestamp would be as accurate as possible. Also, the URL you query could run a little PHP script letting your <waiting process> know that the request has come through. A simple GET param would let the script know which process to alert. Or it could be a semaphore file, or semV, or APC cache item, or record in the DB or or or... Perhaps you hit it a few dozen times at various times to see what the average turnaround on requests is and post that, so you have an idea what to expect when you go outside your own network.
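A rough sketch of the "proxy right back to the testing machine" idea, written in Python rather than PHP. Everything named here is an assumption for illustration: the `token` GET parameter, the in-memory `arrivals` table (standing in for the semaphore file / cache item / DB record), and the helper names. In real use the callback server would have to listen on an address the proxy can reach from outside.

```python
import http.client
import threading
import time
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

arrivals = {}                      # token -> time the request reached us
arrival_lock = threading.Lock()


class CallbackHandler(BaseHTTPRequestHandler):
    """Plays the role of the 'little script': notes that a request came in."""

    def do_GET(self):
        token = parse_qs(urlparse(self.path).query).get("token", [""])[0]
        with arrival_lock:
            arrivals[token] = time.monotonic()
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok")

    def log_message(self, *args):  # keep the console quiet
        pass


def start_callback_server(host="127.0.0.1", port=0):
    """Start the callback server in a background thread and return it."""
    server = HTTPServer((host, port), CallbackHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def time_request(callback_url, token, proxy=None, timeout=10):
    """Return (total_seconds, seconds_until_arrival), or None on failure."""
    proxies = {"http": proxy} if proxy else {}   # empty dict = no proxy at all
    opener = urllib.request.build_opener(urllib.request.ProxyHandler(proxies))
    started = time.monotonic()
    try:
        with opener.open(f"{callback_url}?token={token}", timeout=timeout) as resp:
            resp.read()
    except (OSError, http.client.HTTPException):
        return None
    finished = time.monotonic()
    with arrival_lock:
        arrived = arrivals.get(token)
    if arrived is None:            # something answered, but not our server
        return None
    return finished - started, arrived - started
```

Calling `time_request` a few dozen times per proxy with distinct tokens and averaging the first element gives the "average turnaround" figure mentioned above.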

Quote
How do you handle timeouts/failed connections encountered once you have "good" proxies? (I assume setting a maximum number of retries, and just hammering it - failing that, mark the proxy as bad in the database)
I'd score a proxy record in my database as +(an amount) for every good hit and build a running average of response times so that I could preference proxies in the future. Obviously it'd get dinged for a fail. I would not list it as "bad" until it had failed on more than one occasion, or perhaps failed several times, because, as you note, even the best will have problems and you should not just toss something without giving it a solid chance. Additionally, I might list a proxy as "suspended" for a couple of weeks or so, so that as every other spammer, er, anonymous user discovers the proxy is overloaded and leaves, and the load falls away, you might be able to make use of it again. I might even have a spreading timeframe for when proxies are to be rechecked based on failure, i.e., if they fail for the first time, they'll only be un-preferenced a little... but if they fail several times I might schedule the next attempt on <that record> for a week from now, and if it still fails then, perhaps a month from then. Perhaps after 6 months of repeated failure it'd for reals get the boot (provided I know that the proxy is still alive but I just can't get in).
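A minimal sketch of that scoring scheme, in Python rather than PHP. The field names, the +1/-1 amounts and the exact recheck delays are assumptions picked for illustration; only the general shape (score, running average of response times, widening recheck interval, boot after about six months of failure) comes from the description above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical recheck schedule: delay before the next attempt, indexed by
# the number of consecutive failures (the last entry repeats).
RECHECK_DELAYS = [timedelta(hours=1), timedelta(days=1),
                  timedelta(weeks=1), timedelta(days=30)]
GIVE_UP_AFTER = timedelta(days=180)   # roughly "6 months of repeated failure"


@dataclass
class ProxyRecord:
    address: str
    score: int = 0
    hits: int = 0
    avg_response: float = 0.0
    consecutive_failures: int = 0
    first_failure_at: Optional[datetime] = None
    next_check_at: Optional[datetime] = None   # None = usable right now
    retired: bool = False

    def record_success(self, seconds: float) -> None:
        self.score += 1
        self.hits += 1
        # Incremental mean: no need to store every past response time.
        self.avg_response += (seconds - self.avg_response) / self.hits
        self.consecutive_failures = 0
        self.first_failure_at = None
        self.next_check_at = None

    def record_failure(self, now: datetime) -> None:
        self.score -= 1
        self.consecutive_failures += 1
        if self.first_failure_at is None:
            self.first_failure_at = now
        if now - self.first_failure_at >= GIVE_UP_AFTER:
            self.retired = True
            return
        # Suspend the proxy for longer the more often it fails in a row.
        step = min(self.consecutive_failures, len(RECHECK_DELAYS)) - 1
        self.next_check_at = now + RECHECK_DELAYS[step]
```

Picking a proxy then amounts to sorting the non-retired records whose `next_check_at` has passed by `score` (and perhaps by `avg_response` as a tie-breaker).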

I dunno...
/p

It is now believed, that after having lived in one compound with 3 wives and never leaving the house for 5 years, Bin Laden called the U.S. Navy Seals himself.

crypt
Rookie
« Reply #2 on: June 05, 2007, 09:23:34 PM »

Pretty much along the same lines as I'm thinking, and I do like the +/- scoring idea; I use that for a few other things. First things first, however: I'm gonna check that list at syndk8.

Thanks so far, not done here yet!

perkiset
Olde World Hacker
Administrator
Lifer
« Reply #3 on: June 05, 2007, 09:29:41 PM »

Here's a small list of starters:
http://www.syndk8.net/forum/index.php/topic,9089.0.html

KMBA (MangoPirate) posted this and I think it might merit some of your time:
http://www.syndk8.net/forum/index.php/topic,9516.0.html

That might get you going...

/p

Bompa
Administrator
Lifer
« Reply #4 on: June 06, 2007, 06:10:30 AM »

Hey crypt,

I did what perk mentioned with the +/- scoring.

I scrape proxy4free (all those places have the same proxies anyways; some of the free proxy
sites are owned by the same person LOL, just marketing) twice a day with a cron so the output
gets emailed to me (altho I haven't received it in a few days, wtf?).

I put them in a perl DBM and have my script GET google with each proxy.
I check the page Title for 'google':

if($content =~ /<title>Google<\/title>/i) {

If true, that proxy gets a +1; if false, it gets -1.

If any proxy reaches -5 total, it is removed from the DB.
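The same bookkeeping can be sketched in a few lines. This is Python rather than the Perl used above, and the dict stands in for the DBM file; the function names and the `None`-for-failed-fetch convention are assumptions made for the example.

```python
import re

# Same footprint as the Perl check: a case-insensitive title match.
TITLE_RE = re.compile(r"<title>Google</title>", re.IGNORECASE)
DROP_AT = -5   # a proxy that sinks to this total is removed


def looks_like_google(content: str) -> bool:
    return TITLE_RE.search(content) is not None


def update_scores(scores: dict, proxy: str, content) -> None:
    """+1 if the footprint was found, -1 otherwise; drop the proxy at -5.

    `content` is the fetched page, or None when the request failed outright.
    """
    ok = content is not None and looks_like_google(content)
    scores[proxy] = scores.get(proxy, 0) + (1 if ok else -1)
    if scores[proxy] <= DROP_AT:
        del scores[proxy]
```

Note that a dropped proxy that later gets scraped again would start over at 0 under this sketch.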

This is not a fast way to test tho, I let the sucker run overnight.

After 8 hours, the list is quite a bit smaller, but many of the proxies will have
a nice high score like 20-50, while most have a score of less than 10.

Anyways, I think of it as a reliability test.  It shows me which proxies I have my
best odds with.

Of course, I am adding "new" proxies everyday.

Bomps

"The most beautiful and profound emotion we can experience is the sensation of the mystical..." - Albert Einstein

crypt
Rookie
« Reply #5 on: June 06, 2007, 08:35:56 PM »

Right on, guys. Seems we're all more or less on the same page. I'll get to work on this and let you know if I come up with any cool trix to add to the system.

thedarkness
Lifer
« Reply #6 on: June 14, 2007, 04:48:42 AM »

I'm interested in this, crypt. I've been meaning to check out www.cspy.org; it appears to have some good info, but I'm not at that point in a couple of "master plans" yet. I would have a tendency to use something that gives you access to the timeout on the socket so you have greater control in your testing process. Something along the lines of perk's or Bomps' weighting system for preferred proxies is a cool idea.

Like I said, I'm interested. Post your progress here, crypt, and maybe when I'm at that stage I might be able to dive in and give a hand.

Quote
twice a day with a cron so the output
gets emailed to me, (altho I haven't received it in a few days, wtf?).

lmfao here Bomps

Cheers,
td

"I want to be the guy my dog thinks I am."
 - Unknown

thedarkness
Lifer
« Reply #7 on: June 14, 2007, 04:52:28 AM »

By a strange quirk of fate this thread is active at syndk8 now:

http://www.syndk8.net/forum/index.php/topic,11579.0.html

Anyone got any Russian contacts?

Cheers,
td

Bompa
Administrator
Lifer
« Reply #8 on: June 14, 2007, 04:15:08 PM »

TD, I understand you wanting the timeout on the socket, or whatever, but what I found
is that any sort of close-tolerance testing of open proxies is useless.  I mean, one minute
a proxy works fine, next minute it's 500 or 400, try again and it works fine.  That's why I
found that a general overall reliability test is the best.  Just to find the proxies that I have
my best chances with at any given time.


Bompa

thedarkness
Lifer
« Reply #9 on: June 14, 2007, 04:53:02 PM »

Understood Bomps, thanks for the heads up.

Cheers,
td

crypt
Rookie
« Reply #10 on: June 14, 2007, 07:30:48 PM »

OK, I've pretty much finished the system. I scrape proxy4free and samair.ru; they both seem to give a good number of working proxies. We'll say the system is in beta, where it will probably stay indefinitely. What I do is scrape the sites, then I have a small checking script which I run as a background process like this:
Code:
exec('script.php arg1 arg2 > /dev/null 2>&1 &');
I pass it the number of tries (10), the URL to check against, etc. I add 1 to the score if the check passes or subtract 1 if it fails, with a maximum of 10 and a minimum of 0. I run 50 of these processes from cron every minute in order to keep my checked list updated. OK, now I've got an updated list. Next I need my curl calls to have a retry, so I have a function that will retry every connection until I get a 200, with a maximum number of attempts and a timeout (usually 10). If the maximum number of attempts is reached, then the next proxy in the list is tried. And it's pretty much that simple; the hard part was changing all my existing code to work like this.
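The clamped score and the retry-then-next-proxy loop described above might look roughly like this. It is a Python sketch, not the PHP/curl code in question; `clamp_score`, `fetch_via_proxy` and `fetch_with_retries` are names made up for the example, and the default of 3 attempts is an assumption.

```python
import http.client
import urllib.request


def clamp_score(score, delta, low=0, high=10):
    """Apply +1/-1 while keeping the score inside [low, high]."""
    return max(low, min(high, score + delta))


def fetch_via_proxy(url, proxy, timeout=10):
    """Return the body if the proxy answers 200, else None.

    `proxy` is expected to look like "http://1.2.3.4:8080".
    """
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy, "https": proxy}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            return resp.read() if resp.status == 200 else None
    except (OSError, http.client.HTTPException):
        return None   # timeout, refused connection, 4xx/5xx, garbage reply


def fetch_with_retries(url, proxies, max_attempts=3, fetch=fetch_via_proxy):
    """Hammer each proxy up to max_attempts times, then move to the next.

    `proxies` should already be ordered best-first.
    Returns (proxy, body), or (None, None) if every proxy failed.
    """
    for proxy in proxies:
        for _ in range(max_attempts):
            body = fetch(url, proxy)
            if body is not None:
                return proxy, body
    return None, None
```

The `fetch` parameter is only there so the retry logic can be exercised with a fake fetcher instead of live proxies.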

I also used a geoIP database to sort my proxies by country. That's pretty simple to setup as well, and is very useful...
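For the country sorting, a toy sketch in Python. The two-entry `GEO_TABLE` is a hypothetical stand-in for a real geoIP database (the ranges are documentation networks, not real allocations), and proxies are assumed to be plain `ip:port` strings.

```python
import ipaddress
from collections import defaultdict

# Hypothetical geoIP table: (network, country code).
GEO_TABLE = [
    (ipaddress.ip_network("192.0.2.0/24"), "US"),
    (ipaddress.ip_network("198.51.100.0/24"), "RU"),
]


def country_of(ip: str) -> str:
    """Look up the country code for an IP, or '??' if it is not in the table."""
    addr = ipaddress.ip_address(ip)
    for network, country in GEO_TABLE:
        if addr in network:
            return country
    return "??"


def group_by_country(proxies):
    """Turn an iterable of 'ip:port' strings into {country: [proxy, ...]}."""
    groups = defaultdict(list)
    for proxy in proxies:
        ip = proxy.rsplit(":", 1)[0]
        groups[country_of(ip)].append(proxy)
    return dict(groups)
```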

perkiset
Olde World Hacker
Administrator
Lifer
« Reply #11 on: June 14, 2007, 08:27:56 PM »

Well done - sounds well thought out, and clearly makes best use of the OS to do the threading for you...

thedarkness
Lifer
« Reply #12 on: June 14, 2007, 10:40:36 PM »

Very cool crypt.

BTW, there is a fork function for php just so everyone is aware.

Cheers,
td

perkiset
Olde World Hacker
Administrator
Lifer
« Reply #13 on: June 15, 2007, 07:09:53 AM »

Quote
BTW, there is a fork function for php just so everyone is aware.

Tis... however, caveat programmetor: you must only call it when executing PHP from a shell and never from an HTTP call
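For comparison, here is what fork-per-job looks like in Python using `os.fork` (the PHP function meant above is presumably `pcntl_fork`, which is an assumption on my part). The `run_in_children` helper is hypothetical, and like its PHP counterpart it is meant to be run from a shell on a Unix-like system, not inside a web request.

```python
import os


def run_in_children(jobs, worker):
    """Fork one child per job and return {job: exit_status}.

    `worker(job)` runs in the child; a truthy result maps to exit status 0,
    anything else (including an exception) to 1.
    """
    pids = {}
    for job in jobs:
        pid = os.fork()
        if pid == 0:                      # we are the child
            code = 1
            try:
                code = 0 if worker(job) else 1
            finally:
                os._exit(code)            # never fall through into parent code
        pids[pid] = job                   # we are the parent

    results = {}
    for pid, job in pids.items():
        _, status = os.waitpid(pid, 0)    # reap the child, no zombies left
        results[job] = os.WEXITSTATUS(status) if os.WIFEXITED(status) else None
    return results
```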

/p

JasonD
Expert
« Reply #14 on: July 13, 2007, 02:00:58 PM »

Just a heads up to an old old old couple of apps I love and use extensively.


Yet Another Proxy Hunter - http://yaph.sourceforge.net/

and its sister application, which I think is one of the most beautiful things I have ever come across

ProxyChains - http://proxychains.sourceforge.net/

Write your code like normal and make sure that all your *ahem - dubious* traffic goes on a separate subnet, and then make sure that the router routes through ProxyChains.

Also make sure you run YAPH through ProxyChains.