Howdy folks, here I am with yet another n00b question that is really busting my balls.
A quick overview of the situation: I am using PHP/cURL to try to automate a web form.
The captcha URL is always static, i.e.
http://site.com/act/captcha.jpg
There is a token on the register page that also needs to be scraped.
However, herein lies my problem. I scrape the register page and parse the token. I then scrape the captcha URL, but alas all my efforts are for naught, as this image is not the same as the one originally displayed on the register page. I think I verified this in the browser: when I right-click the original captcha and choose "view image", the image that gets displayed is a different one too.
What I think is happening is that when I request the captcha after already scraping the register page, the request alters the cookie and the server gives me a new image. Here is a quick sample of the function I am using; the scrape_page function is shown below it:
function get_token($site)
{
    // Delete any cookies left over from a previous run so the session starts clean.
    if (file_exists("cookies/cookies.tmp")) {
        unlink("cookies/cookies.tmp");
    }

    // Fetch the register page; $site is the URL of that page.
    $register_page = scrape_page($site, "https://site.com/act/register");

    // Capture the token value; give up if the field is not found.
    if (!preg_match("/name=\"token\" value=\"(.*?)\" \/>/", $register_page, $matches)) {
        return false;
    }
    $token = $matches[1];

    // Fetch the captcha through the same cookie jar and save it to disk.
    $captcha_url = "https://site.com/act/Captcha.jpg";
    file_put_contents("captcha/captcha.jpg", scrape_page($captcha_url, "https://site.com/act/register"));

    return $token;
}
function scrape_page($page, $referer)
{
    // Cookie jar shared by every request so the session carries over between calls.
    $file_cookie = "cookies/cookies.tmp";

    $ch = curl_init($page);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_COOKIEJAR, $file_cookie);  // cookies are saved to this file
    curl_setopt($ch, CURLOPT_COOKIEFILE, $file_cookie); // cookies are loaded from this file
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_REFERER, $referer);
    // Certificate checks are switched off here.
    curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
    curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)");

    $response = curl_exec($ch);
    // Read the error before closing the handle, not after.
    if ($response === false) {
        error_log(curl_error($ch));
    }
    curl_close($ch);

    return $response;
}
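To check the cookie theory, something like the following might help. It is only a rough sketch: parse_cookie_jar and changed_cookies are hypothetical helper names (not part of cURL), and it assumes cookies/cookies.tmp is in the tab-separated Netscape format that cURL writes and that the file has been written by the time scrape_page returns.

```php
<?php
// Hypothetical helper: turn the text of a Netscape-format cookie jar
// into a name => value array.
function parse_cookie_jar($text)
{
    $cookies = array();
    foreach (preg_split('/\r\n|\r|\n/', $text) as $line) {
        // cURL prefixes HttpOnly cookies with "#HttpOnly_"; keep those lines.
        if (strpos($line, '#HttpOnly_') === 0) {
            $line = substr($line, strlen('#HttpOnly_'));
        } elseif ($line === '' || $line[0] === '#') {
            continue; // skip blank lines and comments
        }
        // Fields: domain, subdomain flag, path, secure, expiry, name, value.
        $fields = explode("\t", $line);
        if (count($fields) === 7) {
            $cookies[$fields[5]] = $fields[6];
        }
    }
    return $cookies;
}

// Hypothetical helper: names of cookies that are new or have a different
// value in $after compared with $before.
function changed_cookies($before, $after)
{
    $changed = array();
    foreach ($after as $name => $value) {
        if (!isset($before[$name]) || $before[$name] !== $value) {
            $changed[] = $name;
        }
    }
    return $changed;
}
```

The idea would be to call parse_cookie_jar(file_get_contents("cookies/cookies.tmp")) once after scraping the register page and once after fetching the captcha, then pass both arrays to changed_cookies. If a session cookie shows up in the result, the second request really is changing the cookie; if nothing changes, the new image is probably coming from somewhere else.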
Major kudos to anyone who can help me solve this.
P.S. I wasn't sure whether to post this in the PHP section or here, so to the powers that be: you may move it if you deem it necessary.