The Cache: Technology Expert's Forum
 
Author Topic: Curl login problem - Help needed!!  (Read 4066 times)
patch
« on: August 03, 2011, 09:46:34 AM »

Hi,

Well, with help, I managed to sort out my scraping problems ... got another problem now though.

Trying to use curl to log into racingpost.com so I can scrape info only available to those who sign in.

I'm blagging it a bit though, as the form submits seem to be done in JavaScript. I think I've got the correct post URL but I'm not sure. And, of course, I'm useless with curl!

Any help greatly appreciated.

Here's my code:

Code:
<?php

# Go to the site home page first, in case cookies are set there
$ref = "www.racingpost.com";
$target = "http://www.racingpost.com";
$post_data = "";
$x = Curlit($target, $post_data, $ref);

# Set up login data
$post_data = "in_un=wellyfish&in_pw=wellyfish99&process=IN&logInType=lightbox&PARGS=&protoSecure=0"; # post params
$target = "https://reg.racingpost.com/modal_dialog/login.sd?protoSecure=0"; # login url
$login = Curlit($target, $post_data, $ref); # try to log in

# Re-scrape home page to see if logged in successfully
$target = "http://www.racingpost.com";
$html = "";
$post_data = "";
$x = Curlit($target, $post_data, $ref);
print_r($x);


# Fetch $rpcurl, POSTing $request if it is non-empty, and return the response body
function Curlit($rpcurl, $request, $ref)
{
    # Relative path: the file is created in the script's current working directory
    $cookie = "cookies.txt";
    $ch = curl_init();
    # Setting CURLOPT_POSTFIELDS turns the request into a POST, so only set it when there is data
    if ($request !== "") {
        curl_setopt($ch, CURLOPT_POSTFIELDS, $request);
    }
    curl_setopt($ch, CURLOPT_URL, $rpcurl);
    curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6");
    # The original set CURLOPT_TIMEOUT twice (1, then 10); only the last value applies
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie);  # write cookies here when the handle is closed
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie); # read cookies from here on each request
    curl_setopt($ch, CURLOPT_REFERER, $ref);

    $html = curl_exec($ch);
    curl_close($ch);
    return $html;
}

Also, I'm running the script from my PC and I can't find the cookies.txt file anywhere.

perkiset
« Reply #1 on: August 03, 2011, 11:00:21 PM »

Cookies are simply name/value pairs that arrive in the headers of a response, and the browser is obliged to send them back up (there are some caveats and gotchas, like subdomain specifiers and directories and such, but that will do as an explanation).

Simply put, you need to see what cookies are sent to you on each page and send them back up to the server. I am schnit with cURL as well, so I don't know where they come back, but you'll need to find (that) array and probably save it somehow. Perhaps just serialize it and store to disk - or even just keep in memory if this scraper does not need to be persistent.

The only problem with cURL in a case like this is that a LOT is going on that you can't see, which can make triage a trial. That's literally why I wrote my web class (you can find it here on the board) - so that I could have utterly granular, transparent and dissectable transmissions. The coders of the site will not have worked to make it easy for you (at the least) and almost certainly have made the login/cookie/JS process just a little extra convoluted to make this effort more difficult.
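
To make the "see what cookies are sent and send them back" idea concrete, here is a minimal PHP sketch. The helper names extractCookies() and buildCookieHeader() and the sample headers are hypothetical, not from the thread, and cookie attributes (Path, Domain, Expires) are deliberately ignored. Note that the CURLOPT_COOKIEJAR / CURLOPT_COOKIEFILE options in the original script already ask cURL to do this automatically; doing it by hand is mainly useful for seeing what is going on.

```php
<?php
// Hypothetical helpers (not from the thread): pull cookies out of raw
// response headers and build the Cookie header value to send back.

/**
 * Collect name => value pairs from every Set-Cookie line in a raw header block.
 * Attributes such as Path, Domain and Expires are ignored in this sketch.
 */
function extractCookies(string $rawHeaders): array
{
    $cookies = [];
    foreach (preg_split('/\r\n|\n|\r/', $rawHeaders) as $line) {
        if (stripos($line, 'Set-Cookie:') !== 0) {
            continue; // not a Set-Cookie header line
        }
        // Keep only the first "name=value" segment, before any attributes.
        $pair = trim(explode(';', substr($line, strlen('Set-Cookie:')), 2)[0]);
        if (strpos($pair, '=') === false) {
            continue; // malformed cookie, skip it
        }
        [$name, $value] = explode('=', $pair, 2);
        $cookies[trim($name)] = trim($value);
    }
    return $cookies;
}

/** Format the collected cookies as the value of a Cookie request header. */
function buildCookieHeader(array $cookies): string
{
    $parts = [];
    foreach ($cookies as $name => $value) {
        $parts[] = $name . '=' . $value;
    }
    return implode('; ', $parts);
}

// Made-up sample headers, purely for illustration.
$sample = "HTTP/1.1 200 OK\r\nSet-Cookie: sid=abc123; Path=/; HttpOnly\r\nSet-Cookie: lang=en\r\n\r\n";
$jar = extractCookies($sample);
echo buildCookieHeader($jar), "\n";
```

With cURL, the raw headers can be obtained by setting CURLOPT_HEADER to true (they are then prepended to the returned string), and the resulting string can be passed to the next request through CURLOPT_COOKIE. The array could also be passed through serialize() and stored on disk if the scraper needs to be persistent.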

patch
« Reply #2 on: August 04, 2011, 04:25:43 AM »

Thanks Perk,

I'll see what I can do with your webclass.
AlecSimpson
« Reply #3 on: January 30, 2012, 05:17:26 AM »

Which programming language is the code above written in? Can anybody tell me?

Bompa
« Reply #4 on: January 30, 2012, 06:41:51 AM »

I am not a PHP guy, but looking at your code

function Curlit($rpcurl, $request, $ref)

I am wondering about "$rpcurl". It looks like a variable, but I do not
see where it is set. Also, rpc sometimes means Remote Procedure Call
and I do not think that's what you would want.

Maybe you should have $target there?

Also, after every request, can you save to a file whatever the
server is sending back to you?


'night,
Bompa
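
One way to do the "save whatever the server sends back" step is sketched below. saveResponse() is a hypothetical helper, not from the thread, and the directory name and placeholder body are made up for illustration; in the original script the body would be whatever Curlit() returns.

```php
<?php
// Hypothetical helper (not from the thread): write each response to its own
// numbered file so the whole exchange can be inspected afterwards.

function saveResponse(string $dir, int $step, string $label, string $body): string
{
    if (!is_dir($dir) && !mkdir($dir, 0777, true)) {
        throw new RuntimeException("Cannot create directory: $dir");
    }
    // Keep the label filesystem-safe.
    $safe = preg_replace('/[^A-Za-z0-9_-]+/', '_', $label);
    $path = sprintf('%s/%02d_%s.html', rtrim($dir, '/\\'), $step, $safe);
    if (file_put_contents($path, $body) === false) {
        throw new RuntimeException("Cannot write file: $path");
    }
    return $path;
}

// Example with a placeholder body and an assumed debug directory.
$dir = sys_get_temp_dir() . '/curl_debug';
echo saveResponse($dir, 1, 'home page', '<html>placeholder</html>'), "\n";
```

Numbering the files keeps the requests in order, which makes it easier to spot the step where the login stops behaving as expected.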

serialnoob
« Reply #5 on: March 05, 2012, 04:44:34 PM »

@patch

"The only problem with cURL in a case like this is that a LOT is going on that you can't see, which can make triage a trial."

This includes not setting your own request headers (CURLOPT_HTTPHEADER), in which case cURL falls back to its own defaults, AFAIK.
You presumably want to simulate a browser's behaviour, but you may not be sending the corresponding headers.
Another one is CURLOPT_ENCODING, which asks for compressed responses as modern browsers do.
My 2 cents.
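
As a rough illustration of those two options, here is a sketch of an options array that could be applied with curl_setopt_array(). browserLikeOptions() is a hypothetical helper, not from the thread, and the Accept header values are only examples of what a browser might send, not what racingpost.com requires.

```php
<?php
// Hypothetical helper (not from the thread): options that make a cURL request
// look more like one from a browser. Header values are illustrative only.

function browserLikeOptions(string $userAgent): array
{
    return [
        CURLOPT_USERAGENT      => $userAgent,
        // Extra request headers a browser would normally send.
        CURLOPT_HTTPHEADER     => [
            'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
            'Accept-Language: en-US,en;q=0.5',
        ],
        // Empty string: offer every encoding this cURL build supports and
        // decode the response automatically.
        CURLOPT_ENCODING       => '',
        // Prepend the response headers to the returned string for inspection.
        CURLOPT_HEADER         => true,
        CURLOPT_RETURNTRANSFER => true,
    ];
}

$ua = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6';
$ch = curl_init('http://www.racingpost.com');
curl_setopt_array($ch, browserLikeOptions($ua));
// curl_exec($ch) would send the request; it is left out so the sketch runs offline.
```

Keeping the options in one array makes it easy to compare them against what a real browser sends, for example by looking at the request headers in the browser's developer tools.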