The Cache: Technology Expert's Forum
 
Author Topic: scraping problem  (Read 10495 times)
mashal
« on: June 18, 2012, 04:14:19 AM »

Hi, I'm scraping this site, http://www.cleartrip.com/, with the following code, but it is not giving me any data:
$target ='http://www.cleartrip.com/flights/results?from=CCU&to=DEL&depart_date=22/06/2012&adults=1&childs=0&infants=0&dep_time=0&class=Economy&airline=&carrier=&x=57&y=16&flexi_search=yes&tb=n';

   $data=file_get_contents($target);
   echo $data;
perkiset
« Reply #1 on: June 18, 2012, 11:29:14 PM »

I'll have to look from a real machine (not my pad) in the morning, but it looks as though the results are being delivered via AJAX or another out-of-band method. If that is the case, then the page (pulled the way you are pulling it) would never show results, because a plain fetch returns only the initial HTML and never runs the scripts that load the flights.

I'll check it out tomorrow and comment further. Might have an idea how to get around it.
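
One quick way to confirm this is to look for text you expect to see in the raw HTML that comes back. A minimal sketch, assuming a made-up helper name `containsMarker` and an invented sample of what an AJAX-driven page might send (the real markup of the site is not known here):

```php
<?php
// Hypothetical helper: true when the raw HTML already contains the text
// we expect to see in the rendered page (case-insensitive match).
function containsMarker(string $html, string $marker): bool
{
    return stripos($html, $marker) !== false;
}

// Invented sample: an empty container that a script fills in later.
$rawHtml = '<div id="results"></div><script src="search.js"></script>';

// A plain HTTP fetch never runs search.js, so the flight rows are absent
// even though the container for them is present.
var_dump(containsMarker($rawHtml, 'IndiGo'));
var_dump(containsMarker($rawHtml, 'id="results"'));
```

If the container is there but the data is not, the results are most likely loaded by a separate request after the page loads.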

It is now believed, that after having lived in one compound with 3 wives and never leaving the house for 5 years, Bin Laden called the U.S. Navy Seals himself.
mashal
« Reply #2 on: June 19, 2012, 11:04:40 AM »

OK, thanks a lot for the help. I tried a different method too. Every other website I scrape gives me the required result, but this website, cleartrip.com, isn't giving me anything. I'm new to the scraping world, so I'm confused. I tried with <textarea> too, but no success.
perkiset
« Reply #3 on: June 19, 2012, 01:08:39 PM »

The textarea tip in this case will not help you.

Are you familiar with AJAX?
Bompa
« Reply #4 on: June 19, 2012, 04:01:03 PM »

Just a thought: use their mobile page.

If this is the data you need:
Code:
Get instant access to Cleartrip on your BlackBerry. Download the Cleartrip App
Cleartrip logo
Kolkata - New Delhi
Fri, 22 Jun | 1 adult
modify | filter
sort by: price | time
Select a flight (1-10 of 120)
IndiGo
07:05 - 09:20
2h 15m, non-stop , 6E 212
Rs. 7,841
Select
SpiceJet
07:20 - 09:30
2h 10m, non-stop , SG 608
Rs. 7,841
Select
IndiGo
08:45 - 10:55
2h 10m, non-stop , 6E 228
Rs. 7,841
Select
IndiGo
11:45 - 14:00
2h 15m, non-stop , 6E 236
Rs. 7,841
Select
SpiceJet
12:00 - 14:10
2h 10m, non-stop , SG 255
Rs. 7,841
Select
GoAir
14:35 - 16:45
2h 10m, non-stop , G8 712
Rs. 7,841
Select
IndiGo
16:40 - 18:55
2h 15m, non-stop , 6E 206
Rs. 7,841
Select
SpiceJet
17:10 - 19:20
2h 10m, non-stop , SG 219
Rs. 7,841
Select
IndiGo
20:35 - 22:45
2h 10m, non-stop , 6E 224
Rs. 7,841
Select
Air India
07:00 - 09:05
2h 5m, non-stop , AI 763
Rs. 8,072
Select
1 - 10 of 120 flights prev | next
Home | Flights | Trains | Trips
Sign in to your Cleartrip account
Cleartrip's desktop site
2006 - 2012 Cleartrip
Just a moment...

All I did was add "android" to my usual user agent string.


Bompa

"The most beautiful and profound emotion we can experience is the sensation of the mystical..." - Albert Einstein
mashal
« Reply #5 on: June 19, 2012, 08:52:38 PM »

Yes, this is the info I want, but I don't understand how you are getting it.
mashal
« Reply #6 on: June 19, 2012, 09:14:23 PM »

@Bompa, can you please tell me where you added "android"? Should I add it to the user agent in Firefox? How can I add "android" to the string?
mashal
« Reply #7 on: June 19, 2012, 09:35:48 PM »

@Bompa, I get this result too now, but the problem is that I want to get this info through PHP code, not through a browser user agent.
Bompa
« Reply #8 on: June 19, 2012, 09:59:31 PM »

What is useragent?
http://www.google.com/search?hl=en&safe=off&site=&source=hp&q=user+agent+string&oq=user+agent+string&aq=0&aqi=g10&aql=&gs_l=hp.1.0.0l10.1195.1195.0.2561.1.1.0.0.0.0.237.237.2-1.1.0...0.0.5YvS6A7KLaQ


How to change useragent?
http://www.google.com/search?hl=en&safe=off&site=&source=hp&q=how+to+change+useragent+string&oq=how+to+change+useragent+string&aq=f&aqi=g-l3g-lK7&aql=&gs_l=hp.3..0i13l3j0i13i30l7.1932.8383.0.9026.34.25.2.7.7.1.338.2860.9j15j0j1.25.0...0.0.4VoF_NsTVPw

AFAIK, doing this with PHP code is not as easy as just getting a page like before.

If you really need it, you have to research "scraping with curl"

http://www.google.com/search?hl=en&safe=off&q=scraping+with+curl&oq=scraping+with+curl&aq=f&aqi=g1g-m1g-mK5g-bK2&aql=&gs_l=serp.3..0j0i5j0i5i30l5j0i8i30l2.289288.292623.0.293104.18.17.0.0.0.0.349.2299.8j6j2j1.17.0...0.0.FnkrFJRDOFk


Bompa

PS: It is most important for you to use Google.  It answers immediately,
which is better than waiting for us.
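
For what it's worth, the file_get_contents() approach from the first post can also send a custom user agent by passing a stream context. A minimal sketch, assuming the helper names `buildUserAgentContext` and `fetchWithUserAgent` (both made up here) and an arbitrary 15 second timeout:

```php
<?php
// Build a stream context that makes file_get_contents() send a custom
// User-Agent header. The UA string passed in is only an example.
function buildUserAgentContext(string $userAgent)
{
    $options = [
        'http' => [
            'method'     => 'GET',
            'user_agent' => $userAgent,
            'timeout'    => 15, // seconds; arbitrary choice
        ],
    ];
    return stream_context_create($options);
}

// Hypothetical wrapper: fetch a URL with that context.
// Returns the body as a string, or false on failure.
function fetchWithUserAgent(string $url, string $userAgent)
{
    return file_get_contents($url, false, buildUserAgentContext($userAgent));
}

// Inspect the context without touching the network.
$context = buildUserAgentContext('android Mozilla/5.0 (Linux; U; Android 0.5; en-us)');
$opts = stream_context_get_options($context);
echo $opts['http']['user_agent'], "\n";
```

Calling fetchWithUserAgent() with the mobile results URL would then be the file_get_contents() counterpart of setting CURLOPT_USERAGENT in cURL.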
mashal
« Reply #9 on: June 19, 2012, 10:40:29 PM »

I tried with this code:
$url = "http://www.cleartrip.com/flights/results?from=CCU&to=DEL&depart_date=22/06/2012&adults=1&childs=0&infants=0&dep_time=0&class=Economy&airline=&carrier=&x=57&y=16&flexi_search=no&tb=n";
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$curl_scraped_page = curl_exec($ch);
curl_close($ch);
echo $curl_scraped_page ;

but the flight info from the second page is not displayed. Everything else is displayed. I don't know what the matter is.

P.S. I am not new to computers, so I do google my problems, but I'm new to scraping; that's why I'm bothering you.
Bompa
« Reply #10 on: June 20, 2012, 12:02:10 AM »

Try this:

Code:
$url = "http://www.cleartrip.com/m/flights/results?from=CCU&to=DEL&depart_date=22/06/2012&adults=1&childs=0&infants=0&dep_time=0&class=Economy&airline=&carrier=&x=57&y=16&flexi_search=no&tb=n";
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERAGENT, "android Mozilla/5.0 (Linux; U; Android 0.5; en-us)");
$curl_scraped_page = curl_exec($ch);
curl_close($ch);
echo $curl_scraped_page ;

$fh = fopen( "SCRAPEDCONTENT.HTML", 'w');
fwrite($fh, "$curl_scraped_page\n");
fclose($fh);
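
Once the page is saved, the fares can be pulled out of it with a regular expression. A minimal sketch, assuming the fares appear as "Rs. 7,841" the way they do in the mobile page text quoted earlier in this thread; the function name `extractFares` is made up, and the real markup may need a different pattern:

```php
<?php
// Pull every "Rs. 7,841"-style fare out of a page and return integers.
// The pattern is an assumption based on the mobile page text quoted
// earlier in the thread, not on the site's actual HTML.
function extractFares(string $html): array
{
    // Drop the tags so only the visible text is searched.
    $text = strip_tags($html);
    preg_match_all('/Rs\.\s*([\d,]+)/', $text, $matches);
    // Remove thousands separators and convert each fare to an integer.
    return array_map(
        function ($raw) { return (int) str_replace(',', '', $raw); },
        $matches[1]
    );
}

$sample = "IndiGo 07:05 - 09:20 Rs. 7,841 Select\nAir India 07:00 - 09:05 Rs. 8,072 Select";
print_r(extractFares($sample));
```

For the sample string this gives the two fares 7841 and 8072. For anything more structured than prices, a DOM parser is usually a safer choice than regular expressions.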


mashal
« Reply #11 on: June 20, 2012, 12:18:31 AM »

Thanks a lot, I'm getting everything now. You are a genius.
mashal
« Reply #12 on: June 20, 2012, 03:37:02 AM »

I really appreciate the help, but I'm stuck with this website, http://www.expedia.co.in/. It has no mobile version, and it is also JavaScript and AJAX based.
I tried the previous code that you suggested; it works excellently for cleartrip.com, but it is no help with this website.
mashal
« Reply #13 on: June 20, 2012, 10:46:49 PM »

One more thing, if you reply: on cleartrip.com, with this address
"https://www.cleartrip.com/m/flights/results?from=CCU&to=DEL&depart_dd=22&depart_mmyyyyy=06%2F2012&return_dd=24&return_mmyyyyy=06%2F2012&adults=1&childs=0&infants=0&mobile=true&class=Economy&carrier=&dep_time=0&ret_time=0&airline_codes=ALL"
I'm not getting the full round-trip information. Isn't there any way to get data from these AJAX or JavaScript based websites? Please help me out.
mashal
« Reply #14 on: June 21, 2012, 05:40:14 AM »

No answer?
I'm stuck with the POST method of cURL:

$url = "https://www.cleartrip.com/m/flights/itinerary/68f7a81c4b-6cb8-4ebe-889d-250c956b399e/info";

$sid='a8680776-7db7-4b9d-970c-83d7b8ebd260';
$rnd_one='O';
$from='DEL';
$to='AMD';
$depart_date='22/06/2012';
$adults='1';
$childs='0';
$infants='0';
$dep_time='0';
$class='Economy';
$airline='';
$carrier='';
$timestamp='';
$companyid='110340';
$source='MOBILE';
$fromCityName='';
$toCityName='';
$preferred_flights='ALL,';
$preferred_time='';
$BIZ_ACTION_MODE='VIEW_ORDER_CAPTURE';
$out_fare_key='supp_AMADEUS|si-a8680776-7db7-4b9d-970c-83d7b8ebd260|fk_S2_4793_1340366100000_M2SIP,V2IPJK_true_fk_9W-K_2514_1340379300000_M2SIP,V2IPJK_true_true';
$out_price='9698';
$out_no_legs='2';
$out_leg_aircode_1='S2';
$out_leg_aircode_2='9W-K';
     

//url-ify the data for the POST
//$fields_string='';
//foreach($fields as $key=>$value)
//{ $fields_string .= $key.'='.$value.'&'; }
//rtrim($fields_string,'&');

$ch = curl_init($url);
//print_r($ch);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt ($ch, CURLOPT_POST, 1);
 curl_setopt($ch, CURLOPT_POSTFIELDS    ,$sid.$rnd_one. $from.$to. $depart_date.$adults.$childs.$infants.$dep_time.$class.$airline.$carrier.$timestamp.$companyid.$source.$fromCityName.$toCityName.$preferred_flights.$preferred_time.$BIZ_ACTION_MODE.$out_fare_key.$out_price.$out_no_legs.$out_leg_aircode_1.$out_leg_aircode_2);
curl_setopt($ch, CURLOPT_USERAGENT, "android Mozilla/5.0 (Linux; U; Android 0.5; en-us)");
$curl_scraped_page = curl_exec($ch);
curl_close($ch);
echo $curl_scraped_page ;
The page comes back empty.
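
One likely problem in the code above is that CURLOPT_POSTFIELDS receives the values glued together with no field names and no separators, so the server cannot tell which value belongs to which field. A minimal sketch of sending name=value pairs instead, assuming the field names match the variable names used above (whether the site expects exactly these names is a guess) and using made-up helper names `buildPostBody` and `postForm`:

```php
<?php
// Turn name => value pairs into a URL-encoded POST body.
// The separator is passed explicitly so the result does not depend on
// the arg_separator.output ini setting.
function buildPostBody(array $fields): string
{
    return http_build_query($fields, '', '&');
}

// Hypothetical wrapper around the cURL calls used in this thread.
function postForm(string $url, array $fields, string $userAgent)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, buildPostBody($fields));
    curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
    $body = curl_exec($ch);
    if ($body === false) {
        // Surface the transport error instead of echoing an empty page.
        $body = 'cURL error: ' . curl_error($ch);
    }
    curl_close($ch);
    return $body;
}

// Field names are assumptions taken from the variable names above.
$fields = [
    'from'        => 'DEL',
    'to'          => 'AMD',
    'depart_date' => '22/06/2012',
    'adults'      => '1',
    'class'       => 'Economy',
];
echo buildPostBody($fields), "\n";
```

Even with a well-formed body, the request may still come back empty if values such as the session id or the itinerary id in the URL are tied to a browser session and have expired; that part cannot be confirmed from this thread.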