The Cache: Technology Expert's Forum
 
Topic: How to remove backslashes (escape sequences) from cURL output?  (Read 7100 times)
hvshah69
Rookie
Posts: 21
« on: June 17, 2008, 05:15:19 PM »

Hi,

Whenever I fetch a web page with the cURL functions, the HTML comes back full of backslashes. For example, a double quote appears as \" and a newline appears as \n, and so on.

I tried the stripslashes() function, but that did not do anything.

Any suggestions?

Thanks.
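For reference, this is what stripslashes() does to text that really contains those sequences. A minimal sketch; the sample string is made up to mimic the output quoted further down, and it is written in single quotes so the backslashes are literal characters:

```php
<?php
// Sample text containing the literal two-character sequences \" and \n,
// as they appear in the captured output (single quotes keep them literal).
$escaped = '<meta name=\"Description\">\n';

// stripslashes() simply drops each backslash: \" becomes " (good),
// but \n becomes the letter n, not a newline.
$stripped = stripslashes($escaped);

echo $stripped, PHP_EOL; // <meta name="Description">n
```

If the captured text really contained the backslashes, stripslashes() would have changed it visibly (quotes fixed, every \n turned into a stray n), so it is worth checking whether the backslashes are in the string at all.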
thedarkness
Lifer
Posts: 585
« Reply #1 on: June 17, 2008, 07:23:19 PM »

Welcome to the Cache, hvshah69.

I'd say it's unlikely that cURL is your problem here.

Can you post a test case (minimum code required to reproduce your problem)?

Cheers,
td

"I want to be the guy my dog thinks I am."
 - Unknown
perkiset
Olde World Hacker
Administrator
Lifer
Posts: 10096
« Reply #2 on: June 17, 2008, 08:54:26 PM »

Welcome, HV. Ditto TheDarkness: I think we need to see more before we can make any blanket assumption.

It is now believed, that after having lived in one compound with 3 wives and never leaving the house for 5 years, Bin Laden called the U.S. Navy Seals himself.
hvshah69
Rookie
Posts: 21
« Reply #3 on: June 17, 2008, 10:51:41 PM »

Thanks guys for the warm welcome.

Here is the code in its most basic form:
Quote
<?php

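$ch = curl_init();

// Ask for the CNN home page and have curl_exec() return it as a string
curl_setopt($ch, CURLOPT_URL, "http://www.cnn.com/");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

$output = curl_exec($ch); // the page HTML, or false on failure

curl_close($ch);

echo $output;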

?>
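A slightly more defensive version of the same request, as a sketch: it reports cURL errors instead of silently echoing false, and returns the body unmodified so it can be saved and inspected outside the IDE. The function name fetch_page() and the redirect and timeout options are assumptions added for illustration; the cURL extension must be enabled.

```php
<?php
// Hypothetical helper: fetch a URL with cURL and return the body exactly as received.
// Throws a RuntimeException instead of returning false on failure.
function fetch_page(string $url): string
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects (assumption)
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);          // give up after 30 seconds (assumption)

    $output = curl_exec($ch);
    if ($output === false) {
        $error = curl_error($ch);
        curl_close($ch);
        throw new RuntimeException("cURL error: $error");
    }
    curl_close($ch);
    return $output;
}
```

Usage, for example: file_put_contents('cnn.html', fetch_page('http://www.cnn.com/')); and then open cnn.html in a plain text editor. If the file shows ordinary quotes and line breaks, the escaping is being added by whatever displays $output, not by cURL.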

And here is part of the output:

Quote
"<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\"\"http://www.w3.org/TR/html4/loose.dtd\"><html lang=\"en\"><head><title>CNN.com - Breaking News, U.S., World, Weather, Entertainment & Video News</title>      \n<meta http-equiv=\"refresh\" content=\"1800;url=?refresh=1\">\n<meta name=\"Description\" content=\"CNN.com delivers the latest breaking news and information on the latest top stories, weather, business, entertainment, politics, and more. For in-depth coverage, CNN.com provides special reports, video, audio, photo galleries, and interactive guides.\">\n<meta name=\"Keywords\" content=\"CNN, CNN news, CNN.com, CNN TV, news, news online, breaking news, U.S. news, world news, weather, business, CNN Money, sports, politics, law, technology, entertainment, education, travel, health, special reports, autos, developing story, news video, CNN Intl\">\n\n<link rel=\"alternate\" type=\"application/rss+xml\" title=\"CNN - Top Stories [RSS]\" href=\"http://rss.cnn.com/rss/cnn_topstories.rss\">\n<link rel=\"alternate\" type=\"application/rss+xml\" title=\"CNN - Recent Stories [RSS]\" href=\"http://rss.cnn.com/rss/cnn_latest.rss\">\n\n<meta http-equiv=\"content-type\" content=\"text/html; charset=iso-8859-1\">\n\t<link rel=\"stylesheet\" type=\"text/css\" href=\"http://i.cdn.turner.com/cnn/.element/css/2.0/common.css\" />\n\t<link rel=\"stylesheet\" type=\"text/css\" href=\"http://i.cdn.turner.com/cnn/.element/css/2.0/main.css\" />\n\t<link rel=\"stylesheet\" type=\"text/css\" href=\"http://i.cdn.turner.com/cnn/.element/css/2.0/pgaleader.css\" />\n\t<script>\n    if(window.top!=window.self)\n    {\n        if(document.cookie.indexOf('rfrshchck=')!=-1)\n        {\n            window.top.location=window.self.location;\n        }\n        document.cookie='rfrshchck=1';\n    }\n\t</script>\n\t<script src=\"http://i.cdn.turner.com/cnn/.element/js/2.0/scripts/prototype.js\" type=\"text/javascript\"></script>\n\t<script 
src=\"http://i.cdn.turner.com/cnn/.element/js/2.0/scripts/scriptaculous.js?load=effects\" type=\"text/javascript\"></script>\n\t<script src=\"http://i.cdn.turner.com/cnn/.element/js/2.0/csiManager.js\" type=\"text/javascript\"></script>\n\t<script src=\"http://i.cdn.turner.com/cnn/.element/js/2.0/StorageManager.js\" type=\"text/javascript\"></script>\n\t<script src=\"http://i.cdn.turner.com/cnn/.element/js/2.0/local.js\" type=\"text/javascript\"></script>\n\t<script src=\"http://i.cdn.turner.com/cnn/.element/js/2.0/main.js\" type=\"text/javascript\"></script>\n\n<link rel=\"apple-touch-icon\" href=\"http://i.cdn.turner.com/cnn/apple-touch-icon.png\"/>\n\n<script type=\"text/javascript\" language=\"javascript\">\npagetypeTS='homepage';\n\nvar overrideVideoAd = '/cnn_adspaces/2.0/homepage/video.postroll_emb.ad';\n</script>\n<style type=\"text/css\">\n<!--\n.cnnElexRFoot p\n{position:relative;}\n.cnnElexRFoot p span\n{position:absolute;right:0;}\n.cnnElexPrimary\n{position:relative;}\n.cnnElexPrimary span a\n{position:absolute;right:0;top:5px;font-size:10px;}\n* html .cnnElexPrimary span a\n{right:20px;}\n* html .cnniReportBox .cnniReportMoreMain a\n{width:302px;}\n-->\n</style>\n<script type=\"text/javascript\">\nvar countyLinks ={\n\t\"SD\":{\"url\":\"/ELECTION/2008/primaries/results/county/#SDDEMMAPPRIMARY1 \",\"text\":\"County results map\"},\n\t\"MT\":{\"url\":\"/ELECTION/2008/primaries/results/county/#MTDEMMAPPRIMARY1 \",\"text\":\"County results map\"}\n\t}\n\nvar pollsToClose = [\n\t{\"state\":\"South Dakota\",\"stateCode\":\"SD\",\"closingTime\":\"Last polls close 9:00 p.m. ET\"},\n\t{\"state\":\"Montana\",\"stateCode\":\"MT\",\"closingTime\":\"Last polls close 10:00 p.m. 
ET\"}\n]\n\n</script>\n\n<script language=\"JavaScript\" type=\"text/javascript\">var cnnCurrTime = new Date(1213767836572); var cnnCurrHour = 1; var cnnCurrMin = 43; var cnnCurrDay='Wed';</script>\n<script type=\"text/javascript\">\n\tvar cnnDocDomain = '';\n\tif(location.hostname.indexOf('cnn.com')>0) {cnnDocDomain='cnn.com';}\n\tif(location.hostname.indexOf('turner.com')>0) {if(document.layers){cnnDocDomain='turner.com:'+location.port;}else{cnnDocDomain='turner.com';}}\n\tif(cnnDocDomain) {document.domain = cnnDocDomain;}\n\t// DO NOT PUT ANYTHING BENEATH THIS!\n</script>\n\n\n<script type=\"text/javascript\" src=\"http://i.cdn.turner.com/cnn/.element/js/2.0/ad_head0.js\"></script>\n<script type=\"text/javascript\" src=\"http://i.cdn.turner.com/cnn/cnn_adspaces/cnn_adspaces.js\"></script>\n</head><body id=\"cnnMainPage\"><div id=\"cnnOpacity\" onclick=\"cnnHideMoPo();\"></div>      <div id=\"cnnHeader\">\r\n\t<div class=\"cnnHeaderContent\">\r\n\t\t<div class=\"cnnHeaderCeiling\">\r\n\t\t\t\n<a href=\"/\"><img src=\"http://i.cdn.turner.com/cnn/.element/img/2.0/global/nav/header/header_cnn_com_logo.gif\" width=\"148\" height=\"36\" border=\"0\" alt=\"\"></a>\r\n\t\t\t\t\t\t<div class=\"cnnHeadColRight\">\n\t\t\t\t<div class=\"cnnGlobalHeaderSections\" id=\"cnnHeadSrchTypeArea\"><span class=\"cnnSearchLabel\">Web</span> | <a href=\"javascript:cnnUpdateSrchType('news');\">CNN News</a> | <a href=\"javascript:cnnUpdateSrchType('video');\">CNN Videos</a></div>\n\t\t\t\t<div class=\"cnnGlobalHeaderSearch\">\n\t\t\t\t\t<form action=\"http://search.cnn.com/cnn/search\" method=\"get\" onsubmit=\"return cnnSearch(this);\">\n\t\t\t\t\t\t<input type=\"hidden\" name=\"cnnHeadSrchType\" id=\"cnnHeadSrchType\" value=\"web\">\n\t\t\t\t\t\t<input type=\"text\" maxlength=\"40\" class=\"cnnHeaderTxtField\" id=\"cnnHeadSrchTxt\">\n\t\t\t\t\t\t<input type=\"image\" src=\"http://i.cdn.turner.com/cnn/.element/img/2.0/global/nav/header/header_search_btn.gif\" alt=\"Submit\" 
class=\"cnnHeaderSearchBtn\">\n\t\t\t\t\t\t<img src=\"http://i.cdn.turner.com/cnn/.element/img/2.0/global/nav/header/header_google_logo.gif\" class=\"cnnSrchDomLogo\" width=\"47\" height=\"22\" border=\"0\" alt=\"\">\n\t\t\t\t\t</form>\n\t\t\t\t</div>\n\t\t\t</div>\r\n\t\t</div>\r\n\t</div>\r\n\t\t<div class=\"cnnNavStretch\">\r\n\t\t<div class=\"cnnHeaderNav\">\r\n\t\t\t<ul class=\"cnnNavigation\">\r\n\t\t\t\t<li class=\"cnnNavLeft\"></li>\r\n\t\t\t\t<li><a class=\"cnnCurPage\" href=\"/\">Home</a></li>\r\n\t\t\t\t<li><a href=\"/WORLD/\">World</a></li>\r\n\t\t\t\t<li><a href=\"/US/\">U.S.</a></li>\r\n\t\t\t\t<li><a href=\"/POLITICS/\">Politics</a></li>\r\n\t\t\t\t<li><a href=\"/CRIME/\">Crime</a></li>\r\n\t\t\t\t<li><a href=\"/SHOWBIZ/\">Entertainment</a></li>\r\n\t\t\t\t<li><a href=\"/HEALTH/\">Health</a></li>\r\n\t\t\t\t<li><a href=\"/TECH/\">Tech</a></li>\r\n\t\t\t\t<li><a href=\"/TRAVEL/\">Travel</a></li>\r\n\t\t\t\t<li><a href=\"/LIVING/\">Living</a></li>\r\n\t\t\t\t<li class=\"offsite\"><a href=\"http://money.cnn.com/?cnn=yes\">Business</a></li>\r\n\t\t\t\t<li class=\"offsite\"><a href=\"/si/?cnn=yes\">Sports</a></li>\r\n\t\t\t\t<li class=\"offsite\"><a href=\"/time/\">Time.com</a></li>\r\n\t\t\t</ul>\r\n\t\t\t\t\t\t<ul class=\"cnnUtilityNavigation\">\r\n\t\t\t\t<li class=\"cnnVideo\"><a href=\"/video/?iref=videoglobal\">Video</a></li>\r\n\t\t\t\t<li class=\"cnnIreport\"><a href=\"/exchange/?iref=ireportglobal\">iReport</a></li>\r\n\t\t\t\t<li class=\"cnnImpact\"><a href=\"/SPECIALS/2007/impact/?

So, as you can see, $output is full of backslash sequences (\", \r, \n, \t, etc.) that do not exist in the actual HTML source, where those characters appear as real quotes and whitespace.
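One way to check whether those backslashes are really part of the string, rather than an artifact of how it is being displayed, is to count the literal two-character sequences. A minimal sketch; the function name count_literal_escapes() is hypothetical:

```php
<?php
// Count literal backslash escapes (\" \n \r \t) in a string.
// Real quotes, newlines and tabs are NOT counted, only a backslash
// followed by one of those characters.
function count_literal_escapes(string $s): array
{
    $counts = [];
    foreach (['\"', '\n', '\r', '\t'] as $seq) { // single quotes: literal backslash
        $counts[$seq] = substr_count($s, $seq);
    }
    return $counts;
}

// Escaped text (as in the output above) versus the same markup with real characters.
$escaped = '<html lang=\"en\">\n\t<head>';
$real    = "<html lang=\"en\">\n\t<head>";

print_r(count_literal_escapes($escaped)); // counts: \" 2, \n 1, \r 0, \t 1
print_r(count_literal_escapes($real));    // all zeros
```

In the original script, print_r(count_literal_escapes($output)); right after curl_exec() would show whether the escapes are actually in the data.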
« Last Edit: June 17, 2008, 10:55:24 PM by hvshah69 »
perkiset
Olde World Hacker
Administrator
Lifer
Posts: 10096
« Reply #4 on: June 17, 2008, 11:00:01 PM »

Well, I'm not sure why the code is all coming across escaped like that; I think there may be something else in play here.

But, to sort it out, this should do you:

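// Single-quoted: each search item is a literal backslash sequence, e.g. \ followed by n
$search  = array('\t', '\r', '\n', '\"');
// Replace with a real tab, nothing (drops \r), a real newline and a plain double quote
$replace = array("\t", '', "\n", '"');
$output  = str_replace($search, $replace, $output); // $output as returned by curl_exec()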
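PHP also has a built-in that undoes C-style escapes in one call, stripcslashes(). A sketch, assuming the escaped text is in a string; unlike the str_replace() version, it turns \r into a real carriage return instead of dropping it, and it also turns \\ into a single backslash. One caveat: a backslash followed by digits is read as an octal escape, so this assumes the page has no such sequences that should stay literal.

```php
<?php
// Literal escapes, as in the captured output (single quotes keep the backslashes).
$escaped = '<div class=\"cnnHeader\">\r\n\t<ul>';

// stripcslashes() interprets C-style escapes (\n, \r, \t, octal, hex);
// for other characters, such as \", it just drops the backslash.
$decoded = stripcslashes($escaped);

// Optional: normalise Windows line endings (CR LF) to plain newlines.
$decoded = str_replace("\r\n", "\n", $decoded);

echo $decoded, PHP_EOL;
```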

nutballs
Administrator
Lifer
Posts: 5627
Back in my day we had 9 planets
« Reply #5 on: June 18, 2008, 06:36:27 AM »

Errr, is it returning it as a JavaScript-escaped chunk? It looks like how you would escape HTML to be displayed via document.write(). I'm probably totally off there, but it's just something I noticed in your chunk, and I just came out of JS/encoding hell last night.
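If the text is indeed a JavaScript-style string literal (the output quoted above even starts with a double quote), json_decode() can undo the escaping in one call. A sketch, assuming the whole captured text, including its surrounding quotes, is a valid JSON string; the sample below is made up from fragments of that output:

```php
<?php
// Captured text including its surrounding double quotes, with literal escapes.
$literal = '"<title>CNN.com</title>\n<meta name=\"Keywords\">\n\t<script>"';

// json_decode() on a JSON string literal returns the decoded PHP string,
// or null if the input is not valid JSON.
$decoded = json_decode($literal);

if ($decoded === null) {
    echo 'Not a valid JSON string: ', json_last_error_msg(), PHP_EOL;
} else {
    echo $decoded, PHP_EOL;
}
```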

Does that happen with any site? If it's just CNN, and you are not setting your user agent to something standard like Firefox, then CNN might be returning something wacky for you.
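Setting a browser-like user agent, as suggested, is one extra curl_setopt() call. A sketch; the Firefox user-agent string is an illustrative example, not taken from this thread, the helper name fetch_as_browser() is hypothetical, and the cURL extension must be loaded.

```php
<?php
// Illustrative Firefox 3 user-agent string (an assumption; any browser UA would do).
const BROWSER_UA = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9) Gecko/2008052906 Firefox/3.0';

// Hypothetical helper: same request as in the first post, but with a browser user agent.
function fetch_as_browser(string $url)
{
    $ch = curl_init($url);                           // curl_init() accepts the URL directly
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the body as a string
    curl_setopt($ch, CURLOPT_USERAGENT, BROWSER_UA); // send the browser user agent
    $output = curl_exec($ch);                        // string on success, false on failure
    curl_close($ch);
    return $output;
}
```

Comparing its result with the plain request, and trying a second site, would show whether the escaping depends on the site or on the user agent.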

I could eat a bowl of Alphabet Soup and shit a better argument than that.
hvshah69
Rookie
Posts: 21
« Reply #6 on: June 18, 2008, 09:19:39 AM »

This happens with every site. I removed the user-agent part from the code I posted, but it happens regardless of what user agent I set.

I am running this code on a Windows machine using Zend Studio. Could that have something to do with it?

Is there anything I should check in the php.ini file?

Thanks for all the responses.
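To see what the relevant php.ini settings actually are for the PHP that runs the script, ini_get() can report them. A sketch; the choice of settings is an assumption (the magic_quotes_* options only exist in PHP versions before 5.4, so the block avoids newer syntax such as [] arrays):

```php
<?php
// Settings worth a look; choosing these is an assumption.
$settings = array('magic_quotes_gpc', 'magic_quotes_runtime', 'magic_quotes_sybase');

$report = array();
foreach ($settings as $name) {
    $value = ini_get($name);
    // ini_get() returns false when the option does not exist in this PHP version.
    $report[$name] = ($value === false) ? '(not available)' : var_export($value, true);
}

print_r($report);
// Also show which PHP version and interface (CLI, CGI, Apache module...) ran the script.
echo 'PHP ', PHP_VERSION, ' via ', PHP_SAPI, PHP_EOL;
```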
nutballs
Administrator
Lifer
Posts: 5627
« Reply #7 on: June 18, 2008, 11:19:05 AM »

I wonder if magic quotes (whatever the hell that stupid thing is) is getting in the way?
perkiset
Olde World Hacker
Administrator
Lifer
Posts: 10096
« Reply #8 on: June 18, 2008, 12:25:42 PM »

AFAIK magic quotes only escapes quotes (plus backslashes and NUL bytes); it never turns a newline or tab into \n or \t. That's why I was wondering if there's something else at play here: this looks like it has been through some form of processor other than stock PHP/cURL.
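This can be checked directly: magic quotes applied the same escaping as addslashes(), which only puts a backslash in front of single quotes, double quotes, backslashes and NUL bytes. A small sketch with a made-up sample string:

```php
<?php
// A string with real double quotes, a real newline and a real tab.
$original = "<a href=\"/\">\n\tHome";

// addslashes() applies the same escaping that magic quotes did.
$slashed = addslashes($original);

// Quotes gain a backslash...
var_dump(substr_count($slashed, '\"'));     // int(2)
// ...but the newline and tab stay real characters, never \n or \t.
var_dump(substr_count($slashed, '\n'));     // int(0)
var_dump(strpos($slashed, "\n") !== false); // bool(true)
```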
thedarkness
Lifer
Posts: 585
« Reply #9 on: June 18, 2008, 06:17:47 PM »

Some sort of weird transparent proxy interference maybe?

Try this: open a command prompt and type telnet. Once you are in telnet, type "set localecho" and then:

Code:
open www.cnn.com 80

Then type exactly:

Code:
GET http://www.cnn.com/ HTTP/1.1
Host: localhost

followed by pressing Enter twice (the blank line ends the request headers).

Tell us whether the html code is escaped in what you see.
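The same raw request can also be sent from PHP without cURL, which takes both cURL and the IDE's cURL settings out of the picture. A sketch using fsockopen(); the function names and the Connection: close header are assumptions of this sketch, and it sends the Host header for the site being requested rather than localhost.

```php
<?php
// Build a minimal HTTP/1.1 request. HTTP requires CRLF line endings
// and a blank line to end the headers.
function build_raw_request(string $host, string $path = '/'): string
{
    return "GET $path HTTP/1.1\r\n"
         . "Host: $host\r\n"
         . "Connection: close\r\n" // ask the server to close the connection when done
         . "\r\n";
}

// Hypothetical helper: send the request over a plain TCP socket and
// return the raw response (status line, headers and body), unmodified.
function raw_get(string $host, string $path = '/'): string
{
    $fp = fsockopen($host, 80, $errno, $errstr, 30);
    if ($fp === false) {
        throw new RuntimeException("Connection failed: $errstr ($errno)");
    }
    fwrite($fp, build_raw_request($host, $path));
    $response = stream_get_contents($fp);
    fclose($fp);
    return $response;
}
```

For example, file_put_contents('raw.txt', raw_get('www.cnn.com')); and then look at raw.txt in a text editor. The headers are included and the body may use chunked transfer encoding (hex chunk sizes between blocks), but quotes and line breaks appear exactly as the server sent them.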

Cheers,
td
« Last Edit: June 18, 2008, 06:19:32 PM by thedarkness »
perkiset
Olde World Hacker
Administrator
Lifer
Posts: 10096
« Reply #10 on: June 18, 2008, 07:27:57 PM »

See? See? That's why TheDarkness makes the big money, right there.
thedarkness
Lifer
Posts: 585
« Reply #11 on: June 19, 2008, 01:01:37 AM »

  Smooch
hvshah69
Rookie
Posts: 21
« Reply #12 on: June 19, 2008, 04:05:38 PM »

Thanks, thedarkness, for the excellent suggestion.

I will try this as soon as I get home and report the results.

I have just finished building an Ubuntu server box from an old PC. I am going to try this code from PHP-CLI as well and see what I get.
thedarkness
Lifer
Posts: 585
« Reply #13 on: June 19, 2008, 06:10:50 PM »

Trying it from PHP-CLI will tell us something as well: your code works as expected, right out of the box, on one of my Linux systems.

Cheers,
td
hvshah69
Rookie
Posts: 21
« Reply #14 on: June 20, 2008, 07:17:41 AM »

Thedarkness,

I am not getting any output using your telnet method. Here is the screen capture from the Linux console:

Code:
hiren@acer:~$ telnet
telnet> open www.cnn.com 80
Trying 64.236.91.23...
Connected to www.cnn.com.
Escape character is '^]'.
GET http://www.cnn.com/ HTTP/1.1
Host: localhost

I tried hitting Enter twice as you said, but nothing happens.