perkiset
Olde World Hacker
Administrator
Lifer
   
Online
Posts: 5230
:sniffle: Humor was so much easier before.
« Reply #30 on: November 27, 2007, 12:22:08 PM »
Nope I getcha now - I am getting the screwed up chars instead of the "pretty" apostrophes - I don't remember what we did in posts ^^above so that we saw them correctly - are you encoding/entitying or something? When I scraped a site with pretty apostrophes and simply passed the HTML on it rendered correctly... it was when I did stuff to it (encoding etc) that it got munged...
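For anyone following along: the "screwed up chars" are most likely UTF-8 bytes being read as a single-byte charset. Here is a minimal sketch of that effect, assuming the mbstring extension is available and that the wrong charset is Windows-1252 (the thread never confirms which one was in play). The function name is made up for illustration.

```php
<?php
// A "pretty" apostrophe (U+2019) is three bytes in UTF-8.
$apostrophe = "\xE2\x80\x99";

// Simulate a consumer that wrongly treats each byte as one Windows-1252
// character (assumption: the actual wrong charset is not stated in the thread).
function misreadAsWindows1252($utf8Bytes) {
    return mb_convert_encoding($utf8Bytes, 'UTF-8', 'Windows-1252');
}

$munged = misreadAsWindows1252($apostrophe);
echo strlen($apostrophe), " bytes in, ", mb_strlen($munged, 'UTF-8'), " characters out\n";
```

Passing the HTML through untouched keeps the original bytes, which fits the observation that things only get munged once something re-encodes the text.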
If I can't be Mr. Root then I don't want to play.
nutballs
« Reply #31 on: November 27, 2007, 02:11:31 PM »
All I'm doing is using the most recent class from this thread and running this code:
<?php
require_once("inc/webrequest2.class.php");
$req = new WebRequest2();
echo $req->simpleGet('http://www.ipodnews.biz/2006/06/28/');
?>
That's it, nothing more. It's obviously a server setting somehow, probably in how PHP is compiled, and I have no control over that. So I wonder if there's a way to override the encoding that PHP uses. I'm sure there is.
GAH!!!!!!!!!!!!!!!!!!!! There is. LOL. I added:
<?php header('Content-Type: text/html; charset=utf-8'); ?>
Apparently adding the meta version to the page doesn't work:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
This only helps for display, though. The characters still get processed fucked up and stored in the DB wrong, I'm guessing, but I'll test further to make sure.
BTW, a page about it all is here: http://www.phpwact.org/php/i18n/charsets
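A small sketch of the header fix described above. The function names are hypothetical. The one hard rule is that header() has to run before any output is sent. A charset declared in the real HTTP header generally takes precedence over a meta tag, which would explain why the meta version alone had no effect here.

```php
<?php
// Build the Content-Type header line for a given charset.
function contentTypeHeader($charset = 'utf-8') {
    return "Content-Type: text/html; charset=$charset";
}

// Send the header, then the page body. Returns false without printing
// anything if output has already started, since header() would fail then.
function emitPage($html, $charset = 'utf-8') {
    if (headers_sent()) {
        return false;
    }
    header(contentTypeHeader($charset));
    echo $html;
    return true;
}
```

Usage would be something like emitPage($req->simpleGet($url)) in place of the bare echo.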
perkiset
« Reply #32 on: November 27, 2007, 02:19:25 PM »
Ah, this makes some sense - the header that came back to you had that in it, but when you kick it back out to the original caller a new header is being created - so that's why it's not encoding correctly on the receiving end.
Note that storing the HTML in a database will not change it - so long as you add that header on the way back out to the surfer it'll decode correctly.
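One way to act on this is to read the charset the remote server declared and repeat it on the way back out. The sketch below assumes you can get at the raw response header text as a string. The class listing later in the thread has a protected $rawHeader property, but how to read it from outside is not shown, so the function simply takes a string. The function name is hypothetical.

```php
<?php
// Pull the declared charset out of a raw HTTP response header block.
// Returns null when no Content-Type charset is present.
function charsetFromRawHeader($rawHeader) {
    $pattern = '/^Content-Type:[^\r\n]*?charset\s*=\s*"?([A-Za-z0-9._:-]+)/mi';
    if (preg_match($pattern, $rawHeader, $m)) {
        return strtolower($m[1]);
    }
    return null;
}
```

The caller can then send header('Content-Type: text/html; charset=' . $charset) before echoing the stored HTML, falling back to a default such as utf-8 when the function returns null.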
nutballs
« Reply #33 on: November 27, 2007, 02:24:44 PM »
Yep. It's storing it correctly in the DB, though phpMyAdmin shows it wrong as well. But as long as the UTF-8 header is set from PHP when I spit it out, it works fine.
Makes sense.
perkiset
« Reply #34 on: November 27, 2007, 04:28:07 PM »
Now that's interesting - I have some pages with funkiness (pretty apos, pretty quotes etc) that look just fine in phpMyAdmin... I assume you're looking at a pretty recent version...
ratthing
« Reply #35 on: November 28, 2007, 10:57:12 PM »
If you're using MySQL, check the db encoding as well. In many cases it defaults to Latin-1 (latin1), which means the collation on new dbs defaults to latin1_swedish_ci. It's a fairly well-known problem that still hasn't been fixed in a lot of Linux distros. And Perk, thanks for the WebRequest code. I picked up a PHP & MySQL book from the library and have been fiddling around some more. Of course, I've also been cribbing code from various places. We'll see if any of it sticks.  =RT=
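A sketch of the statements one would typically run to check and change those defaults. The function only builds the SQL strings and does not execute them. Its name and its default charset are assumptions: utf8mb4 needs MySQL 5.5.3 or later, so on a 2007-era server you would pass 'utf8' and 'utf8_general_ci' instead. Note that ALTER DATABASE only changes the default for tables created afterwards.

```php
<?php
// Returns the SQL to inspect and change a database's default charset.
// Nothing is executed here; run the strings with whatever DB layer you use.
function utf8SetupStatements($dbName, $charset = 'utf8mb4', $collation = 'utf8mb4_unicode_ci') {
    // Quote the identifier with backticks, doubling any embedded backtick.
    $quoted = '`' . str_replace('`', '``', $dbName) . '`';
    return array(
        "SHOW VARIABLES LIKE 'character_set%'",  // what the server uses now
        "SHOW VARIABLES LIKE 'collation%'",
        "ALTER DATABASE $quoted CHARACTER SET $charset COLLATE $collation",
        "SET NAMES $charset",                    // charset for this connection
    );
}
```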
perkiset
« Reply #36 on: November 28, 2007, 11:23:21 PM »
No worries lad. I'm gonna post an update soon as well - cuppla new features...
vsloathe
« Reply #37 on: November 29, 2007, 08:16:11 AM »
Perk, can you figure out how curl_multi works and make your class pseudo-multi-threaded?  Probably too much to ask, but I just started using curl_multi for guestbooks and trackbacks and it rocks faces (to understate how awesome it is).
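For reference, a bare-bones curl_multi fetch looks roughly like this. It is a standalone sketch that needs the curl extension and is not wired into the WebRequest class. The function name and the option choices are assumptions.

```php
<?php
// Fetch several URLs concurrently. Returns the response bodies keyed like
// the input array (null for a transfer that produced no content).
function multiGet(array $urls, $timeout = 30) {
    $mh = curl_multi_init();
    $handles = array();
    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
        curl_multi_add_handle($mh, $ch);
        $handles[$key] = $ch;
    }

    // Drive all transfers until none are still running.
    $running = 0;
    do {
        $status = curl_multi_exec($mh, $running);
        if ($running > 0) {
            curl_multi_select($mh, 1.0); // wait for activity instead of spinning
        }
    } while ($running > 0 && ($status === CURLM_OK || $status === CURLM_CALL_MULTI_PERFORM));

    $results = array();
    foreach ($handles as $key => $ch) {
        $results[$key] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($mh, $ch);
        // On PHP versions before 8, also call curl_close($ch) here.
    }
    curl_multi_close($mh);
    return $results;
}
```

All transfers share one process, so this is concurrency on the network side only. Whatever you do with each body afterwards still runs serially.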
perkiset
« Reply #38 on: November 29, 2007, 08:54:08 AM »
'Twould simply be easier for you to create your own threads and instantiate a new instance of the class in each, and I think you'd be good to go. The only thing that wouldn't be readable is debugMode=WRD_ECHO, because the output lines would be intermixed... so you'd want to create a new log file for each instance if you want to watch debug info. Other than that, since there's no shared memory or files, I think you'd be good to go UNLESS the fread() functions are not threadsafe, in which case you're just screwed.
meme
« Reply #39 on: November 30, 2007, 09:48:56 AM »
I didn't look much at the class, but I like the onSuccess, onFailure, before and after callbacks. I'm gonna implement that in my Curl class soon. Any ideas on how to pseudo-thread the callbacks so you process responses in batches?
perkiset
« Reply #40 on: November 30, 2007, 10:05:20 AM »
that confused me... the point of the event pops would be to handle the responses one-by-one as they come in, rather than in batches... could you flesh that out a bit more for me?
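To make the one-by-one idea concrete, here is a tiny sketch of event pops firing per response as it arrives. Everything in it is hypothetical: the helper name, the response array shape, and the rule that 2xx/3xx counts as success. None of it comes from the class. It needs PHP 5.3+ for the closures.

```php
<?php
// Hypothetical helper, not part of webRequest2: route one finished response
// to the matching handler the moment it is available.
function popEvent(array $response, $onSuccess, $onFailure) {
    // Assumption: any 2xx or 3xx result code counts as success.
    $ok = $response['code'] >= 200 && $response['code'] < 400;
    return call_user_func($ok ? $onSuccess : $onFailure, $response);
}

$log = array();
$onSuccess = function ($r) use (&$log) { $log[] = "ok {$r['url']}"; };
$onFailure = function ($r) use (&$log) { $log[] = "fail {$r['url']} ({$r['code']})"; };

// Each response is handled on its own, in arrival order, with no batching.
$arrivals = array(
    array('url' => '/a', 'code' => 200),
    array('url' => '/b', 'code' => 500),
);
foreach ($arrivals as $response) {
    popEvent($response, $onSuccess, $onFailure);
}
```

The handlers still run one after another in a single process, so a slow handler holds up the next response.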
meme
« Reply #41 on: November 30, 2007, 01:03:50 PM »
You might call them events, but they don't fire off asynchronously. If my request pool has 10 items and my 'after' callback takes 5 min each, then run in parallel it would take 5 min across 10 processes/threads/forks etc., instead of 50 min run one after another. Now I know you don't have a request pool in your class, but I do (and you could too with socket_select()).
By the way, your cookie parser will break when there's no ';' to explode. Check mine below ($str is the contents of the set-cookie header):
protected function parseCookie($str)
{
    $parts = array();
    if (strpos($str, ';') === false) {
        // No attributes: the whole string is a single name=value pair.
        $c = explode('=', $str, 2);
        $parts['name']  = trim($c[0]);
        $parts['value'] = isset($c[1]) ? trim($c[1]) : '';
    } else {
        foreach (explode(';', $str) as $data) {
            // A limit of 2 keeps any '=' inside the value intact.
            $c = explode('=', $data, 2);
            $c[0] = trim($c[0]);
            // Flag attributes such as 'secure' carry no '=value' part.
            $c[1] = isset($c[1]) ? trim($c[1]) : '';
            $attr = strtolower($c[0]);
            if (in_array($attr, array('domain', 'expires', 'path', 'secure', 'comment'))) {
                switch ($attr) {
                    case 'expires': $c[1] = strtotime($c[1]); break;
                    case 'secure':  $c[1] = true; break;
                }
                $parts[$attr] = $c[1];
            } elseif (!isset($parts['name'])) {
                // Only the first unrecognised pair is the cookie itself, so
                // later attributes not listed above cannot overwrite it.
                $parts['name']  = $c[0];
                $parts['value'] = $c[1];
            }
        }
    }
    if (!empty($parts['name'])) {
        return array($parts['name'], $parts['value']);
    }
    return false;
}
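If only the name and value are needed, a shorter standalone variant can lean on the fact that the name=value pair always comes first in a Set-Cookie value. This is a different approach from the method above, not a replacement for it, and the function name is made up.

```php
<?php
// Split a Set-Cookie value into array(name, value), or return false when
// there is no usable name=value pair. Attributes are ignored.
function cookieNameValue($str) {
    $first = explode(';', $str);          // attributes follow the first ';'
    $pair  = explode('=', $first[0], 2);  // limit 2 keeps '=' inside the value
    $name  = trim($pair[0]);
    if ($name === '' || count($pair) < 2) {
        return false;
    }
    return array($name, trim($pair[1]));
}
```

It behaves the same whether or not a ';' is present, so the no-attribute case needs no special branch.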
perkiset
« Reply #42 on: January 04, 2008, 11:27:01 PM »
Latest Update: Fixed some bugs in the onSuccess and onFailure event pops. Enjoy!
<?php
class webRequest2
{
    private $socket;

    protected $finalURL;
    protected $rawContent;
    protected $rawHeader;
    protected $rawResponse;
    protected $chunkedLength;
    protected $chunkedTransfer;
    protected $cookies;
    protected $cookieStr;
    protected $errorFlag;
    protected $getList;
    protected $headers;
    protected $postList;
    protected $postStr;

    public $accept;
    public $ancient; // set in the constructor; previously an undeclared property
    public $charSet;
    public $domain;
    public $debugLogFile;
    public $debugLogClearOnDispatch;
    public $debugMode;
    public $language;
    public $manualPostContent;
    public $method;
    public $port;
    public $postMode;
    public $proxy;
    public $redirect;
    public $resultCode;
    public $timeout;
    public $url;
    public $userAgent;
    public $useSSL;

    // Event Handlers
    public $onFailure;
    public $onProxyRetry;
    public $onSuccess;
    // Protected and special functions

    // Declared as __construct() because PHP 8 no longer treats a method
    // named after the class as the constructor.
    public function __construct()
    {
        $this->reset();
        $this->ancient = version_compare(PHP_VERSION, '5.0.0', '<');
        $this->userAgent = 'Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en) AppleWebKit/417.9 (KHTML, like Gecko) Safari/417.8';
        $this->accept = 'text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5';
        $this->charSet = 'ISO-8859-1,utf-8;q=0.7,*;q=0.7';
        $this->language = 'en-us,en;q=0.5';
        if (!defined('WRD_OFF')) {
            // Debug modes
            define('WRD_OFF', 0);
            define('WRD_ECHO', 1);
            define('WRD_LOG', 2);
            // Request methods
            define('WRM_GET', 0);
            define('WRM_POST', 1);
            // Post modes
            define('WRP_NORMAL', 0);
            define('WRP_MULTIPART', 1);
        }
        $this->debugMode = WRD_OFF;
        $this->postMode = WRP_NORMAL;
        $this->debugLogFile = '';
        $this->debugLogClearOnDispatch = true;
        $this->timeout = 30;
        $this->useSSL = false;
        $this->proxy = '';
    }

    protected function buildCookieStr()
    {
        $cookieStr = '';
        $start = true;
        foreach ($this->cookies as $name => $value) {
            if (!$start) {
                $cookieStr .= '; ';
            }
            $cookieStr .= "$name=$value";
            $start = false;
        }
        $this->debug("Built COOKIE String: $cookieStr");
        return $cookieStr;
    }

    protected function buildGetStr()
    {
        $getStr = '';
        $getCount = count($this->getList);
        if ($getCount) {
            $sepStr = '?';
            foreach ($this->getList as $name => $value) {
                $value = urlencode($value);
                $getStr .= "$sepStr$name=$value";
                $sepStr = '&';
            }
        }
        $this->debug("Built GET String: $getStr");
        return $getStr;
    }

    protected function buildPostStr()
    {
        if ($this->manualPostContent) return $this->manualPostContent;
        $postStr = '';
        $postCount = count($this->postList);
        if ($postCount) {
            $sepStr = '';
            foreach ($this->postList as $name => $arr) {
                $value = urlencode($arr['content']);
                $postStr .= "$sepStr$name=$value";
                $sepStr = '&';
            }
        } else {
            $postStr = 'No Content';
        }
        // Log the POST string (this previously logged the undefined $cookieStr).
        $this->debug("Built POST String: $postStr");
        return $postStr;
    }

    protected function buildHeader()
    {
        $header[0] = ''; // placeholder for the request line
        $header[] = "Host: {$this->domain}";
        $header[] = "User-Agent: {$this->userAgent}";
        $header[] = "Accept: {$this->accept}";
        $header[] = "Accept-Language: {$this->language}";
        $header[] = "Accept-Encoding: ";
        $header[] = "Accept-Charset: {$this->charSet}";
        if ($this->hasCookies()) {
            $header[] = "Cookie: {$this->buildCookieStr()}";
        }
        $header[] = "Connection: close";

        // When going through a proxy, the request line carries the full URL.
        $hostStr = ($this->proxy) ? "http://{$this->domain}" : '';
        switch ($this->method) {
            case 'get':
            case 'GET':
                $header[0] = "GET $hostStr{$this->finalURL} HTTP/1.1";
                // The original pushed an empty line before these two headers,
                // which would end the header block before they were sent.
                $header[] = "Content-Type: text/html";
                $header[] = "Content-Length: 0";
                $header[] = '';
                break;
            case 'post':
            case 'POST':
                if (count($this->postList) == 0) $this->postMode = WRP_NORMAL;
                $header[0] = "POST $hostStr{$this->finalURL} HTTP/1.1";
                switch ($this->postMode) {
                    case WRP_NORMAL:
                        $postData = $this->buildPostStr();
                        $requestLen = strlen($postData);
                        $header[] = "Content-Type: application/x-www-form-urlencoded";
                        $header[] = "Content-Length: $requestLen";
                        $header[] = '';
                        $header[] = $postData;
                        break;
                    case WRP_MULTIPART:
                        $boundary = time() . time();
                        $postData = $this->buildMultipartPostStr($boundary);
                        $requestLen = strlen($postData);
                        $header[] = "Content-Type: multipart/form-data; boundary=$boundary";
                        $header[] = "Content-Length: $requestLen";
                        $header[] = '';
                        $header[] = "$postData";
                        break;
                    default:
                        $this->debug("buildHeader: Terminal failure - unknown postMode '{$this->postMode}'");