The Cache: Technology Expert's Forum
 

Author Topic: Perk's NEW WebRequest Class  (Read 3423 times)
perkiset
Olde World Hacker
Administrator
Lifer
*****
Online Online

Posts: 5230


:sniffle: Humor was so much easier before.


View Profile
« Reply #30 on: November 27, 2007, 12:22:08 PM »

Nope I getcha now - I am getting the screwed up chars instead of the "pretty" apostrophes - I don't remember what we did in posts ^^above so that we saw them correctly - are you encoding/entitying or something? When I scraped a site with pretty apostrophes and simply passed the HTML on it rendered correctly... it was when I did stuff to it (encoding etc) that it got munged...
Logged

If I can't be Mr. Root then I don't want to play.
nutballs
Administrator
Lifer
*****
Online Online

Posts: 3456


View Profile
« Reply #31 on: November 27, 2007, 02:11:31 PM »

All I'm doing is using the most recent class from this thread, and running this code:

Code:
<?php
require_once("inc/webrequest2.class.php");
$req = new WebRequest2();
echo $req->simpleGet('http://www.ipodnews.biz/2006/06/28/');
?>


That's it, nothing more.
It's obviously a server setting somehow, probably in how PHP is compiled. I have no control over that, of course. So I wonder if there is a way to override the encoding that PHP uses. I'm sure there is.

GAH!!!!!!!!!!!!!!!!!!!!

it is. LOL
I added:  <?php header('Content-Type: text/html; charset=utf-8'); ?>

Apparently adding the meta version to the page doesn't work:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

Though this only helps for display. The characters still get mangled in processing, and stored in the DB wrong, I'm guessing. But I will test further to make sure.

btw a page about it all is here: http://www.phpwact.org/php/i18n/charsets
Logged
perkiset
Olde World Hacker
Administrator
Lifer
*****
Online Online

Posts: 5230


:sniffle: Humor was so much easier before.


View Profile
« Reply #32 on: November 27, 2007, 02:19:25 PM »

Ah, this makes some sense - the header that came back to you had that in it, but when you kick it back out to the original caller a new header is being created - so that's why it's not encoding correctly on the receiving end.

Note that storing the HTML in a database will not change it - so long as you add that header on the way back out to the surfer it'll decode correctly.
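To make the point above concrete, here is a minimal sketch of carrying the upstream charset through to the surfer. Both function names are hypothetical helpers and not part of the WebRequest class; the sketch assumes you already have the upstream Content-Type header value in a string, and it falls back to utf-8 when no charset is declared.

```php
<?php
// Hypothetical helper: pull the charset out of a Content-Type header value.
// Falls back to $default when the header carries no charset parameter.
function charsetFromContentType($contentType, $default = 'utf-8')
{
    if (preg_match('/charset\s*=\s*"?([A-Za-z0-9._:-]+)"?/i', $contentType, $m)) {
        return strtolower($m[1]);
    }
    return $default;
}

// Hypothetical helper: re-declare the charset before echoing scraped HTML.
// header() must run before any output, hence the headers_sent() guard.
function relayHtml($html, $upstreamContentType)
{
    $charset = charsetFromContentType($upstreamContentType);
    if (!headers_sent()) {
        header("Content-Type: text/html; charset=$charset");
    }
    echo $html;
}
```

Calling something like relayHtml($html, 'text/html; charset=UTF-8') in place of a bare echo would send the same header nutballs added by hand.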
Logged

If I can't be Mr. Root then I don't want to play.
nutballs
Administrator
Lifer
*****
Online Online

Posts: 3456


View Profile
« Reply #33 on: November 27, 2007, 02:24:44 PM »

Yep, it's storing it correctly in the DB, though phpMyAdmin shows it wrong as well. But as long as the UTF-8 header is set from PHP when I spit it out, it works fine.

makes sense.
Logged
perkiset
Olde World Hacker
Administrator
Lifer
*****
Online Online

Posts: 5230


:sniffle: Humor was so much easier before.


View Profile
« Reply #34 on: November 27, 2007, 04:28:07 PM »

Now that's interesting - I have some pages with funkiness (pretty apos, pretty quotes etc) that look just fine in phpMyAdmin... I assume you're looking at a pretty recent version...
Logged

If I can't be Mr. Root then I don't want to play.
ratthing
Journeyman
***
Offline Offline

Posts: 75


View Profile
« Reply #35 on: November 28, 2007, 10:57:12 PM »

If you're using MySQL, check the db encoding as well.  In many cases it defaults to Latin-1, which means the db collation gets set to latin1_swedish_ci by default on new dbs.  It's a fairly well-known problem that still hasn't been fixed in a lot of Linux distros.
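A rough sketch of the check described above: the helper below only builds the SQL strings, so you can paste them into whatever client you use. The function name and the 'scrapes' database name are made up for illustration, and utf8mb4 is an assumption (at the time of this thread plain utf8 would have been the usual choice).

```php
<?php
// Hypothetical helper: SQL to inspect the current settings and switch a
// database's defaults to UTF-8. Charset and collation are assumptions.
function utf8Statements($dbName, $charset = 'utf8mb4', $collation = 'utf8mb4_unicode_ci')
{
    // Backtick-quote the identifier, doubling any embedded backticks
    $quoted = '`' . str_replace('`', '``', $dbName) . '`';
    return array(
        "SHOW VARIABLES LIKE 'character_set%'",   // what server and connection use now
        "SHOW VARIABLES LIKE 'collation%'",
        "ALTER DATABASE $quoted CHARACTER SET $charset COLLATE $collation",
        "SET NAMES $charset",                      // per-connection charset
    );
}

foreach (utf8Statements('scrapes') as $sql) {
    echo $sql, ";\n";
}
```

Note that ALTER DATABASE only changes the default for tables created afterwards; existing tables keep their own charset until they are converted.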

And Perk, thanks for the WebRequest code. I picked up a PHP & MySQL book from the library and have been fiddling around some more.  Of course, I've also been cribbing code from various places.  We'll see if any of it sticks.  :)
=RT=
Logged
perkiset
Olde World Hacker
Administrator
Lifer
*****
Online Online

Posts: 5230


:sniffle: Humor was so much easier before.


View Profile
« Reply #36 on: November 28, 2007, 11:23:21 PM »

No worries lad. I'm gonna post an update soon as well - cuppla new features...
Logged

If I can't be Mr. Root then I don't want to play.
vsloathe
vim ftw!
Global Moderator
Lifer
*****
Online Online

Posts: 636



View Profile
« Reply #37 on: November 29, 2007, 08:16:11 AM »

Perk, can you figure out how curl_multi works and make your class pseudo-multi-threaded?  :D

Probably too much to ask, but I just started using curl_multi for guestbooks and trackbacks and it rocks faces (to understate how awesome it is).
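For anyone who hasn't used curl_multi, a minimal sketch of the usual pattern follows. It uses PHP's curl extension directly and is not how Perk's socket-based class works; multiGet is a hypothetical name, and error handling is left out to keep it short.

```php
<?php
// Hypothetical helper: fetch several URLs in parallel with curl_multi.
// Returns an array of response bodies keyed like the input array.
function multiGet(array $urls, $timeout = 30)
{
    $multi = curl_multi_init();
    $handles = array();

    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return body instead of printing it
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
        curl_multi_add_handle($multi, $ch);
        $handles[$key] = $ch;
    }

    // Drive all transfers until none are still running
    $running = 0;
    do {
        $status = curl_multi_exec($multi, $running);
        if ($running) {
            // Block until there is socket activity; back off briefly if select fails
            if (curl_multi_select($multi, 1.0) === -1) {
                usleep(100000);
            }
        }
    } while ($running && $status == CURLM_OK);

    $results = array();
    foreach ($handles as $key => $ch) {
        $results[$key] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($multi, $ch);
    }
    return $results;
}
```

Usage would look like $pages = multiGet(array('a' => 'http://example.com/', 'b' => 'http://example.org/')); where the URLs are placeholders. All requests run concurrently inside one PHP process, so there are no real threads involved.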
Logged

perkiset
Olde World Hacker
Administrator
Lifer
*****
Online Online

Posts: 5230


:sniffle: Humor was so much easier before.


View Profile
« Reply #38 on: November 29, 2007, 08:54:08 AM »

'Twould simply be easier for you to create your own threads, and instantiate a new instance of the class in each and I think you'd be good to go. The only thing that would not be understandable would be debugMode=WRD_ECHO because the output lines would be intermixed... so you'd want to create a new file for each instance if you want to watch debug info. Other than that, since there's no shared memory or files, I think you'd be good to go UNLESS the fread() functions are not threadsafe, in which case you're just screwed ;)
Logged

If I can't be Mr. Root then I don't want to play.
meme
n00b
*
Offline Offline

Posts: 6


View Profile
« Reply #39 on: November 30, 2007, 09:48:56 AM »

I didn't look much at the class but I like the onSuccess, onFailure, before, and after callbacks. I'm gonna implement that in my Curl class soon. Any ideas on how to pseudo-thread the callbacks so you process responses in batches?
Logged
perkiset
Olde World Hacker
Administrator
Lifer
*****
Online Online

Posts: 5230


:sniffle: Humor was so much easier before.


View Profile
« Reply #40 on: November 30, 2007, 10:05:20 AM »

that confused me... the point of the event pops would be to handle the responses one-by-one as they come in, rather than in batches... could you flesh that out a bit more for me?
Logged

If I can't be Mr. Root then I don't want to play.
meme
n00b
*
Offline Offline

Posts: 6


View Profile
« Reply #41 on: November 30, 2007, 01:03:50 PM »

You might call them events but they don't fire off asynchronously. If my request pool has 10 items and my 'after' callback takes 5 min each, then run one after another that's 50 min, whereas with 10 parallel processes/threads/forks it would take 5 min. Now I know you don't have a request pool in your class, but I do (and you could too with socket_select()).

By the way, your cookie parser will break when there's no ';' to explode, check mine below ($str is the contents of the set-cookie header):
Code:
protected function parseCookie($str) {
    $parts = array();

    // explode() returns a single element when there is no ';', so one loop covers both cases
    foreach( explode( ';', $str ) as $i => $data ) {
        // Limit to 2 pieces so an '=' inside the value is preserved
        $c = explode( '=', trim( $data ), 2 );
        $key = trim( $c[0] );
        $val = isset( $c[1] ) ? trim( $c[1] ) : '';

        if( $i == 0 ) {
            // The first pair is always the cookie's own name=value
            $parts['name'] = $key;
            $parts['value'] = $val;
            continue;
        }

        // Attribute names are case-insensitive; unknown attributes are ignored
        switch( strtolower( $key ) ) {
            case 'expires':
                $parts['expires'] = strtotime( $val );
                break;
            case 'secure':
                $parts['secure'] = true;
                break;
            case 'domain':
            case 'path':
            case 'comment':
                $parts[strtolower( $key )] = $val;
                break;
        }
    }

    if( !empty($parts['name']) ) {
        return array($parts['name'],$parts['value']);
    } else {
        return false;
    }
}
Logged
perkiset
Olde World Hacker
Administrator
Lifer
*****
Online Online

Posts: 5230


:sniffle: Humor was so much easier before.


View Profile
« Reply #42 on: January 04, 2008, 11:27:01 PM »

Latest Update: Fixed some bugs in the onSuccess and onFailure event pops.

Enjoy!

Code:
<?php

class webRequest2
{
    private $socket;

    protected $finalURL;
    protected $rawContent;
    protected $rawHeader;
    protected $rawResponse;

    protected $chunkedLength;
    protected $chunkedTransfer;
    protected $cookies;
    protected $cookieStr;
    protected $errorFlag;
    protected $getList;
    protected $headers;
    protected $postList;
    protected $postStr;

    public $accept;
    public $charSet;
    public $domain;
    public $debugLogFile;
    public $debugLogClearOnDispatch;
    public $debugMode;
    public $language;
    public $manualPostContent;
    public $method;
    public $port;
    public $postMode;
    public $proxy;
    public $redirect;
    public $resultCode;
    public $timeout;
    public $url;
    public $userAgent;
    public $useSSL;

    // Event Handlers
    public $onFailure;
    public $onProxyRetry;
    public $onSuccess;

    // Protected and special functions
    function __construct()
    {
        $this->reset();
        preg_match('/^([0-9])/', phpversion(), $parts);
        $this->ancient = ($parts[1] < '5');
        $this->userAgent = 'Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en) AppleWebKit/417.9 (KHTML, like Gecko) Safari/417.8';
        $this->accept = 'text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5';
        $this->charSet = 'ISO-8859-1,utf-8;q=0.7,*;q=0.7';
        $this->language = 'en-us,en;q=0.5';

        if (!defined('WRD_OFF'))
        {
            // Debug modes
            define('WRD_OFF', 0);
            define('WRD_ECHO', 1);
            define('WRD_LOG', 2);

            // Request methods
            define('WRM_GET', 0);
            define('WRM_POST', 1);

            // Post modes
            define('WRP_NORMAL', 0);
            define('WRP_MULTIPART', 1);
        }

        $this->debugMode = WRD_OFF;
        $this->postMode = WRP_NORMAL;
        $this->debugLogFile = '';
        $this->debugLogClearOnDispatch = true;
        $this->timeout = 30;
        $this->useSSL = false;
        $this->proxy = '';
    }

    protected function buildCookieStr()
    {
        $cookieStr = '';
        $start = true;
        foreach($this->cookies as $name=>$value)
        {
            if (!$start) { $cookieStr .= '; '; }
            $cookieStr .= "$name=$value";
            $start = false;
        }
        $this->debug("Built COOKIE String: $cookieStr");
        return $cookieStr;
    }

    protected function buildGetStr()
    {
        $getStr = '';
        $getCount = count($this->getList);
        if ($getCount)
        {
            $sepStr = '?';
            foreach($this->getList as $name=>$value)
            {
                $value = urlencode($value);
                $getStr .= "$sepStr$name=$value";
                $sepStr = '&';
            }
        }
        $this->debug("Built GET String: $getStr");
        return $getStr;
    }

    protected function buildPostStr()
    {
        if ($this->manualPostContent)
            return $this->manualPostContent;

        $postStr = '';
        $postCount = count($this->postList);
        if ($postCount)
        {
            $sepStr = '';
            foreach($this->postList as $name=>$arr)
            {
                $value = urlencode($arr['content']);
                $postStr .= "$sepStr$name=$value";
                $sepStr = '&';
            }
        } else {
            $postStr = 'No Content';
        }
        $this->debug("Built POST String: $postStr");
        return $postStr;
    }

    protected function buildHeader()
    {

        $header[0] = ''; // placeholder for first line of header
        $header[] = "Host: {$this->domain}";
        $header[] = "User-Agent: {$this->userAgent}";
        $header[] = "Accept: {$this->accept}";
        $header[] = "Accept-Language: {$this->language}";
        $header[] = "Accept-Encoding: ";
        $header[] = "Accept-Charset: {$this->charSet}";
        if ($this->hasCookies()) { $header[] = "Cookie: {$this->buildCookieStr()}"; }
        $header[] = "Connection: close";

        // When going through a proxy the request line carries the absolute URL
        $hostStr = ($this->proxy) ? "http://{$this->domain}" : '';
        switch($this->method)
        {
            case 'get':
            case 'GET':
                $header[0] = "GET $hostStr{$this->finalURL} HTTP/1.1";
                $header[] = '';
                $header[] = "Content-Type: text/html";
                $header[] = "Content-Length: 0";
                $header[] = '';
                break;

            case 'post':
            case 'POST':
                if (count($this->postList) == 0) $this->postMode = WRP_NORMAL;

                $header[0] = "POST $hostStr{$this->finalURL} HTTP/1.1";
                switch ($this->postMode)
                {
                    case WRP_NORMAL:
                        $postData = $this->buildPostStr();
                        $requestLen = strlen($postData);
                        $header[] = "Content-Type: application/x-www-form-urlencoded";
                        $header[] = "Content-Length: $requestLen";
                        $header[] = '';
                        $header[] = $postData;
                        break;

                    case WRP_MULTIPART:
                        $boundary = time() . time();
                        $postData = $this->buildMultipartPostStr($boundary);
                        $requestLen = strlen($postData);
                        $header[] = "Content-Type: multipart/form-data; boundary=$boundary";
                        $header[] = "Content-Length: $requestLen";
                        $header[] = '';
                        $header[] = "$postData";
                        break;

                    default:
                        $this->debug("buildHeader: Terminal failure - unknown postMode '{$this->postMode}'");