The Cache: Technology Expert's Forum
 
Author Topic: Implement cloaking in wordpress  (Read 10753 times)
nattsurfaren
Journeyman
Posts: 64
« on: July 04, 2008, 06:34:06 AM »

Suggestion for a new category: Tutorials or projects. Only if you want, Perk.

My goal is to implement cloaking in WordPress, and I'm going to split this up into small tasks that
could be helpful for other members. I don't know yet how to do this or how it will turn out, but
I hope you can help me speed things up if you want.

So I have been working on a tool for a couple of months now, and what it does is find link
exchange directories. Now Google thinks that link exchange is baaaad. It's a no-no. Google punish,
Google crush.... So we need to make Google think it is only a one-way link, and we do that with
cloaking.

I want to be able to control which links get cloaked and which do not, so I need
special tags that I can use. After some searching I found out that I need Quicktags. I'm reading
about this right now and I will get back to you when I know how to code new quicktags.
Please add a comment if you know a simpler way.

perkiset
Olde World Hacker
Administrator
Posts: 10096
« Reply #1 on: July 04, 2008, 10:27:44 AM »

There are two elements to this project: the decision on whether to cloak or not, and then the final output of the cloaked content.

In this case, WordPress is *really* going to mess with you because the code base is HTML with PHP Tags, rather than PHP outputting HTML, if you understand the difference. If not, here you go: The right way to code PHP looks like this:

Code:
<?php

$str = <<<HTML
<html><body>
Hello World
</body></html>
HTML;

echo $str;

?>


as opposed to the WordPress methodology (the files look like this):

Code:
<html><body>
<?php echo 'Hello World'?>
</body></html>

The problem is that in the second version, HTML is output *immediately* to the client and you have no chance to modify it (short of wrapping the whole thing in output buffering). In the first version, you can change how and when things are spit out. Ergo, in the first version you could do a str_replace on $str before it gets spit out to the surfer; in the second version you cannot. This, obviously, makes cloaking WordPress a particularly grueling challenge.
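
To make the difference concrete, here is a minimal sketch of the first style with a last-second rewrite. The function names, the markup and the anothersite.com URL are placeholders for illustration only, not anything from WordPress:

```php
<?php

// Build the whole page as a string first (placeholder markup, not WordPress output).
function buildPage(): string
{
    return <<<HTML
<html><body>
<a href="http://anothersite.com/">Another site</a>
</body></html>
HTML;
}

// Nothing has been sent yet, so the page can still be rewritten here.
function addNofollow(string $html, string $url): string
{
    return str_replace(
        '<a href="' . $url . '">',
        '<a href="' . $url . '" rel="nofollow">',
        $html
    );
}

// Output happens once, as the very last step.
echo addNofollow(buildPage(), 'http://anothersite.com/');
```

The exact-string match is deliberately naive; a link with extra attributes or different quoting would slip through.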

How do you modify a link when you have no idea when/where in the code it's being "spit out" at the user? If it's already gone, how do you modify it?

There are a couple options here, none of which are pretty:
First option: Huge and very custom mods to the WordPress code base. At first glance, I'd probably wrap all link outputs into an object method call, so that I could modify it. Consider this first example, then modified so that I can cloak the link:

Uncloaked example:

<table><tr>
<td></td>
</tr></table>


Cloakable example:

<table><tr>
<td><?php $cloaker->translate(''); ?></td>
</tr></table>

... then I'd have a class definition and an object called "cloaker" that decided, based on the IP address, if I should return the URL as passed in, or use a REGEX to convert it to my cloaked version. Obviously you'd need to get into the codebase for WordPress and see where it is appropriate to define this class and instantiate the object.
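
A rough sketch of what such a class might look like. Everything here is an assumption for illustration: the class name, the constructor arguments, the prefix-based spider check and the sample addresses (taken from the documentation ranges). Where it gets defined and instantiated inside WordPress is still up to you.

```php
<?php

// Hypothetical sketch of the "cloaker" object described above.
class Cloaker
{
    /** @var string[] IP prefixes treated as spiders (made-up sample values) */
    private $spiderPrefixes;

    /** @var string IP address of the current visitor */
    private $visitorIp;

    public function __construct(array $spiderPrefixes, string $visitorIp)
    {
        $this->spiderPrefixes = $spiderPrefixes;
        $this->visitorIp = $visitorIp;
    }

    // True when the visitor IP starts with one of the known spider prefixes.
    public function isSpider(): bool
    {
        foreach ($this->spiderPrefixes as $prefix) {
            if (strpos($this->visitorIp, $prefix) === 0) {
                return true;
            }
        }
        return false;
    }

    // Surfers get the link as passed in; spiders get rel="nofollow" added by regex.
    public function translate(string $linkHtml): string
    {
        if (!$this->isSpider()) {
            return $linkHtml;
        }
        return preg_replace('/<a\b/i', '<a rel="nofollow"', $linkHtml, 1);
    }
}

// The trailing dot in the prefix keeps "192.0.2." from matching "192.0.20.x".
$cloaker = new Cloaker(['192.0.2.'], '192.0.2.10');
echo $cloaker->translate('<a href="http://example.com/">Example</a>');
```

In a template you would echo the return value of translate(), since the method in this sketch returns the string rather than printing it.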

Second option: Man in the middle attack (really a reverse proxy in front of your own site). This is a more "weighty" approach and may complicate other things like posts and such, but suppose WordPress answers on port 81 and your man-in-the-middle server answers on port 80. When a page request comes into the site, your process proxies the request to WordPress on port 81; when it gets all the HTML back, it is one big string and you can have your way with it utterly. This solution is very powerful, since you could change anything about WordPress in the future (your theme, capabilities, plugins etc.) and none of your modifications would be affected, because they live in a layer completely separate from WordPress. I'll not describe the Apache VirtualHosts and such that you'd need here unless you find that you like this option and need help. But consider this code (I'm using my webRequest class here for example; I think cURL would be far better, but this is fine for an objective example):

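Code:
<?php

// webRequest2 is my own HTTP client class (not shown here).
$req = new webRequest2();

// REQUEST_URI holds only the path and query string, so the host comes from HTTP_HOST.
$host = $_SERVER['HTTP_HOST'];
$newURL = "{$host}:81{$_SERVER['REQUEST_URI']}";

// Pull the finished page from WordPress on port 81, then rewrite it before sending it on.
$pageBuff = $req->simpleGet($newURL);
$pageBuff = str_replace('<a href="http://anothersite.com/">', '<a href="http://anothersite.com/" rel="nofollow">', $pageBuff);
echo $pageBuff;

?>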


This is a rather silly example and would need (obviously) to be considerably more complicated in the search/replace phase, but I hope you get the idea.
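
For anyone without a webRequest class, the same idea with cURL might look roughly like this. It is an untested sketch under a few assumptions: WordPress is reachable on port 81 of the same host name, HTTP_HOST carries no port of its own, and the function names are made up for the example.

```php
<?php

// Fetch a page from the backend WordPress instance (assumed to listen on port 81).
function fetchFromBackend(string $host, string $requestUri, int $port = 81): string
{
    $ch = curl_init("http://{$host}:{$port}{$requestUri}");
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);
    $body = curl_exec($ch);
    return $body === false ? '' : $body;
}

// Add rel="nofollow" to links pointing at the given URL (exact match only).
function rewriteLinks(string $html, string $url): string
{
    return str_replace(
        '<a href="' . $url . '">',
        '<a href="' . $url . '" rel="nofollow">',
        $html
    );
}

// Front controller for port 80; skipped when the file is run from the command line.
if (PHP_SAPI !== 'cli' && isset($_SERVER['HTTP_HOST'], $_SERVER['REQUEST_URI'])) {
    $page = fetchFromBackend($_SERVER['HTTP_HOST'], $_SERVER['REQUEST_URI']);
    echo rewriteLinks($page, 'http://anothersite.com/');
}
```

A real proxy would also have to pass through POST bodies, cookies and response headers, which this leaves out.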

This problem is probably why people aren't giving you much joy @ syndk8 on this issue Natt... it's not pretty but can be done. Depends on your stomach and will for it. Doesn't scare me one bit, but I'd really need to have a strong reason to go that deep.

/p

It is now believed, that after having lived in one compound with 3 wives and never leaving the house for 5 years, Bin Laden called the U.S. Navy Seals himself.
nattsurfaren
« Reply #2 on: July 05, 2008, 04:40:42 PM »

Thanks perk.
It wasn't so complicated once I found where the output is handled.
I knew there was already support for quicktags, but most of them are handled in JavaScript.
One tag though, named <!--more -->, is handled in the PHP core.

Now the thing that took time was all the preg_match and preg_replace patterns that I had to work on.
I still don't know if there will be a problem with them, but it works so far. So the regular expressions should
get a thorough check by an expert.

Now the only thing I have left is the cloak-by-IP code. I will implement that later, and it is the easy part.
I have done the hard part and here it comes...:

1. In your wordpress installation go to the 'wp-includes' folder
2. Edit post-template.php
3. Scroll down to 
Code:
function get_the_content($more_link_text = '(more...)', $stripteaser = 0, $more_file = '') {

4. Find this line
Code:
$content = $pages[$page-1];

5. Insert nattsurfaren's super extreme code directly below the line from step 4
Code:
$cloak=true;
if ( preg_match_all('/<!--cloak(.*?)-->/s', $content, $matches) )
{
    $result="";
    if($cloak)
    {
        //Note: only the first cloak block is used. No support for multiple cloaks. Don't think it's necessary?
        $result=preg_replace('/rel\s*=\s*"[^"]*"/s',"",$matches[1][0]);
        $result=preg_replace("/rel\s*=\s*'[^']*'/s","",$result);
        //First parenthesis = \$1 and second = \$2. The rel attribute is inserted between them.
        $result=preg_replace("/(<a\b)([^>]*>.*?<\/a>)/s","\$1 rel='nofollow'\$2",$result);
    }

    $content=preg_replace("/<!--cloak.*?-->/s",$result,$content);

}

$cloak is a simple boolean (true/false) variable and should be set to true only when a spider is visiting. This is where the cloaking engine will plug in,
and that is what I'm about to work on right now.
Oh, I forgot  Shocked
To use this:

Edit any blog post.
Add:
Code:
<!--cloak
<a href="http://www.perkiset.org">Perkiset place</a>
-->

Always do this when you do link exchange with perk.  Smooch

Enjoy!
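
The step 5 code only looks at the first cloak block ($matches[1][0]). If several cloaked blocks in one post ever become necessary, something along these lines might do it. It is only a sketch: the function name is made up, it has not been run inside WordPress, and unlike step 5 it keeps the links for normal visitors and just leaves the nofollow off.

```php
<?php

// Hypothetical helper: expands every <!--cloak ... --> block in a post body.
function expandCloakBlocks(string $content, bool $cloak): string
{
    $expanded = preg_replace_callback(
        '/<!--cloak(.*?)-->/s',
        function (array $m) use ($cloak): string {
            // Drop any existing rel attribute (double- or single-quoted).
            $inner = preg_replace('/\srel\s*=\s*("[^"]*"|\'[^\']*\')/i', '', $m[1]);
            if ($cloak) {
                // Spiders get rel='nofollow' on every link in the block.
                $inner = preg_replace('/<a\b/i', "<a rel='nofollow'", $inner);
            }
            return $inner;
        },
        $content
    );
    // preg_replace_callback returns null on a regex error; fall back to the input.
    return $expanded ?? $content;
}

$post = "Intro <!--cloak <a href=\"http://www.perkiset.org\">Perkiset place</a> --> outro";
echo expandCloakBlocks($post, true), "\n";
echo expandCloakBlocks($post, false), "\n";
```

Using a callback also means that a "$1" or backslash inside the link markup is never read as a backreference, which can happen when the markup is passed as a preg_replace replacement string.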

« Last Edit: July 05, 2008, 04:49:47 PM by nattsurfaren »

perkiset
« Reply #3 on: July 05, 2008, 06:17:53 PM »

Well done NS - an elegant solution.

And I'm not sure I understand your smooch, but is it that all links normally pointing at your recips will point to my domain if a spider visits? Sweet!  Praise


nattsurfaren
« Reply #4 on: July 06, 2008, 03:11:29 AM »

Quote
but is it that all links normally pointing at your recips will point to my domain if a spider visits? Sweet!  Praise
Only in your dreams Perk.  Grin

Couldn't sleep. There is something missing: what happens if I don't want to cloak? The first version outputs nothing at all in that case, so the link disappears.
I have not tested this yet, but I think it will work.

Code:

$cloak=true;
if ( preg_match_all('/<!--cloak(.*?)-->/s', $content, $matches) )
{
    $result="";
    //Note: only the first cloak block is used. No support for multiple cloaks. Don't think it's necessary?
    $result=preg_replace('/rel\s*=\s*"[^"]*"/s',"",$matches[1][0]);
    $result=preg_replace("/rel\s*=\s*'[^']*'/s","",$result);
    if($cloak)
    {
        //First parenthesis = \$1 and second = \$2. The rel attribute is inserted between them.
        $result=preg_replace("/(<a\b)([^>]*>.*?<\/a>)/s","\$1 rel='nofollow'\$2",$result);
    }

    $content=preg_replace("/<!--cloak.*?-->/s",$result,$content);

}
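
Editing post-template.php directly works, but core files in wp-includes get overwritten when WordPress is upgraded. The same logic could probably live in a small plugin hooked on the the_content filter instead. This is an untested sketch: the function name and the $GLOBALS['cloak'] flag are made up, and it assumes the cloak comment is still intact when a priority 1 filter runs.

```php
<?php

// Same logic as the snippet above, wrapped in a function so it can run as a filter.
function cloak_filter_content($content, $cloak = false)
{
    if (!preg_match('/<!--cloak(.*?)-->/s', $content, $matches)) {
        return $content;
    }
    // Strip any existing rel attribute, double- or single-quoted.
    $result = preg_replace('/rel\s*=\s*"[^"]*"/s', '', $matches[1]);
    $result = preg_replace("/rel\s*=\s*'[^']*'/s", '', $result);
    if ($cloak) {
        $result = preg_replace('/(<a\b)([^>]*>.*?<\/a>)/s', "\$1 rel='nofollow'\$2", $result);
    }
    // A callback keeps "$1"-style sequences in $result from being read as backreferences.
    return preg_replace_callback('/<!--cloak.*?-->/s', function () use ($result) {
        return $result;
    }, $content);
}

// add_filter() only exists inside WordPress, so this part is skipped when run standalone.
if (function_exists('add_filter')) {
    add_filter('the_content', function ($content) {
        // $GLOBALS['cloak'] is a hypothetical flag set by whatever does the spider detection.
        return cloak_filter_content($content, !empty($GLOBALS['cloak']));
    }, 1);
}
```

Like the original, this only reads the first cloak block in a post.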
« Last Edit: July 06, 2008, 03:26:04 AM by nattsurfaren »

perkiset
« Reply #5 on: July 06, 2008, 10:38:12 AM »

At the end of the day, you simply want no indication of what you were up to at all in the delivered HTML. If you cloak only well established spiders (like the spiderSpy DB) and everyone else sees the links as if they are passing juice, you are most safe.
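
The spiderSpy lookup itself is not shown here. As a separate, commonly used way to confirm that a visitor claiming to be Googlebot really is one, you can do a reverse DNS lookup and then a forward lookup on the result. A hedged sketch: the function name is made up, and the two resolver arguments exist only so it can be tried without network access.

```php
<?php

// Sketch: forward-confirmed reverse DNS check for a visitor claiming to be Googlebot.
function isVerifiedGooglebot(string $ip, ?callable $reverse = null, ?callable $forward = null): bool
{
    $reverse = $reverse ?? 'gethostbyaddr';
    $forward = $forward ?? 'gethostbynamel';

    // gethostbyaddr() returns the unchanged IP when there is no PTR record.
    $host = $reverse($ip);
    if (!is_string($host) || $host === $ip) {
        return false;
    }

    // The host name has to end in googlebot.com or google.com.
    $host = strtolower($host);
    if (!preg_match('/\.(googlebot|google)\.com$/', $host)) {
        return false;
    }

    // The name must resolve back to the same IP, otherwise the PTR record is forged.
    $ips = $forward($host);
    return is_array($ips) && in_array($ip, $ips, true);
}
```

Both lookups are slow, so in practice the result would want caching per IP.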

I'd see about adding a cache buster to your pages as well (if you allow Google to cache you) so that if people do look at the cache they'll bump to your site and your cloak will be maintained (unless the hunter turns JS off and then looks at the cache).

nattsurfaren
« Reply #6 on: July 07, 2008, 07:29:51 PM »

Thanks Perk. I will try my cloaking hack first and if I'm caught cloaking often by hunters I will look into the cache buster trick you recommended.

I have worked on the cloaking engine found on syndk8:
http://syndk8.net/forum/index.php/topic,7864.0.html

I had to fix a couple of bugs, but now it seems to be working.
Please let me know if there is something missing or should be changed.
Read comments explaining code changes in files.

How to install:

1. In your wordpress installation go to the 'wp-includes' folder
2. Insert file "ip-cloak.php" with this content in this directory.

Code:
<?php

// IP CLOAK 1.0 ////////////////////////////////////////////////////////
// WORKS USING USERAGENT, REFERER, AND IP
// CHECKS CLASS C IP, NOT ACTUAL IP WHEN DOING IP DETECTION!
// USE AT YOUR OWN RISK
////////////////////////////////////////////////////////////////////////

// IP UPDATER INCLUDE //////////////////////////////////////////////////
// THIS CHECKS TO SEE IF YOU HAVE THE LATEST IP LISTS AVAILABLE. PLEASE
// MAKE SURE THAT YOU HAVE THE ip-update.php FILE IN THE SAME DIRECTORY
// AND GIVE THE GUY A DONATION OVER AT iplists.com
////////////////////////////////////////////////////////////////////////

//Note that I'm using wordpress ABSPATH and WPINC.
//ip-update called if ips.txt is missing.
if(!is_file(ABSPATH."/ips.txt"))
{
include (ABSPATH . WPINC . '/ip-update.php');
}

$timestamp = filemtime(ABSPATH."/ips.txt");
$lastupdated = date("Ymd", $timestamp);
if ($lastupdated != date("Ymd"))
{
    $server = "http://" . $_SERVER['SERVER_NAME'];
    include (ABSPATH . WPINC . '/ip-update.php');
}

// GRAB SOME VALUABLE INFORMATION //////////////////////////////////////

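$ip = $_SERVER["REMOTE_ADDR"];
// The referer and user agent headers are optional, so fall back to an empty string.
$ref = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '';
$agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
$host = strtolower(gethostbyaddr($ip));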

// This code is removed. I have no clue why first convert to array and then implode to a string?
//$file = implode(" ", file(ABSPATH."/ips.txt")); //Old code

//Code is read to a simple string.
$fhandle = fopen(ABSPATH."/ips.txt","r");
$file = fread($fhandle,filesize(ABSPATH."/ips.txt"));
fclose($fhandle);

$exp = explode(".", $ip);

//Modified to remove the ending dot. (I think this fixes alot of problems)
$class = $exp[0] . '.' . $exp[1] . '.' . $exp[2];

// CLOAKING THRESHOLD //////////////////////////////////////////////////
// 0 = ALWAYS CLOAK
// 1 = OFTEN CLOAK
// 2 = SOMETIMES CLOAK
// 3 = RARELY CLOAK
// 4 = NEVER CLOAK
////////////////////////////////////////////////////////////////////////

$threshold = 1;

// PERFORM CLOAK CHECKS ////////////////////////////////////////////////
//echo "<!--\r\n-->host = $host <br> class = $class <br> agent= $agent <br>referer=$ref";

// Start the score at zero so the ++ below never touches an undefined variable.
$cloak = 0;

// && is changed to || as recommended on syndk8. Otherwise $host would need to contain all three at the same time. When does that happen?
if (stristr($host, "googlebot") || stristr($host, "inktomi") || stristr($host, "msn"))
{
    $cloak++;
}

//Modified to match the beginning of a row. Otherwise both beginning and end can match.
if (stristr($file, "\r\n".$class))
{
    $cloak++;
}

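// Skip the check for a missing user agent; an empty needle would warn or match everything, depending on the PHP version.
if (!empty($agent) && stristr($file, $agent))
{
    $cloak++;
}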

if (strlen($ref) > 0)
{
    $cloak = 0;
}

// PERFORM CLOAK DATA ANALYSIS /////////////////////////////////////////

if ($cloak >= $threshold)
{
    $cloakdirective = 1;
}

else
{
    $cloakdirective = 0;
}

?>

3. Insert file "ip-update.php" with this content:
Code:
<?php

// IP UPDATER 1.0 ////////////////////////////////////////////////////////
// YOU MUST GET PERMISSION FROM THE GOOD FOLKS AT IPLISTS.COM
// IF YOU INTEND ON USING THIS IP UPDATER SCRIPT WITH ANY REGULARITY
// BUY THE MAN A DRINK: http://www.iplists.com/bmad/
//
// MAKE SURE THE ips.txt FILE IS WRITABLE
//////////////////////////////////////////////////////////////////////////

include (ABSPATH . WPINC . '/Snoopy.class.php');
$lists = array(
'http://www.iplists.com/google.txt',
'http://www.iplists.com/inktomi.txt',
'http://www.iplists.com/lycos.txt',
'http://www.iplists.com/infoseek.txt',
'http://www.iplists.com/altavista.txt',
'http://www.iplists.com/excite.txt',
'http://www.iplists.com/northernlight.txt',
'http://www.iplists.com/misc.txt'
);

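//Start with an empty buffer so the .= below does not touch an undefined variable.
$opt = '';
foreach($lists as $list) {

    $snoopy = new Snoopy;
    $snoopy->agent = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.4) Gecko/20060508 Firefox/1.5.0.4";
    $snoopy->fetch($list);
    $opt .= $snoopy->results;
    //Random pause of 2-7 seconds between requests.
    $rand = rand(2,7);
    sleep($rand);
}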
$date = date("Y-m-d");


//This fixes problems with non-matching newlines. Some systems use \r, others \n, and Windows uses \r\n. Now it is always \r\n.
$opt = preg_replace("/\r\n|\n|\r/s", "\r\n", $opt);

//Remove spaces and tabs after a newline.
$opt = preg_replace("/\r\n[ \t]+/", "\r\n", $opt);




$fp =  fopen(ABSPATH."/ips.txt","w");
fwrite($fp,$opt);
fclose($fp);

?>

4. Edit file "post-template.php" and replace nattsurfaren's super extreme code with this:
Code:
//$cloak=true;
if ( preg_match_all('/<!--cloak(.*?)-->/s', $content, $matches) )
{
    //global is necessary, as mentioned on syndk8
    global $cloakdirective;
    $result="";
    //Note: only the first cloak block is used. No support for multiple cloaks. Don't think it's necessary?
    $result=preg_replace('/rel\s*=\s*"[^"]*"/s',"",$matches[1][0]);
    $result=preg_replace("/rel\s*=\s*'[^']*'/s","",$result);
    if($cloakdirective)
    {
        //First parenthesis = \$1 and second = \$2. The rel attribute is inserted between them.
        $result=preg_replace("/(<a\b)([^>]*>.*?<\/a>)/s","\$1 rel='nofollow'\$2",$result);
    }

    $content=preg_replace("/<!--cloak.*?-->/s",$result,$content);

}


Make sure you upload Snoopy.class.php to the 'wp-includes' folder as well.
Download it from here:
http://sourceforge.net/projects/snoopy/

For your convenience, the IP addresses are also fetched when the ips.txt file is missing.
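
One thing to watch in ip-cloak.php: stristr() is a substring search, so a class C of 66.249.6 also matches a list line that starts with 66.249.64, and the very first line of ips.txt has no "\r\n" in front of it. A possible exact lookup is sketched below. The function names are made up, and it assumes each useful line starts with a dotted IPv4 address or prefix, which I have not checked against the real iplists.com files.

```php
<?php

// Build a lookup table of class C prefixes ("a.b.c") from the raw list text.
// Lines that do not start with a dotted address (comments, user agents) are skipped.
function loadClassCPrefixes(string $listText): array
{
    $prefixes = [];
    foreach (preg_split('/\r\n|\n|\r/', $listText) as $line) {
        if (preg_match('/^\s*(\d{1,3}\.\d{1,3}\.\d{1,3})(\.|\s|$)/', $line, $m)) {
            $prefixes[$m[1]] = true;
        }
    }
    return $prefixes;
}

// True when the visitor's first three octets exactly match a listed prefix.
function isListedClassC(string $ip, array $prefixes): bool
{
    $octets = explode('.', $ip);
    if (count($octets) !== 4) {
        return false;
    }
    return isset($prefixes["{$octets[0]}.{$octets[1]}.{$octets[2]}"]);
}
```

With this, the class C check in ip-cloak.php would become something like isListedClassC($ip, loadClassCPrefixes($file)).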
« Last Edit: July 07, 2008, 07:42:15 PM by nattsurfaren »

perkiset
« Reply #7 on: July 07, 2008, 07:39:43 PM »

Well done Natt and nice share. Thanks mucho mang.

/perk

arms
Expert
Posts: 235
« Reply #8 on: July 08, 2008, 07:17:25 PM »

@perk
about tag soup vs. html output by php - what do you think of template engines like smarty (or others or even home rolled)?

i only recently discovered templates when i started using django and fell in love with features like inheritance and the relatively clean code (compared to the tag soup).

perkiset
« Reply #9 on: July 08, 2008, 09:06:02 PM »

I think for some folks, frameworks like smarty can really help out. We commissioned a job by some Indians a ways back and that's exactly how they put the site together. It is effective in many instances, but in my case, I don't like any of them ROFLMAO so I hand roll my own.

Actually, the real reason is that I have not yet found a framework that is flexible enough to accommodate my IP delivery needs, dynamic page generation and URL translation needs and goes fast enough. Is there something specific you are looking to evaluate or accomplish? I'd probably have a better response if you tightened up your question a bit...

nattsurfaren
« Reply #10 on: July 09, 2008, 08:29:57 AM »


Just discovered a problem with WordPress and TinyMCE. For some reason TinyMCE adds an mce_href attribute along with href. Google turns up many posts about this problem. Because I do link exchange, and a spider visits my page and checks that the HTML is exactly as described, this can be a problem. To fix this you can add these two lines to my cloaker code snippet in post-template.php.

Code:
$result=preg_replace('/mce_href\s*=\s*"[^"]*"/s',"",$result);
$result=preg_replace("/mce_href\s*=\s*'[^']*'/s","",$result);
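
If the list of attributes to strip keeps growing, the repeated preg_replace lines could be folded into one helper. A sketch only; the function name is made up and it is not part of the snippet above:

```php
<?php

// Hypothetical helper: removes the named attributes (single- or double-quoted) from markup.
function stripAttributes(string $html, array $names): string
{
    foreach ($names as $name) {
        // The leading \s+ stops "rel" from matching inside a longer attribute name.
        $pattern = '/\s+' . preg_quote($name, '/') . '\s*=\s*("[^"]*"|\'[^\']*\')/i';
        $html = preg_replace($pattern, '', $html);
    }
    return $html;
}

echo stripAttributes(
    '<a href="http://example.com/" mce_href="http://example.com/" rel="friend">x</a>',
    ['rel', 'mce_href']
);
```

In the cloaker snippet this would be called as $result = stripAttributes($result, array('rel', 'mce_href'));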

arms
« Reply #11 on: July 09, 2008, 09:13:34 AM »

sorry to jack this thread nattsurfaren. Smiley

i just meant templating engines not a whole framework. i don't know much about smarty so maybe it was a bad example.
it seems similar to what you are doing.

with velocity (java) for example, you could do something like:
Code:
// i am just making this up i forget the syntax
vars["someValue"] = someValue;
velocity.render(vars, "your-template.html");
and then your html has place holders like {someValue} or simple display logic.

it seems like they do what you are doing, only moving the html back into its own files.
i was just curious if you had evaluated any in php and found them wanting/inflexible/whatever.
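
for comparison, the placeholder idea in plain php could be as small as this. it is a toy sketch, not smarty or velocity syntax, and the function name is made up:

```php
<?php

// Tiny placeholder renderer: swaps {name} markers for HTML-escaped values.
function renderTemplate(string $template, array $vars): string
{
    $map = [];
    foreach ($vars as $name => $value) {
        $map['{' . $name . '}'] = htmlspecialchars((string) $value, ENT_QUOTES, 'UTF-8');
    }
    // strtr() replaces all markers in a single pass over the template.
    return strtr($template, $map);
}

echo renderTemplate(
    '<html><body><h1>{someValue}</h1></body></html>',
    ['someValue' => 'Hello & welcome']
);
```

real engines add loops, conditionals and inheritance on top, which is where the extra weight comes from.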

perkiset
« Reply #12 on: July 09, 2008, 09:29:15 AM »

Ah... well, actually I use something very similar in the way I lay out a site.

Usually, I have a main.php in the root directory, then a theme directory which contains all of the "template" parts. The main.php will call a particular type of page generation routine (a simple include, or perhaps something really complex, it doesn't matter) which will generate the client-area content, then the theme files will have references to what needs to be spit out, something like this:

Code:
<?php

$header = <<<HTML
<html><body>$allTheClientContent</body></html>
HTML;

?>


... then the main.php is responsible for any final work and sending the content out. Final work can be things like modifying the keywords/title/description of a page etc. Because the header files are simply "included" and reference known variables, I get the same sort of templating notion with a lot of speed. I use include or require as much as I can because this also takes advantage of the APC cache I have running on all of my machines.

arms
« Reply #13 on: July 09, 2008, 09:39:32 AM »

i see. so you are basically using a home rolled "template engine" of sorts, only blazing fast.
i got the impression you were outputting the html wherever the need be.
nice.

perkiset
« Reply #14 on: July 09, 2008, 03:38:05 PM »

I spit HTML as almost the *very last thing* a script does.

Everything is added to an array named $content, then imploded, then I do any last minute str_replace work that I might need (there's always a couple, particularly when I have some cache-busting or other to do...), then echoed as a single string. This also gives me the chance to completely change what I spit out, or even whether I spit anything out, all the way up to the last moment. That is completely the opposite of the WordPress codebase, for example, which starts spitting HTML immediately, and as such you can't decide to 301 the user after the first little bit of code.

And yes, it's damn fast, because the code (and therefore most of the HTML) is precompiled into APC, then variables are dereferenced when I do a simple $content[] = <<< sort of thing.
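
A stripped-down sketch of that flow, for anyone who wants to see it spelled out. The function name, the %%TITLE%% marker and the array layout are invented for the example and are not lifted from my real code:

```php
<?php

// Collect everything, decide at the very end what (if anything) gets sent.
function renderPage(array $content, array $replacements, ?string $redirectTo = null): array
{
    // Nothing has been output yet, so a 301 is still possible at this point.
    if ($redirectTo !== null) {
        return ['status' => 301, 'headers' => ['Location: ' . $redirectTo], 'body' => ''];
    }
    $html = implode('', $content);
    // Last-minute search/replace over the complete page.
    $html = str_replace(array_keys($replacements), array_values($replacements), $html);
    return ['status' => 200, 'headers' => [], 'body' => $html];
}

$content = [];
$content[] = <<<HTML
<html><head><title>%%TITLE%%</title></head>
HTML;
$content[] = <<<HTML
<body>Page body</body></html>
HTML;

$page = renderPage($content, ['%%TITLE%%' => 'Hello World']);

// Status and headers only make sense under a web server.
if (PHP_SAPI !== 'cli') {
    http_response_code($page['status']);
    foreach ($page['headers'] as $header) {
        header($header);
    }
}
echo $page['body'];
```

Passing a URL as the third argument turns the same request into a redirect with an empty body, which is exactly the decision you cannot make once HTML has started going out.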