Thread: rewrite maps

SEOidiot

Perk was helping me out with a particular task where I needed flexibility and power in rewriting data across a large number of URLs, and we talked about rewrite maps. Great info, and I hope he can expand here for all to see. It looks to me like what you started to explain has huge possibilities for lots of the things we all do...

dirk

Some time ago (about 4 years) I wrote some articles about mod_rewrite.

Below I have attached the part which explains the handling of the rewrite maps.

----------------------------------------------------

Another feature which is very handy for cloaking
purposes is the so-called Rewriting Map. These maps are
files consisting of key/value pairs, e.g. in the
simple format of an ordinary text file:

cde2c920.infoseek.com spider
205.226.201.32 spider
cde2c923.infoseek.com spider
205.226.201.35 spider
cde2c981.infoseek.com spider
205.226.201.129 spider
cde2cb23.infoseek.com spider
205.226.203.35 spider

The keys are, as you can see, hostnames or IPs.
In this simple example the value is always the
same, namely "spider".

The RewriteMap directive is entered either in the main
server section (section 2) or in the virtual host
section (section 3) of file "httpd.conf":

RewriteMap botBase txt:/www/yourdomain/spiderspy.txt

The Rewriting Map will then be available across your
server.

The other directives are entered in file ".htaccess":

RewriteCond  ${botBase:%{REMOTE_HOST}} =spider [OR]
RewriteCond  ${botBase:%{REMOTE_ADDR}} =spider
RewriteRule  ^(.*).htm$  $1.htm [L]
RewriteRule  ^.*.htm$  index.html [L]

The conditions make the system check whether the
request was generated by a spider. To this
effect a lookup in file "spiderspy.txt" is triggered.

If the key is found, the value "spider" is returned
and the condition evaluates to true.
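
A lookup that finds no key returns an empty string, so the "=spider" comparison simply fails for ordinary visitors. The sketch below makes that fallback explicit with the ${map:key|default} form of the lookup. The default word "human" is an arbitrary placeholder chosen for illustration, and the escaped dot in the pattern is a small tightening that the rules above do not use.

```apache
# .htaccess sketch; assumes botBase was declared with RewriteMap in httpd.conf
# (RewriteMap cannot be declared in .htaccess, it can only be used there).
RewriteEngine On

# "|human" is the default returned when the key is not in the map.
# "human" is an arbitrary placeholder, not a value from spiderspy.txt.
RewriteCond  ${botBase:%{REMOTE_HOST}|human} =spider [OR]
RewriteCond  ${botBase:%{REMOTE_ADDR}|human} =spider
# Spiders keep the requested page; \. matches a literal dot only.
RewriteRule  ^(.*)\.htm$  $1.htm [L]
```

With or without the default, the outcome is the same here: only a key that maps to "spider" satisfies either condition.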

Next, the first RewriteRule is executed. This one
determines that the requested ".htm" page will be fed
to the spider. The variable $1 is equal to the part in
parentheses of "^(.*).htm$", i.e. the file name will
remain the same.

If the URL is called by a normal human visitor, rule 2
applies: the request is rewritten internally to page
"index.html", so the visitor is served that page
instead.

As the ".htm" pages will only be read by spiders, they
can be optimized accordingly for the search engines.

You may also use a file in dbm format instead of an
ordinary text file. The binary database format helps
accelerate the lookup, which is particularly important
if you are working with very large spider lists.
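
As a rough sketch of the dbm variant: on Apache 2.2 and later, the httxt2dbm utility that ships with the server can convert the text map, and the map is then declared with the dbm: type instead of txt:. The output file name "spiderspy.map" is an assumption made for this example, and older installations may need a different conversion tool.

```apache
# Convert the text map once (run from a shell, not part of httpd.conf):
#   httxt2dbm -i /www/yourdomain/spiderspy.txt -o /www/yourdomain/spiderspy.map
#
# httpd.conf: point the map at the dbm file instead of the text file.
# "spiderspy.map" is a hypothetical output name chosen for this sketch.
# Some dbm types write several files sharing this base name; the
# directive still refers to the base name only.
RewriteMap botBase dbm:/www/yourdomain/spiderspy.map
```

The RewriteCond and RewriteRule lines in ".htaccess" stay as they are, since they refer to the map only by its name, botBase.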

The example given above offers a simple cloaking
functionality. All ordinary visitors will always be
sent to the site's "index.html" page and there
is no access logging beyond the mod_rewrite logs.

However, it does go to show how you can effectively
replace several lines of Perl code with just a few
lines of mod_rewrite.

perkiset

Very nice article Dirk...
I had told SEO I that you are really strong when it comes to this stuff, but I had forgotten reading this so many years ago - thanks for posting it.

Wasn't kidding about Dirk, was I? He's THE JUICE.

Yeah man... just call on me whenever you need the real info. I'm yer man

/p

m0nkeymafia

Top quality stuff, very nice way of doing it.
Perhaps we need an apache section too

