
SEOidiot
Perk was helping me out with a particular task where I needed flexibility and power in rewriting data across a large number of URLs, and we talked about rewrite maps - great info, and I hope he can expand here for all to see. It looks to me like what you started to explain has huge possibilities for lots of the things we all do...
dirk
Some time ago (about 4 years) I wrote some articles about mod_rewrite.
Below I have attached the part which explains the handling of the rewrite maps.

----------------------------------------------------

Another directive which is very handy for cloaking purposes is the so-called Rewriting Map. These are files consisting of key/value pairs, e.g. in the simple format of an ordinary text file:

```
cde2c920.infoseek.com spider
205.226.201.32       spider
cde2c923.infoseek.com spider
205.226.201.35       spider
cde2c981.infoseek.com spider
205.226.201.129      spider
cde2cb23.infoseek.com spider
205.226.203.35       spider
```

These keys are, as you can see, hostnames or IPs. In this simplistic example the value is always the same, namely "spider". This directive is entered either in the server section or in the virtual host section of file "httpd.conf":

```
RewriteMap botBase txt:/www/yourdomain/spiderspy.txt
```

The Rewriting Map will then be available across your server. The other directives are entered in file ".htaccess":

```
RewriteCond ${botBase:%{REMOTE_HOST}} =spider [OR]
RewriteCond ${botBase:%{REMOTE_ADDR}} =spider
RewriteRule ^(.*)\.htm$ $1.htm [L]
RewriteRule ^.*\.htm$ index.html [L]
```

The conditions make the system check whether the requested access is generated by a spider. To this effect a lookup of file "spiderspy.txt" is triggered. If the key is found, the value "spider" is returned and the condition evaluates as true. Next, the first RewriteRule is executed: it determines that the requested ".htm" page will be fed to the spider. The variable $1 is equal to the part in parentheses of "^(.*)\.htm$", i.e. the file name remains the same. If the URL is called by a normal human visitor, rule 2 applies: the user is redirected to page "index.html". As the ".htm" pages will only be read by spiders, they can be optimized accordingly for the search engines.

You may also use a file in dbm format instead of an ordinary text file.
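The lookup-and-rewrite logic described above can be sketched in Python - a hypothetical emulation for illustration only, not Apache code. The map file is parsed into a dict, and a request for a ".htm" URL is either served unchanged (spider) or sent to "index.html" (ordinary visitor):

```python
def load_rewrite_map(lines):
    """Parse key/value pairs in the plain-text RewriteMap format."""
    table = {}
    for line in lines:
        parts = line.split()
        if len(parts) == 2:
            key, value = parts
            table[key] = value
    return table

def resolve(url, remote_host, remote_addr, bot_map):
    """Mimic the two RewriteRules: spiders keep the .htm page,
    everyone else is redirected to index.html."""
    if not url.endswith(".htm"):
        return url  # neither rule's pattern matches non-.htm URLs
    if (bot_map.get(remote_host) == "spider"
            or bot_map.get(remote_addr) == "spider"):
        return url          # first rule: serve the page unchanged
    return "index.html"     # second rule: redirect ordinary visitors

# Build the map from two of the sample entries above
spider_map = load_rewrite_map([
    "cde2c920.infoseek.com spider",
    "205.226.201.32 spider",
])
```

For the dbm variant mentioned above, Apache 2.x ships the `httxt2dbm` utility to convert a text map (e.g. `httxt2dbm -i spiderspy.txt -o spiderspy.map`), after which the map is referenced with the `dbm:` prefix instead of `txt:`.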
The binary database format helps accelerate the lookup, which is particularly important if you are operating from very large spider lists. The example given above offers simple cloaking functionality. All ordinary visitors will always be redirected to the site's "index.html" page, and there is no access logging beyond the mod_rewrite logs. However, it does go to show how you can effectively replace several lines of Perl code with just a few lines of mod_rewrite.

perkiset
Very nice article Dirk...
I had told SEOI that you are really strong when it comes to this stuff, but I had forgotten reading this so many years ago - thanks for posting it.

Wasn't kidding about Dirk, was I? He's THE JUICE.

Yeah man... just call on me whenever you need the real info. I'm yer man.

/p

m0nkeymafia
Top quality stuff, very nice way of doing it
Perhaps we need an Apache section too!
