The Cache: Technology Expert's Forum
 
Topic: .htaccess redirection problem (Read 4398 times)
Docthorn
Rookie
Posts: 10

« on: August 09, 2008, 06:26:05 AM »

Hi all,

So I have this little problem with my .htaccess file and can't figure out how to fix it.

This is the file:
Code:
RewriteEngine on
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}\.html -f
RewriteRule ^(.*)$ $1.html
RewriteCond %{HTTP_HOST} ^mysite.com
RewriteRule (.*) http://www.mysite.com/$1 [R=301,L]
RewriteCond %{THE_REQUEST} ^.*/index.html
RewriteRule ^(.*)index.html$ http://www.mysite.com/$1 [R=301,L]

The first piece should remove the extension from all .html files.
The second piece should redirect the non-www version of the site to the www version.
The third piece should redirect index.html to the site root.

Now I have two problems.
First, all the internal links look like www.mysite.com/page-one (no extension), but I can still access mysite.com/page-one.html and I don't get redirected to the extension-less URL. This is going to cause me problems, since both versions of the page are accessible.

Second, if I try to access http://mysite.com/any-page I get redirected to the www version, BUT the .html extension gets added to the end of the URL.
So if I go to http://mysite.com/any-page I get redirected to http://www.mysite.com/any-page.html
No idea why.

How can I fix it?

Thanks in advance,
Doc
« Last Edit: August 09, 2008, 06:40:41 AM by Docthorn »

No links in signatures please

Bompa
Administrator
Lifer
Posts: 564

Where does this show?

« Reply #1 on: August 09, 2008, 07:21:00 AM »

hi doc,

This "piece" RewriteCond %{REQUEST_FILENAME}\.html -f needs a space before the \


"The most beautiful and profound emotion we can experience is the sensation of the mystical..." - Albert Einstein

Docthorn
Rookie
Posts: 10

« Reply #2 on: August 09, 2008, 08:41:53 AM »

Hi Bompa,

I've added a space there, and now when I try to access the site I get this error:

Quote
Internal Server Error

The server encountered an internal error or misconfiguration and was unable to complete your request.

Please contact the server administrator, webmaster@mysite.com and inform them of the time the error occurred, and anything you might have done that may have caused the error.

More information about this error may be available in the server error log.

Additionally, a 500 Internal Server Error error was encountered while trying to use an ErrorDocument to handle the request.

Bompa
Administrator
Lifer
Posts: 564

« Reply #3 on: August 09, 2008, 04:15:07 PM »

Take the space out and stop listening to me.

Bompa
Administrator
Lifer
Posts: 564

« Reply #4 on: August 09, 2008, 04:23:49 PM »

Can you try this for your first RewriteRule, the one that should remove the .html?

RewriteRule ^(.*)html$ $1

It may also need a \. in front of html; I can never remember that shit.

perkiset
Olde World Hacker
Administrator
Lifer
Posts: 10096

« Reply #5 on: August 10, 2008, 11:46:03 AM »

Code:
RewriteEngine on
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}\.html -f
RewriteRule ^(.*)$ $1.html

Quote
The first piece should remove the extension from all .html files.

Doc, I don't understand at all how that chunk will "remove the extension" from all .html files. If I read it correctly, it says:
"If the requested filename is not a directory AND
if the requested filename plus the extension ".html" is a file THEN
rewrite the ENTIRE URI into the entire URI plus .html"

I see this as adding an extension if it doesn't exist.
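Put another way, the chunk has no R flag, so it is an internal rewrite: /page-one quietly serves page-one.html, while /page-one.html keeps working as well. That is the duplicate Doc describes. A common way to close that gap is an external 301 keyed on THE_REQUEST, placed after the www redirect and before the internal rewrite. It is sketched here as a hypothetical addition, not as part of Doc's file, and www.mysite.com is Doc's placeholder domain:

```apache
# Hypothetical addition: 301 explicit ".html" requests to the extension-less URL.
# THE_REQUEST is the browser's original request line, e.g.
#   GET /dir/page-one.html HTTP/1.1
# The internal rewrite to page-one.html never changes that line, so this rule
# does not fire again on the rewritten request (no redirect loop).
# %1 captures "dir/page-one"; the query string is passed through unchanged.
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^?\ ]+)\.html[?\ ]
RewriteRule ^ http://www.mysite.com/%1 [R=301,L]
```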


Code:
RewriteEngine on
RewriteCond %{HTTP_HOST} ^mysite.com
RewriteRule (.*) http:// www. mysite.com/$1 [R=301,L]

Personally, I'd put this as the first evaluation, not the second.
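Here is a minimal sketch of Doc's first two pieces in that order. It assumes the file lives in the document root, the host allows mod_rewrite in .htaccess, and mysite.com stands in for the real domain. Because the host check now runs first, the redirect sees the untouched path, so http://mysite.com/any-page should land on http://www.mysite.com/any-page instead of any-page.html:

```apache
RewriteEngine on

# 1. Canonical host first (mysite.com is a placeholder).
#    In .htaccess the pattern sees the path without its leading "/",
#    so the target adds the slash itself.
RewriteCond %{HTTP_HOST} ^mysite\.com$ [NC]
RewriteRule ^(.*)$ http://www.mysite.com/$1 [R=301,L]

# 2. Internal rewrite (no R flag): serve /page-one from page-one.html
#    when that file exists and the request is not a directory.
#    The browser keeps seeing the extension-less URL.
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}\.html -f
RewriteRule ^(.*)$ $1.html [L]
```

The [L] on the internal rewrite only ends that pass. In .htaccess a rewritten request is run through the rules again, so the ordering, not the flag, is what keeps .html out of the redirect.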
And on another note, I don't like this form, because it is too ambiguous. I'm assuming that, at the very least, you know what domains are hitting your machine, yes? So you know that *mydomain.com should be pointed at this box? Then I'd add a virtual host in front of the real site that captures everything and bounces it, like this:

<VirtualHost 1.2.3.4:80>
   ServerName mydomain.com
   ServerAlias *.mydomain.com
   DocumentRoot /www/somedir
   RewriteEngine on
   RewriteRule ^(.*)$ http: //www. mydomain.com$1 [R=301]
</VirtualHost>

This will take everything, as is, from the first (incorrect) domain and bounce it to your real site. I think part of the problem is that you are doing your .html addition BEFORE you check for an incorrect domain: the first rule rewrites /any-page to any-page.html, so by the time the domain rule runs, the path it redirects already ends in .html, which is why the extension shows up in the new URL. That's why I'd get the domain check out of the way first, and out of the flow of the real domain. Additionally, why impose the extra cycles on EVERY SINGLE WEB CALL (including graphics) to evaluate whether the domain is wrong? Bounce folks to the right domain in the first place and don't evaluate again.
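Here is a sketch of the same idea without the anti-link spaces. It assumes Apache 2.4 name-based virtual hosts on the placeholder address 1.2.3.4 with the placeholder DocumentRoot used above. Apache 2.2, current when this thread was written, also needs a matching NameVirtualHost line. The sketch swaps the RewriteRule for mod_alias's Redirect, which needs no mod_rewrite at all. Apache uses the first virtual host whose ServerName or ServerAlias matches the Host header, so the real www site is listed first. Otherwise www.mydomain.com would also match *.mydomain.com and redirect to itself forever.

```apache
# Real site: listed first so www.mydomain.com never falls into the catch-all.
<VirtualHost 1.2.3.4:80>
    ServerName www.mydomain.com
    DocumentRoot /www/somedir
</VirtualHost>

# Catch-all for the bare domain and any other subdomain.
<VirtualHost 1.2.3.4:80>
    ServerName mydomain.com
    ServerAlias *.mydomain.com
    # mod_alias: /anything -> http://www.mydomain.com/anything as a 301
    Redirect permanent / http://www.mydomain.com/
</VirtualHost>
```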

Code:
RewriteEngine on
RewriteCond %{THE_REQUEST} ^.*/index.html
RewriteRule ^(.*)index.html$ http://www.mysite.com/$1 [R=301,L]

Quote
The third piece should redirect index.html to the site root.

This last piece is even more curious to me...
"If the entire request contains (but doesn't necessarily end with) /index.html THEN
bounce the surfer to the same domain, append the same URL, and call it a 301." What the heck is your intention here? Why do you even need to do this? Is it that you might have /dir/dir/dir/index.html and you want to send people to /index.html? In that case, I'd do something more like this:

RewriteCond %{REQUEST_URI} /.*/index.html$
RewriteRule ^(.*)$  /index.html

and call it a day. Perhaps I'm missing something here. If you could clarify your intentions, or write out a set of rules that could be translated into rewrite rules, that'd be a lot easier than trying to debug your existing code...

Note: I put spaces in the example URLs above so that the forum would not turn them into links; remove them before using the code.
« Last Edit: August 10, 2008, 11:48:08 AM by perkiset »

It is now believed, that after having lived in one compound with 3 wives and never leaving the house for 5 years, Bin Laden called the U.S. Navy Seals himself.

Docthorn
Rookie
Posts: 10

« Reply #6 on: August 10, 2008, 02:46:07 PM »

perks,

First, thanks for your detailed reply.
You make me cry, you are too advanced for me.

Let me clarify things.

Basically, I didn't know how to do what I needed, so I opened the mod_rewrite documentation and started reading. Being a tech noob, I didn't understand most of it, so I just googled for a couple of hours and copied and mixed code from blogs, forums and other sites. That's why it's a mess. lol

My first intention was to prevent duplicate content penalties. I don't want two versions of the same page; that's why I wanted to 301 redirect all non-www requests to the www version, so that only one version is accessible.
If someone, whether a search engine bot or a person, accesses http://site.com/, they get redirected to http://www.site.com/
If someone accesses http://site.com/page-one, they get redirected to http://www.site.com/page-one

My second intention was to structure the site so that, even though it's a static, hand-coded website now, I can put a CMS on it later without running into problems and, of course, without losing rankings because of moved pages. The CMS would be Drupal. This is an example of a Drupal-powered website: http://www.swampfox.ws/
Notice that all the URLs look like site.com/page (without the .html or .php extension at the end).
I need to do that: remove all the extensions, so that later I can install Drupal on the site without moving or redirecting anything.

Third, I wanted to redirect site.com/index.html to the root, site.com/.

What would a good .htaccess file look like?

Sorry for my English; sometimes it's hard for me to explain things clearly.

Thank you, perk and Bompa.

Ciao,

Doc

perkiset
Olde World Hacker
Administrator
Lifer
Posts: 10096

« Reply #7 on: August 10, 2008, 03:43:18 PM »

Quote
You make me cry, you are too advanced for me.

LOL, no worries, Doc. Thanks though.


Quote
Basically, I didn't know how to do what I needed, so I opened the mod_rewrite documentation and started reading. Being a tech noob, I didn't understand most of it, so I just googled for a couple of hours and copied and mixed code from blogs, forums and other sites. That's why it's a mess. lol

That explains quite a bit, man. From your first post it looked like you knew what you were doing, so I went with it.


Quote
My first intention was to prevent duplicate content penalties. I don't want two versions of the same page; that's why I wanted to 301 redirect all non-www requests to the www version, so that only one version is accessible.
If someone, whether a search engine bot or a person, accesses http://site.com/, they get redirected to http://www.site.com/
If someone accesses http://site.com/page-one, they get redirected to http://www.site.com/page-one

Question 1: This one is the easiest. Are you on your own box, or hosting somewhere? The reason I ask is that you'd do this quite differently if you have access to httpd.conf, the primary config file for Apache. If not, there are a couple of ways we can do it. We'll tackle it the easy way.

In your .htaccess, you'll want something like this:
Code:
RewriteCond %{HTTP_HOST} !^www\.mydomain\.com$
RewriteRule ^(.*)$ http://www.mydomain.com/$1 [R=301,L]

Note that this is really close to your example; however, you want to put it very first in the list. The reason is that you don't want to evaluate anything about the URL if it's not even at the right domain yet. This rewrite will also carry the requested URI (whatever it was) over to the correct domain for you.
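Why the slash matters: in .htaccess (per-directory) context, mod_rewrite strips the directory prefix, including the leading /, before matching the RewriteRule pattern. For a request to /page-one, $1 is therefore just page-one, and the target has to supply the slash. An alternative that sidesteps this builds the target from %{REQUEST_URI}, which always starts with /. The sketch below uses the same placeholder mydomain.com:

```apache
RewriteEngine on
# Any host other than www.mydomain.com (for example mydomain.com) gets a 301
# to the same path on www.mydomain.com. %{REQUEST_URI} already begins with "/"
# and excludes the query string, which the redirect passes through unchanged.
RewriteCond %{HTTP_HOST} !^www\.mydomain\.com$ [NC]
RewriteRule ^ http://www.mydomain.com%{REQUEST_URI} [R=301,L]
```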

Question 2: May I suggest that, if you're going to move to no-suffix file names in the future, you just rename your files right now? I mean, store them as filename instead of filename.html and presto, you're golden. This eliminates a whole bunch of hooey you're trying to accomplish here, and it will be faster and cleaner to boot. Just sayin'.

Question 3: This is a bit redundant and weird. If you redirect /index.html to /, then when the request comes back, Apache will still look for /index.html... it's just the way things work. The whole "duplicate penalty" thing is pretty limited here (and is arguably a myth in any case), so I'm not sure why you want to add this complexity.
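If the index.html redirect is still wanted, Doc's original rule already uses the standard trick of testing THE_REQUEST, the client's original request line, instead of the rewritten path. When someone asks for /, mod_dir serves index.html internally (DirectoryIndex), but THE_REQUEST still reads "GET / ...", so the redirect cannot loop. A tightened sketch, with www.mysite.com as a placeholder:

```apache
# 301 /index.html -> /  and  /about/index.html -> /about/
# Only fires when the client explicitly requested index.html;
# %1 is the directory part in front of it (possibly empty).
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /(([^/?\ ]+/)*)index\.html[?\ ]
RewriteRule ^ http://www.mysite.com/%1 [R=301,L]
```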

Good luck!
/p
« Last Edit: August 10, 2008, 03:44:52 PM by perkiset »

Docthorn
Rookie
Posts: 10

« Reply #8 on: August 10, 2008, 04:29:03 PM »

Thanks again!

Replying...

Quote
Are you on your own box, or hosting somewhere?

Hosting somewhere.

Quote
In your .htaccess, you'll want something like this:
Code:
RewriteCond %{HTTP_HOST} !^www\.mydomain\.com$
RewriteRule ^(.*)$ http://www.mydomain.com/$1 [R=301,L]

Note that this is really close to your example; however, you want to put it very first in the list. The reason is that you don't want to evaluate anything about the URL if it's not even at the right domain yet. This rewrite will also carry the requested URI (whatever it was) over to the correct domain for you.

OK, I changed my file and moved it to the first lines. It seems to be working better now. Thanks.

Quote
Question 2: May I suggest that, if you're going to move to no-suffix file names in the future, you just rename your files right now? I mean, store them as filename instead of filename.html and presto, you're golden. This eliminates a whole bunch of hooey you're trying to accomplish here, and it will be faster and cleaner to boot. Just sayin'.

Sounds better, but how do I do it safely? If I just rename the files (using my FTP software) from file.html to file, then when I access them with the browser, the output is just raw HTML code. Maybe I'm missing something?
I found how to remove extensions here: http://spindrop.us/2006/07/26/how-to-remove-file-extensions-from-urls/ and in some other places (like WebmasterWorld).
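The raw HTML output happens because Apache picks a file's MIME type from its extension. A file called page-one has no extension, so it goes out with the server's default type (text/plain on Apache 2.2), and the browser shows the source. One way to follow perkiset's rename suggestion is mod_mime's ForceType. This sketch assumes the host allows FileInfo overrides in .htaccess and that every extension-less file in the directory really is HTML:

```apache
# Send every file whose name contains no dot (page-one, about, ...) as HTML.
# Requires AllowOverride FileInfo (or All) for this directory.
<FilesMatch "^[^.]+$">
    ForceType text/html
</FilesMatch>
```

With this in place the .html rewrite rules are no longer needed. However, any old .html URLs that are already linked or indexed would still need a redirect, since those files no longer exist under that name.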

Quote
Question 3: This is a bit redundant and weird. If you redirect /index.html to /, then when the request comes back, Apache will still look for /index.html... it's just the way things work. The whole "duplicate penalty" thing is pretty limited here (and is arguably a myth in any case), so I'm not sure why you want to add this complexity.

Honestly? No idea.
I'll leave it out then.
