The Cache: Technology Expert's Forum
 
Topic: Apache rewrite to "mount" remote URL on a folder (Read 3485 times)
netmktg
Rookie
Posts: 37
« on: October 21, 2010, 10:58:00 PM »

There's no Apache forum, but PHP seemed the best fit for an Apache query.

I wanted to "mount" http://remotedomain.com/img on http://thisdomain.com/images.

In IIS, remote locations can be "mounted" on a virtual directory. Is it possible to do that in .htaccess WITHOUT sending a 302 redirect to the browser?

The following .htaccess works fine in the "/images" folder. It displays the images but also sends a 302 to the browser...

RewriteEngine on
RewriteRule (.+?)\.(gif|jpg|png)$ http://remotedomain.com/img/$1.$2 [L]

perkiset
Olde World Hacker
Administrator
Lifer
Posts: 10096
« Reply #1 on: October 21, 2010, 11:15:13 PM »

I don't believe so. With that rule you are telling the browser that the images are not (here), they are (there), hence the 302; and without it, the browser would not know how to go get them. Although you are asking for a technical solution, I think your goal can be met by modifying the method a bit.

A couple of thoughts:

Can you rewrite the outgoing HTML so that it points to the correct graphic location in the first place? Since you're doing a simple bounce here and not tracking it, I can't see a reason why not (unless you are not in control of the HTML). This is the cleanest way to do it, unless this is an HTTPS page and the repository is served over plain HTTP, in which case you'll get a "Portions of this page are unsecured" warning. If your HTML is single-entry-single-exit PHP code, then a simple str_replace() or preg_replace() would do the job quite nicely.

You could rewrite to a PHP script that looks locally for the image, and if it's not here then *IT* goes and gets it, returning it to the user. Sort of a poor man's cached proxy.



It is now believed that, after having lived in one compound with 3 wives and never leaving the house for 5 years, Bin Laden called the U.S. Navy SEALs himself.
netmktg
Rookie
Posts: 37
« Reply #2 on: October 22, 2010, 04:55:12 AM »

Quote:
I don't believe so. With that rule you are telling the browser that the images are not (here), they are (there), hence the 302; and without it, the browser would not know how to go get them.

The 302 doesn't happen for local URLs, e.g. friendly URLs in forums, blogs etc. I use URL rewriting on all of my sites, and it doesn't do a 302 when the target is local. But rewriting to remote URLs is treated differently... which I've just learnt.


Quote:
Can you rewrite the outgoing HTML so that it points to the correct graphic location in the first place? Since you're doing a simple bounce here and not tracking it, I can't see a reason why not (unless you are not in control of the HTML).

Yeah, they are all my sites, including the target site. But I want to completely "separate" the sites without having to upload the same image folder for every site. In fact, image URLs are read from a DB and displayed from a central location for all sites. My intention was to have the image locations "appear distinct" without duplicating the image folder.


Quote:
You could rewrite to a PHP script that looks locally for the image, and if it's not here then *IT* goes and gets it, returning it to the user. Sort of a poor man's cached proxy.

Heh... this may sound odd, but I had already done that before my original post. But I still wanted to see if there's a way to do it purely in .htaccess without PHP. The actual PHP I use sanitizes input and uses a switch-case for different image files, but the operational part is 2 lines...

header('Content-type: image/' . $imgtype);
print file_get_contents('http://remotedomain.com/img/' . $img);


The .htaccess is...

RewriteEngine on
RewriteRule (.+?)\.(gif|jpg|png)$ index.php?img=$1.$2 [L]
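The sanitizing and switch-case part of index.php isn't shown. A minimal sketch of what it might look like, assuming image names are plain relative paths with no dots other than the extension and only gif/jpg/png files (widen the character class if your names differ); the function name imageContentType() is hypothetical. It rejects anything that isn't a simple relative name, so "../" tricks can't reach other files on remotedomain.com, and it maps each extension to its MIME type. Note that JPEGs should be sent as image/jpeg, since "image/jpg" is not a registered type.

```php
<?php
// Hypothetical helper: validate the ?img= value produced by the RewriteRule
// and map its extension to a Content-Type. Returns null for anything unexpected.
function imageContentType(string $img): ?string
{
    // Only "name.ext" or "dir/sub/name.ext": letters, digits, "_" and "-"
    // in each path segment, no leading slash, no "..", one extension.
    if (!preg_match('~^[A-Za-z0-9_-]+(?:/[A-Za-z0-9_-]+)*\.(gif|jpg|png)$~', $img, $m)) {
        return null;
    }
    switch ($m[1]) {
        case 'gif':
            return 'image/gif';
        case 'jpg':
            return 'image/jpeg'; // "image/jpg" is not a registered MIME type
        case 'png':
            return 'image/png';
    }
    return null;
}
```

In index.php you would call it on $_GET['img'], send a 404 when it returns null, and otherwise fetch the remote file and echo it after header('Content-Type: ' . $type).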
perkiset
Olde World Hacker
Administrator
Lifer
Posts: 10096
« Reply #3 on: October 22, 2010, 01:13:20 PM »

Quote:
The 302 doesn't happen for local URLs, e.g. friendly URLs in forums, blogs etc. I use URL rewriting on all of my sites, and it doesn't do a 302 when the target is local. But rewriting to remote URLs is treated differently... which I've just learnt.

Correct. When you rewrite to a local target, the caller (browser) has no idea: you're simply modifying how the request is passed through the Apache chain. But when you put http:// on the front, Apache sends back a message (a 302) telling the browser that the resource it needs is located elsewhere. You can make it a 301 instead by adding the R=301 flag at the end of the rule, e.g. [R=301,L].


Quote:
Yeah, they are all my sites, including the target site. But I want to completely "separate" the sites without having to upload the same image folder for every site. In fact, image URLs are read from a DB and displayed from a central location for all sites. My intention was to have the image locations "appear distinct" without duplicating the image folder.

I think you may have misunderstood me.

I do this all the time and it rocks. I use it both for mini-mashups and to reduce load on the servers that create my HTML. Assuming you have a PHP script that dumps out all the HTML in one place, you can easily rewrite each call for an image to look to the correct location (your central repository), and then the server doesn't need to tell the browser where to get the resource. This will reduce your server overhead considerably and make pages pop much more quickly, because the browser will open multiple sockets to multiple locations.

Let's assume your HTML looks like this:
blah blah blah <img src="/images/test.jpg">

Again, at the very last moment before the HTML is shipped out to the caller, we do a quick replace:
Code:
// assume $content contains the entire HTML page, ready to go out.
// Match the full "/images/" prefix so the path isn't doubled to /images/images/.
$content = preg_replace('~<img src="/images/~', '<img src="http://www.myrepository.com/images/', $content);

Now the browser knows where to get each image without having to request it from you. WAY better on your bandwidth and perceived page speed.
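One way to do that "very last moment" replace without touching every echo is PHP's output buffering: ob_start() accepts a callback that receives the finished buffer just before it is sent. A sketch, assuming images are referenced as root-relative /images/... paths; IMAGE_REPOSITORY and pointImagesAtRepository() are hypothetical names. The pattern also tolerates other attributes before src, which a plain '<img src="/' match would miss.

```php
<?php
// Hypothetical base URL of the central image repository (an assumption).
const IMAGE_REPOSITORY = 'http://www.myrepository.com';

// Rewrite root-relative image paths such as src="/images/x.jpg" so the
// browser fetches them straight from the repository host.
function pointImagesAtRepository(string $html): string
{
    return preg_replace(
        '~(<img\b[^>]*?\bsrc=")/images/~i',
        '${1}' . IMAGE_REPOSITORY . '/images/',
        $html
    );
}

// Registering the function as an output-buffer callback applies it to
// everything the script outputs, right before PHP sends it to the browser.
ob_start('pointImagesAtRepository');
```

Because the callback sees the whole page at once, the rest of the script can keep emitting local /images/ paths unchanged.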


Quote:
header('Content-type: image/' . $imgtype);
print file_get_contents('http://remotedomain.com/img/' . $img);

I was thinking you might simply add something like:
Code:
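// Serve from the local cache if present; otherwise fetch remotely and keep a copy.
$cacheFile = "/www/cache/$img";
if (is_file($cacheFile))
    $buff = file_get_contents($cacheFile);
elseif (($buff = file_get_contents("http://remotedomain.com/img/$img")) !== false)
    file_put_contents($cacheFile, $buff); // note: filename first, then data

header('Content-type: image/' . $imgtype);
echo $buff; // send what we already have instead of fetching it again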

... then you still get the benefit of the remote repository while caching the images locally. But I think the HTML-rewrite method above is better.

netmktg
Rookie
Posts: 37
« Reply #4 on: October 22, 2010, 01:51:55 PM »

Quote:
Correct. When you rewrite to a local target, the caller (browser) has no idea: you're simply modifying how the request is passed through the Apache chain. But when you put http:// on the front, Apache sends back a message (a 302) telling the browser that the resource it needs is located elsewhere.

I didn't know how it worked in theory; I saw the 302 header and knew I had learnt something new. :D


Quote:
I think you may have misunderstood me.

I do this all the time and it rocks. I use it both for mini-mashups and to reduce load on the servers that create my HTML. Assuming you have a PHP script that dumps out all the HTML in one place, you can easily rewrite each call for an image to look to the correct location (your central repository), and then the server doesn't need to tell the browser where to get the resource.

The misunderstanding has been compounded, probably because I did not explain my objective. To start with, I already had the central repository image paths in my HTML; that was my problem, not the solution. My objective was to have a common repository while giving the appearance that each site has its own image repository.

I did NOT want the central repository to be visible to Google and my competitors; a central image repository just links all my (blackhat) sites together.

I have now replaced the central repository with the new .htaccess + php method.

Your suggestion on local file check & caching is a nice touch and I will be adding it to my code.

Thanks, of course. As usual, you're helpful and insightful.
perkiset
Olde World Hacker
Administrator
Lifer
Posts: 10096
« Reply #5 on: October 22, 2010, 02:05:54 PM »

Now I gotcha... Didn't catch that it's a 'hat play. Makes total sense now.

Well, if that's the case, then you might even consider abstracting the name of the graphic, so that there's nothing even remotely the same. I did this for a network I built once called the International Press Release Archive Cooperative. Essentially I was scraping and representing press releases that were... um, enriched, shall we say. Each page had a random structure generator (tables), and graphic names were encoded so that I knew what they were, but no two sites had the same structure or graphics. Worked quite well for a while, but got bagged and I let it go. But some of those obfuscation techniques are still quite valid... ;)
netmktg
Rookie
Posts: 37
« Reply #6 on: October 24, 2010, 12:54:23 AM »

Quote:
Well, if that's the case, then you might even consider abstracting the name of the graphic, so that there's nothing even remotely the same. I did this for a network I built once called the International Press Release Archive Cooperative. Essentially I was scraping and representing press releases that were... um, enriched, shall we say.

When I deployed the project, I had initially planned to cache all images, named with MD5 hashes plus a random number. But the cache was bloating even with just the HTML pages being cached. So I had to shelve image caching... otherwise each site would require 10 GB+ to host. Now I manage with less than 3 GB by pruning the cache on a daily schedule.
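The pruning job itself isn't shown. A minimal sketch, assuming a flat-file cache under one directory and a simple age-based policy (delete anything not modified in the last $maxAgeSeconds); pruneCache() is a hypothetical name. Stale paths are collected first and deleted afterwards, so the directory isn't modified while it is being iterated.

```php
<?php
// Hypothetical age-based pruning for a flat-file cache directory.
// Returns the number of files deleted.
function pruneCache(string $dir, int $maxAgeSeconds, ?int $now = null): int
{
    $now = $now ?? time();
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
    );

    // First pass: find files whose last modification is too old.
    $stale = [];
    foreach ($files as $file) {
        if ($file->isFile() && $now - $file->getMTime() > $maxAgeSeconds) {
            $stale[] = $file->getPathname();
        }
    }

    // Second pass: delete them, counting successful removals.
    $deleted = 0;
    foreach ($stale as $path) {
        if (unlink($path)) {
            $deleted++;
        }
    }
    return $deleted;
}
```

Run once a day from cron, a small CLI script calling pruneCache('/www/cache', 86400) would drop anything older than a day.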

But this latest .htaccess + PHP approach does open up a new option: I could MD5-hash the image names and save a hash->image-URL array, then the PHP could look up each image URL and fetch and display the image. The only drawback would be the processing overhead on every image request.
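A sketch of the hash->image-URL lookup described above, assuming the opaque name is simply the MD5 of the full remote URL with the original extension kept, so an existing (gif|jpg|png) RewriteRule still matches; opaqueImageName(), buildImageMap() and lookupImageUrl() are hypothetical names. In practice the map would more likely live in the database than in a PHP array.

```php
<?php
// Hypothetical: derive an opaque file name from a remote image URL,
// keeping the extension so the rewrite rule and Content-Type still work.
function opaqueImageName(string $imageUrl): string
{
    $path = (string) parse_url($imageUrl, PHP_URL_PATH);
    $ext  = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    return md5($imageUrl) . ($ext === '' ? '' : ".$ext");
}

// Build the name -> URL map once, e.g. when the image URLs are read from the DB.
function buildImageMap(array $imageUrls): array
{
    $map = [];
    foreach ($imageUrls as $url) {
        $map[opaqueImageName($url)] = $url;
    }
    return $map;
}

// Resolve the name the browser requested; null means "send a 404".
function lookupImageUrl(array $map, string $name): ?string
{
    return $map[$name] ?? null;
}
```

The per-request cost is one array or table lookup before the fetch, which is the overhead mentioned above.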
perkiset
Olde World Hacker
Administrator
Lifer
Posts: 10096
« Reply #7 on: October 24, 2010, 01:30:17 AM »

Use memcached. Take that problem right off your list.
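A read-through sketch of what "use memcached" might look like for that lookup. It assumes the PECL memcached extension, whose Memcached::get() returns false on a miss and whose set() takes a key, a value and an expiration in seconds; cachedImageUrl(), ArrayCache, the "imgurl:" key prefix and the $loadFromDb callable are all hypothetical. ArrayCache is only an in-memory stand-in with the same get/set shape, so the logic can be exercised without a running daemon.

```php
<?php
// Hypothetical read-through cache for the hash -> image-URL lookup.
// $cache is expected to behave like a Memcached object: get() returns
// false on a miss, set($key, $value, $ttl) stores a value.
function cachedImageUrl($cache, string $hash, callable $loadFromDb, int $ttl = 3600): ?string
{
    $key = 'imgurl:' . $hash;
    $url = $cache->get($key);
    if ($url !== false) {
        return $url; // cache hit: no database work at all
    }
    $url = $loadFromDb($hash); // slow path, e.g. a DB query
    if ($url !== null) {
        $cache->set($key, $url, $ttl);
    }
    return $url;
}

// In production (assumes the memcached extension and a local daemon):
//   $cache = new Memcached();
//   $cache->addServer('127.0.0.1', 11211);

// In-memory stand-in with the same get/set shape, handy for testing.
final class ArrayCache
{
    private array $items = [];

    public function get(string $key)
    {
        return $this->items[$key] ?? false;
    }

    public function set(string $key, $value, int $ttl = 0): bool
    {
        $this->items[$key] = $value; // TTL ignored in this stand-in
        return true;
    }
}
```

Only misses reach the database, so after the first request for each image the lookup is a single memory read.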