
hydra
Hey everyone.
Just thought I'd say hi. I haven't posted for a while, been literally submerged (without a breath) in work/projects, and haven't had time for anything fun like forums for what seems like forever! Hope everyone is well! As I accidentally posted in the PHP code repository last time, I thought I'd post to the correct section this time.

What I'm working on
--------------------
As you may or may not have read, my last few posts were about a 'middle man' PHP interception system that displays different code to the end users than what is on the server. Firstly, let me clear up any concerns: I've not been in jail/prison for the last few weeks, and I'm using this system (at the moment) for ethical reasons.

The main reason I was interested was that I had to find a flexible solution to a problem that wouldn't budge (well, a client that wouldn't spend any more money). Why should my skills be hampered because a company doesn't want to spend any more money on their website (and their developers are lame)? I cut a deal along the lines of 'if your site does well I'll get x from the result' and that's what's happening. The pages are optimised (h1s/h2s, externalising JavaScript etc.), all from my own server, away from their code. This way I have control if they tell me to 'GTFO' or similar.

The PHP intercept system basically routes each request me -> them -> me -> user. Yes, poor performance, but hey, like I'm going to give them access to my hard work(!) I picked a server physically close to their location for best performance. Working like a charm so far!

Problems faced
--------------
An 'application' which generates a flat HTML/JavaScript website (some 1600 pages) which is re-uploaded on a regular basis. Shopping cart, product text, main navigation etc. are all output in JavaScript, which is basically rubbish from a search engine's perspective.

Not only was it written in 2002 with the most basic and unfriendly equivalent of FCKeditor, it has its own internal 'spaghetti' CSS system which is tied into practically every part, and untangling it is nigh on impossible. Also, the system is developed in German and I don't speak German! There's no real UK/US tech support, so researching it on the net is fairly pointless (at the moment).

Simple Solution
--------------
"We'll build you a new system free and split profits?"
Answer: "No way - been using this for 5 years and I want to keep it"

Problems I still face
------------------
With these new URLs I'm proposing, I'm crashing out at a fairly low level.

Old page = product.html?id=123
New page = product/123

How do I:-
301 redirect 'old pages' to 'new page'
AND
Rewrite all 'new pages' to 'old page'

Basically I'm 'redirecting to a rewrite' and my Apache is being caught in a loop. We have existing pages that need a 301 redirect, but we want all new requests to be rewritten into a URL-friendly format. The only way I've thought of is to do the old redirects via a redirect script, but I was wondering if there was an easier way (got to be!)

Also thought I'd point out that I'm really looking to contribute to the forum. I have literally had no time in the last few weeks but will be back and offering my opinions and help as much as possible (to sum up, help will be repaid!!)

Extra Questions
---------------
I may be a little behind, but has scraping JS been conquered yet? Only ask because we had a blue-chip motor (ahem bmw/merc/audi/skoda) site to scrape and it was AJAX powered.

Does anyone have 'web screenshot' power? If so, pls contact me.

Wikia Search - are you guys as worried as I am??!?!?!?!?!?!?

Peace
Hydra

perkiset
Quote from hydra (5 July 2007):
> As you may/may not have read, my last few posts were regarding a 'middle man' PHP interception system to display different code to the end users than what is on the server. Firstly, let me clear any concerns - I've not been in jail/prison for the last few weeks - I'm using this system (at the moment) for ethical reasons.

Glad to hear that Hydra - a MIM (man-in-the-middle) attack is always a little bit erm... interesting to discuss. I've never been involved in one before. Really.

Quote from hydra:
> With these new URLs I'm proposing I'm crashing out at a fairly low level
> Old page = product.html?id=123
> New page = product/123
> How do I:- 301 redirect 'old pages' to 'new page' AND Rewrite all 'new pages' to 'old page'

Your challenge sounds to me like a 2-phase process. IMO, doing it all in a single virtual host would get messy. I think you must capture all requests in one virtual host, and have requests going into this software on another virtual host.

In the first virtual host, you pass everything into a PHP handler. In this handler, look to see if the URL is the way you want it or not. If it is not, then rebuild the URL the way you want it and, using the PHP header() function, send back a 301.

If the URL IS the way you want it, then rebuild it the way that <the original software> wants it, and throw an HTTP request to yourself, but to a different virtual host where <that software> is answering rather than you. The original software, now being very happy with the URL as you've thrown it, will answer correctly - but to YOU in the PHP script, not the surfer. You'll now have the HTML code that you can rewrite to your liking, then send that back to the surfer.

That's it in a nutshell - sounds more complicated than it is. Or perhaps I just need another cup of coffee...

Quote from hydra:
> Also thought I'd point out that I'm really looking to contribute to the forum, have literally had no time in the last few weeks but will be back and offering my opinions and help as much as possible (to sum up, help will be repaid!!)

No worries hydra. Nice to have you here.

Quote from hydra:
> I may be a little behind, but has scraping JS been conquered yet? Only ask because we had a blue-chip motor (ahem bmw/merc/audi/skoda) site to scrape and it was AJAX powered.

Do you mean building a scraper via JS? The jitko code did that quite effectively I believe... a bit scary, that...

Quote from hydra:
> Wikia Search - are you guys as worried as I am??!?!?!?!?!?!?

Why so? What's your fear?

Peace backatcha,
/p

Bompa
Quote from hydra (5 July 2007):
> With these new URLs I'm proposing I'm crashing out at a fairly low level
> Old page = product.html?id=123
> New page = product/123
> How do I:- 301 redirect 'old pages' to 'new page'

That looks straightforward.

Quote:
> AND Rewrite all 'new pages' to 'old page'

Huh?

Quote:
> Basically 'redirecting to a rewrite' and my Apache is being caught in a loop.

no shit, new page -> old page -> new page

Quote:
> We have existing pages that need a 301 redirect.

This should not be a major problem, ppl do this every day.

Quote:
> But want all new requests to be rewritten into a URL friendly format.

Same. There's nothing unusual about wanting that feature.

The only reason I am not posting htaccess code is that I have a haunting feeling that I am completely misunderstanding what you want to do.

later,
Bompa

hydra
B & P (Bompa & Perkiset)
Basically we had 1500 old URLs listed in the search engines. These all need 301ing to a nice tidy URL (not a problem). But these pages are hardcoded on the server, so I need the new tidy URL to actually rewrite (apologies if 'rewrite' is the incorrect terminology) to the old pages (i.e. read them as the content).

I've found a solution, and here's the code if anyone needs it for the future:

    # Serve the tidy URL from the old script (internal rewrite, the visible URL stays tidy)
    RewriteRule ^tidy/([^/]+)/?$ messy.php?param=$1

    # 301 the old URL to the tidy one, but only when the client itself requested the old URL
    RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /messy\.php\?param=([^&]+)\ HTTP/
    RewriteRule ^messy\.php$ http://localhost/tidy/%1/? [R=301,L]

    # Same pair of rules for a static page
    RewriteRule ^tidypage/?$ page.htm

    RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /page\.htm\ HTTP/
    RewriteRule ^page\.htm$ http://localhost/tidypage/? [R=301,L]

This does not get caught in a loop, and will redirect untidy versions to the tidy version (301), but the tidy versions will read the untidy version's content.

Wikia Search is going to be a fairly radical change to search engines (open source / contribution based), which means MFAs, spam sites etc. will not be accepted.

Also, regarding automated screenshots of web pages, can anyone point me in the right direction?

Cheers all

--added--

Thanks perk for the offered solution of virtual hosts. It's a bit over my head at the moment (constantly testing/trying out new stuff, so will have a play over the next few weeks). I've taken a real 'caveman' approach with hardly any grace or finesse in the coding: I've got everything going through a PHP handler as you say, and the handler rewrites tidy URLs into the pages using the old str_replace, looking up the old URLs in a MySQL DB (which is synced to an MS Access DB).

WTF!!??!?!? am I doing, should just be pumping out MFAs and putting my feet up.

hydra
Quote from perkiset (6 July 2007):
> Do you mean building a scraper via JS? The jitko code did that quite effectively I believe... a bit scary, that...

I meant server-side apps (PHP) scraping JavaScript-created content (content that is not physically in the source code of the page until created by JS)?

perkiset
Quote from hydra (7 July 2007):
> This does not get caught in a loop, and will redirect untidy versions to the tidy version (301) but the tidy versions will read the untidy versions content.

A fine solution. Pretty efficient as well.

Quote from hydra:
> Wikia Search - is going to be a fairly radical change to search engines (open source / contribution based) which means MFAs, spam sites etc will not be accepted

Quote from hydra:
> Also, regarding automated screenshots of web pages, can anyone point me in the right direction?

Nutballs and I spoke about this a long time ago... I'll see if I can dig up some of our discussions. It was not pretty, and most probably a Windoz based solution. I'll see what I can find.

/p
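
For anyone who wants to try perkiset's two-virtual-host idea from earlier in the thread, here is a rough sketch of what the front handler might look like. It is only an illustration under assumptions, not anyone's production code: the function names, the `BACKEND_BASE` address (a second virtual host on another port where the old software answers) and the link-rewriting regex are all hypothetical, and the URL shapes are the `product.html?id=123` / `product/123` pair from hydra's question.

```php
<?php
// Sketch of a front handler for the "two virtual hosts" approach.
// ASSUMPTION: the legacy software answers on a separate internal vhost/port.
const BACKEND_BASE = 'http://127.0.0.1:8080';

/** Old-style request (product.html?id=N)? Return the tidy URL to 301 to, else null. */
function tidy_redirect_target(string $requestUri): ?string
{
    $path = parse_url($requestUri, PHP_URL_PATH);
    parse_str((string) parse_url($requestUri, PHP_URL_QUERY), $query);
    $id = $query['id'] ?? null;
    if ($path === '/product.html' && is_string($id) && preg_match('/^\d+\z/', $id) === 1) {
        return '/product/' . $id;
    }
    return null;
}

/** Tidy request (product/N)? Return the URL the legacy software expects, else null. */
function backend_url(string $requestUri): ?string
{
    $path = (string) parse_url($requestUri, PHP_URL_PATH);
    if (preg_match('#^/product/(\d+)/?$#', $path, $m) === 1) {
        return BACKEND_BASE . '/product.html?id=' . $m[1];
    }
    return null;
}

/** Phase 1: 301 old URLs. Phase 2: fetch tidy URLs from the backend and rewrite the HTML. */
function handle(string $requestUri): void
{
    $target = tidy_redirect_target($requestUri);
    if ($target !== null) {
        header('Location: ' . $target, true, 301);
        return;
    }
    $backend = backend_url($requestUri);
    if ($backend === null) {
        http_response_code(404);
        return;
    }
    // Needs allow_url_fopen; the answer comes back to this script, not the surfer.
    $html = file_get_contents($backend);
    if ($html === false) {
        http_response_code(502);
        return;
    }
    // Point links in the fetched page at the tidy URLs before sending it on.
    echo preg_replace('#product\.html\?id=(\d+)#', 'product/$1', $html);
}

// Only dispatch when running under a web server, so the helpers can be tried from the CLI.
if (PHP_SAPI !== 'cli') {
    handle($_SERVER['REQUEST_URI']);
}
```

Because the surfer only ever talks to the front virtual host and the old software only ever sees the old-style URL on the internal one, the redirect and the rewrite never meet, which is what avoids the loop.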
