hydra

Hey everyone.

Just thought I'd say hi - haven't posted for a while. I've been literally submerged (without a breath) in work/projects and haven't had time for anything fun like forums for what seems like forever!

Hope everyone is well! As I accidentally posted in the PHP code repository last time, I thought I'd post to the correct section this time.

What I'm working on
--------------------

As you may or may not have read, my last few posts were about a 'middle man' PHP interception system that displays different code to end users than what is on the server. Firstly, let me clear up any concerns - I've not been in jail/prison for the last few weeks - I'm using this system (at the moment) for ethical reasons. The main reason I was interested was that I had to find a flexible solution to a problem that wouldn't budge (well, not without the client spending any more money). Why should my skills be hampered because a company doesn't want to spend any more money on their website (and their developers are lame)? I cut a deal along the lines of 'if your site does well I'll get x from the result', and that's what's happening. The page optimisation (h1s/h2s, externalising JavaScript, etc.) all runs from my own server, away from their code. This way I keep control if they tell me to 'GTFO' or similar.

The PHP intercept system basically routes everything me - them - me - user. Yes, performance is poorer, but hey, like I'm going to give them access to my hard work(!) I picked a server physically close to their location for the best performance. Working like a charm so far!

Problems faced
--------------

The site comes from an 'application' that generates a flat HTML/JavaScript website (some 1600 pages), which is re-uploaded on a regular basis. The shopping cart, product text, main navigation etc. are all output in JavaScript, which is basically rubbish from a search engine's perspective. Not only was it written in 2002 with the most basic and unfriendly equivalent of FCKeditor, it has its own internal 'spaghetti' CSS system that is tied into practically every part, and untangling it is nigh on impossible. Also, the system is developed in German and I don't speak German! There's no real UK/US tech support, so researching it on the net is fairly pointless (at the moment).

Simple Solution
---------------
"We'll build you a new system for free and split the profits?"

Answer: "No way - been using this for 5 years and I want to keep it."

Problems I still face
---------------------

With these new URLs I'm proposing, I'm crashing out at a fairly low level:

Old page = product.html?id=123
New page = product/123

How do I:-

301 redirect 'old pages' to 'new page'
AND
Rewrite all 'new pages' to 'old page'

It's basically 'redirecting to a rewrite', and my Apache config is getting caught in a loop. We have existing pages that need a 301 redirect, but we want all new requests to be rewritten into a URL-friendly format. The only way I've thought of is to do the old redirects via a redirect script, but I was wondering if there was an easier way (there's got to be!)

Also thought I'd point out that I'm really looking to contribute to the forum. I've literally had no time in the last few weeks, but I will be back and offering my opinions and help as much as possible (to sum up, help will be repaid!!)

Extra Questions
---------------
I may be a little behind, but has scraping JS been conquered yet? I only ask because we had a blue-chip motor (ahem, BMW/Merc/Audi/Skoda) site to scrape and it was AJAX-powered.

Does anyone have 'web screenshot' power? If so, please contact me.

Wikia Search - are you guys as worried as I am??!?!?!?!?!?!?

Peace

Hydra

perkiset

quote author=hydra link=topic=381.msg2494#msg2494 date=1183665447

As you may/may not have read, my last few posts were regarding a 'middle man' PHP interception system to display different code to the end users than what is on the server. Firstly, let me clear any concerns - I've not been in jail/prison for the last few weeks - I'm using this system (at the moment) for ethical reasons.

Glad to hear that, Hydra - a MITM attack is always a little bit, erm... interesting to discuss. I've never been involved in one before. Really. That's my story and I'm sticking to it.


quote author=hydra link=topic=381.msg2494#msg2494 date=1183665447

With these new URLs I'm proposing I'm crashing out at a fairly low level

Old page = product.html?id=123
New page = product/123

How do I:-

301 redirect 'old pages' to 'new page'
AND
Rewrite all 'new pages' to 'old page'


Your challenge sounds to me like a 2-phase process - IMO, doing it all in a single virtual host would get messy.

I think you must capture all requests in one virtual host, and have requests going into this software on another virtual host. In the first virtual host, you pass everything into a PHP handler. In this handler, look to see if the URL is the way you want it or not. If it is not, then rebuild the URL the way you want it and, using the PHP header() function, send back a 301. If the URL IS the way you want it, then rebuild it the way that <the original software> wants it, and throw an HTTP request to yourself, but to a different virtual host where <that software> is answering rather than you. The original software, now being very happy with the URL as you've thrown it, will answer correctly - but to YOU in the PHP script, not the surfer. You'll now have the HTML code that you can rewrite to your liking, then send that back to the surfer.
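Here's a minimal sketch of that two-virtual-host flow, written in Python rather than PHP just to show the logic. It assumes the URL scheme from hydra's example (old `/product.html?id=123`, new `/product/123`); `front_handler` and `fetch_backend` are hypothetical names. `fetch_backend` stands in for the HTTP request you'd throw at the second virtual host, and it's passed in as a parameter so the sketch can be tried without a real server.

```python
import re
from typing import Callable, Dict, Tuple

# Hypothetical URL scheme taken from the thread's example:
#   old (what the legacy software understands): /product.html?id=123
#   new (what surfers and search engines see):  /product/123
OLD_URL = re.compile(r"/product\.html\?id=(\d+)")
NEW_URL = re.compile(r"/product/(\d+)/?")

Response = Tuple[int, Dict[str, str], str]


def front_handler(url: str, fetch_backend: Callable[[str], str]) -> Response:
    """Front virtual host: every surfer request lands here first."""
    old = OLD_URL.fullmatch(url)
    if old:
        # URL is not the way we want it: rebuild it and answer with a 301.
        return 301, {"Location": "/product/" + old.group(1)}, ""

    new = NEW_URL.fullmatch(url)
    if new:
        # URL is the way we want it: rebuild it the way the legacy software
        # wants it and ask the back virtual host for the page.
        html = fetch_backend("/product.html?id=" + new.group(1))
        # The HTML came back to us, not the surfer, so tidy its links first.
        html = OLD_URL.sub(lambda m: "/product/" + m.group(1), html)
        return 200, {"Content-Type": "text/html"}, html

    return 404, {}, "Not found"
```

For example, `front_handler("/product/123", lambda path: '<a href="/product.html?id=7">Next</a>')` asks the backend for `/product.html?id=123` and hands the surfer `<a href="/product/7">Next</a>`.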

That's it in a nutshell - sounds more complicated than it is. Or perhaps I just need another cup of coffee...


quote author=hydra link=topic=381.msg2494#msg2494 date=1183665447

Also thought I'd point out that I'm really looking to contribute to the forum, have literally had no time in the last few weeks but will be back and offering my opinions and help as much as possible (to sum up, help will be repaid!!)

No worries hydra. Nice to have you here.


quote author=hydra link=topic=381.msg2494#msg2494 date=1183665447

I may be a little behind, but has scraping JS been conquered yet? Only ask because we had a blue-chip motor (ahem bmw/merc/audi/skoda) site to scrape and it was AJAX powered.

Do you mean building a scraper via JS? The jitko code did that quite effectively I believe... a bit scary, that...


quote author=hydra link=topic=381.msg2494#msg2494 date=1183665447

Wikia Search - are you guys as worried as I am??!?!?!?!?!?!?

Why so? What's your fear?

Peacebackatcha,
/p

Bompa

quote author=hydra link=topic=381.msg2494#msg2494 date=1183665447



With these new URLs I'm proposing I'm crashing out at a fairly low level

Old page = product.html?id=123
New page = product/123

How do I:-

301 redirect 'old pages' to 'new page'


That looks straightforward.


quote
AND
Rewrite all 'new pages' to 'old page'


Huh?



quote
Basically 'redirecting to a rewrite' and my Apache is being caught in a loop.


no shit, new page -> old page -> new page
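To see that cycle concretely, here's a small Python simulation of a plausible naive rule set (a hypothetical reconstruction, not hydra's actual config): rule 1 internally rewrites `tidy/123` to `messy.php?param=123`, and rule 2 301s `messy.php?param=...` back to `tidy/...`. It assumes per-directory (.htaccess) behaviour, where the rules are run again against the rewritten URL. Because rule 2 only looks at the current URL, it can't tell a browser request from the internal rewrite.

```python
import re
from urllib.parse import urlsplit


def naive_rules(url):
    """One pass over the hypothetical naive rule set; returns (action, target)."""
    parts = urlsplit(url)
    path = parts.path.lstrip("/")
    tidy = re.fullmatch(r"tidy/([^/]+)/?", path)
    if tidy:
        # Rule 1: internal rewrite, tidy URL -> script that builds the page.
        return "rewrite", "/messy.php?param=" + tidy.group(1)
    messy = re.fullmatch(r"param=([^&]+)", parts.query)
    if path == "messy.php" and messy:
        # Rule 2: external 301, messy URL -> tidy URL. It only sees the
        # *current* URL, so it also fires after rule 1's internal rewrite.
        return "redirect", "/tidy/" + messy.group(1) + "/"
    return "serve", url


def server(url, max_internal=10):
    """Re-run the rules after every internal rewrite, as .htaccess does."""
    for _ in range(max_internal):
        action, target = naive_rules(url)
        if action != "rewrite":
            return action, target
        url = target
    raise RuntimeError("too many internal rewrites")


def browser(url, max_redirects=5):
    """Follow 301s like a browser; report a loop if they never stop."""
    visited = []
    for _ in range(max_redirects):
        visited.append(url)
        action, target = server(url)
        if action == "serve":
            return "ok", visited
        url = target
    return "loop", visited


print(browser("/tidy/123/"))  # reports 'loop': every hop lands back on /tidy/123/
```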



quote
We have existing pages that need a 301 redirect.


This should not be a major problem; ppl do this every day.


quote
But want all new requests to be rewritten into a URL friendly format.


Same.  There's nothing unusual about wanting that feature.


The only reason I am not posting .htaccess code is that I have a haunting feeling that I am completely misunderstanding what you want to do.


later,
Bompa

hydra

B & P (Bompa & Perkiset)

Basically, we had 1500 old URLs listed in the search engines. I need these all 301ing to a nice tidy URL (not a problem). But these pages are hardcoded on the server, so I need the new tidy URL to actually rewrite (apologies if 'rewrite' is the incorrect terminology) to the old pages, i.e. serve their content.

I've found a solution, and here's the code if anyone needs it for the future


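RewriteEngine On

# Tidy URL: serve messy.php's content without changing the address bar
RewriteRule ^tidy/([^/]+)/?$ messy.php?param=$1 [L]
# Messy URL requested directly: 301 to the tidy URL. THE_REQUEST is the
# browser's original request line, so it won't match after the rewrite above.
# Spaces, dots and '?' are escaped; the trailing '?' drops the old query string.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /messy\.php\?param=([^&\ ]+)\ HTTP/
RewriteRule ^messy\.php$ http://localhost/tidy/%1/? [R=301,L]

# Same pattern for a single static page
RewriteRule ^tidypage/?$ page.htm [L]
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /page\.htm\ HTTP/
RewriteRule ^page\.htm$ http://localhost/tidypage/? [R=301,L]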


This does not get caught in a loop: it 301s the untidy versions to the tidy version, while the tidy versions read the untidy versions' content.
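A small Python model of why this works, as a sketch that covers only the first rule pair and assumes .htaccess-style re-running of the rules after an internal rewrite (the `http://localhost` host is dropped for brevity; function names are hypothetical). The key is that `%{THE_REQUEST}` is the raw request line the browser sent, which an internal rewrite doesn't change, whereas the URL the `RewriteRule` pattern sees does change.

```python
import re
from urllib.parse import urlsplit

# Mirrors: RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /messy\.php\?param=([^&\ ]+)\ HTTP/
THE_REQUEST_MESSY = re.compile(r"[A-Z]{3,9} /messy\.php\?param=([^& ]+) HTTP/")


def guarded_rules(url, the_request):
    """One pass over the rule pair; `the_request` never changes between passes."""
    path = urlsplit(url).path.lstrip("/")
    tidy = re.fullmatch(r"tidy/([^/]+)/?", path)
    if tidy:
        # RewriteRule ^tidy/([^/]+)/?$ messy.php?param=$1 [L]
        return "rewrite", "/messy.php?param=" + tidy.group(1)
    cond = THE_REQUEST_MESSY.match(the_request)
    if path == "messy.php" and cond:
        # RewriteRule ^messy\.php$ .../tidy/%1/? [R=301,L]
        return "redirect", "/tidy/" + cond.group(1) + "/"
    return "serve", url


def server(the_request, max_internal=10):
    """Handle one HTTP request line, re-running rules after internal rewrites."""
    url = the_request.split(" ")[1]
    for _ in range(max_internal):
        action, target = guarded_rules(url, the_request)
        if action != "rewrite":
            return action, target
        url = target
    raise RuntimeError("too many internal rewrites")


print(server("GET /messy.php?param=123 HTTP/1.1"))  # 301 to the tidy URL
print(server("GET /tidy/123/ HTTP/1.1"))            # served by messy.php, no loop
```

The second request still reaches `messy.php` internally, but the condition checks what the browser asked for (`/tidy/123/`), so no redirect fires.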

Wikia Search is going to be a fairly radical change to search engines (open source / contribution-based), which means MFAs, spam sites etc. will not be accepted. Although it's nowhere yet, I've got a bad feeling that if it takes off it could be a rival to Google in the long run.

Also, regarding automated screenshots of web pages, can anyone point me in the right direction?

Cheers all

--added--

Thanks, perk, for the virtual-host solution. It's a bit over my head at the moment (I'm constantly testing/trying out new stuff, so I'll have a play over the next few weeks). I've taken a real 'caveman' approach with hardly any grace or finesse in the coding: I've got everything going through a PHP handler as you say, with the handler rewriting tidy URLs into the pages using good old str_replace(), looking up the old URLs in a MySQL DB (which is synced to an MS Access DB on the server).
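A rough Python equivalent of that 'caveman' approach, i.e. what PHP's `str_replace()` does when given arrays of search and replace strings, with a dict standing in for the MySQL lookup table. The URLs below are made up for illustration. One gotcha worth handling: the replacements run one after another, so if one old URL is a prefix of another (`?id=1` vs `?id=12`), replace the longer ones first or the short one will chew the front off the long one.

```python
def rewrite_links(html, url_map):
    """Replace every old URL in `html` with its tidy URL from `url_map`.

    Longest keys go first so '/product.html?id=1' can't clobber the start
    of '/product.html?id=12' (sequential replacement, like str_replace()).
    """
    for old_url in sorted(url_map, key=len, reverse=True):
        html = html.replace(old_url, url_map[old_url])
    return html


# Hypothetical lookup table; in the real setup these rows come from the DB.
URL_MAP = {
    "/product.html?id=1": "/product/red-shoes",
    "/product.html?id=12": "/product/blue-hat",
}

page = '<a href="/product.html?id=1">Shoes</a> <a href="/product.html?id=12">Hat</a>'
print(rewrite_links(page, URL_MAP))
```

With insertion order instead of longest-first, the second link would come out as `/product/red-shoes2`.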

WTF!!??!?!? am I doing? I should just be pumping out MFAs and putting my feet up.

hydra

quote author=perkiset link=topic=381.msg2500#msg2500 date=1183735434
Do you mean building a scraper via JS? The jitko code did that quite effectively I believe... a bit scary, that...


I meant: can server-side apps (PHP) scrape JavaScript-created content (content that isn't physically in the page source until JS creates it)?

perkiset

quote author=hydra link=topic=381.msg2503#msg2503 date=1183815511

This does not get caught in a loop, and will redirect untidy versions to the tidy version (301) but the tidy versions will read the untidy versions content.

A fine solution. Pretty efficient as well.

quote author=hydra link=topic=381.msg2503#msg2503 date=1183815511

Wikia Search - is going to be a fairly radical change to search engines (open source / contribution based) which means MFAs, spam sites etc will not be accepted. Although nowhere yet, got a bad feeling that if it takes off could be a rival to Google in long run



quote author=hydra link=topic=381.msg2503#msg2503 date=1183815511

Also, regarding automated screenshots of web pages, can anyone point me in the right direction?

Nutballs and I spoke about this a long time ago... I'll see if I can dig up some of our discussions. It was not pretty, and most probably a Windoz based solution. I'll see what I can find.

/p

