m0nkeymafia
« on: June 26, 2007, 01:17:48 PM »
any of u regex nutters got a few mins to help a monkey out?
ive got a script to grab all the classes and id's used on a webpage. i then want to pull the matching css code for each class out of the css files, but i cant figure out a regex to do it
i.e. if i have found a class called "topNav" i want to search through the css files [already in a buffer] and find the corresponding definitions
i.e.
.topNav { padding-top: 5px; }
would return just "padding-top: 5px;", plus any other declarations in that rule
this way if i rob someones web layout / design and modify it, i can run it through this and discard any useless classes. can anyone gimme a hand on what to do? im stuck!!
cheers
I am Tyler Durden

perkiset
Olde World Hacker
Administrator
Lifer
Posts: 5142
:sniffle: Humor was so much easier before.
« Reply #1 on: June 26, 2007, 02:44:50 PM »
First - are you doing this in PHP?
(for the rest of this I'll assume so)
Second - you'll want to load all referenced CSS files (and their referenced files) just in case the definitions are not in the immediate page. Best way to find it then would be to create one big mess of all the linked files and do the Regex on <that string>.
Then, assuming you got a class reference, like 'topNav' and have it in the variable $className, you could do something like this:
preg_match('/([#\.]$className[\s]*{[^}]+/', $inputBuff, $matches);
... and then the $matches array would have anything that it found. Important to look for things that assign class by id too, rather than just class name - by this I mean:
<style>
#topNav { font-family: Arial; }
</style>
<div id="topNav">Here is the html</div>
Good luck! /p
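A minimal sketch of the lookup described above, with a few assumptions spelled out. The function name `findRuleBody` is made up for illustration. The pattern is built by concatenation because PHP does not interpolate `$className` inside single quotes, the name is passed through `preg_quote()`, the capturing group is closed, and a lookahead keeps `topNav` from matching inside a longer name such as `topNavBar`. It only handles simple `.name { ... }` or `#name { ... }` rules and returns the first one it finds.

```php
<?php
// Hypothetical helper (not from the thread): return the declarations of the
// first rule whose selector is ".$className" or "#$className", or null.
function findRuleBody(string $css, string $className): ?string
{
    // Escape regex metacharacters in the name; '/' is the pattern delimiter.
    $name = preg_quote($className, '/');

    // [#.]        class or id prefix
    // (?![\w-])   the name must end here (no "topNavBar" for "topNav")
    // \{([^}]*)\} capture everything between the braces
    $pattern = '/[#.]' . $name . '(?![\w-])\s*\{([^}]*)\}/';

    if (preg_match($pattern, $css, $matches) === 1) {
        return trim($matches[1]);
    }
    return null;
}

$inputBuff = ".topNavBar { color: red; }\n.topNav { padding-top: 5px; }";
echo findRuleBody($inputBuff, 'topNav'), "\n"; // padding-top: 5px;
```

If a name can be defined more than once across the combined stylesheets, `preg_match_all()` with the same pattern would collect every rule instead of just the first.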
If I can't be Mr. Root then I don't want to play.

m0nkeymafia
« Reply #2 on: June 27, 2007, 12:49:09 AM »
cheers perk thats very similar to what i had but wasnt working, the bit at the end is different though. will let ya know how it goes and yeah its php
cheers matey

perkiset
« Reply #3 on: June 27, 2007, 01:25:03 AM »
Hey MM - I forgot something really important:
You really need to add an "m" as a modifier to the regex ie.,
preg_match('/([#\.]$className[\s]*{[^}]+/m', $inputBuff, $matches);
... this dicked with me for a long time. PHP won't let the pattern span multiple lines without it... and if someone defines a class like this:
.topNav {
font-family: courier;
font-size: 12px;
}
... then the first regex I posted won't work.
Hope that helps, /p
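A small check of what the modifiers actually do, offered as a sketch rather than a ruling on what went wrong in the original script. In PCRE, `m` only changes where `^` and `$` may anchor, and `s` is the modifier that lets `.` match a newline. A negated class such as `[^}]` already crosses line breaks with no modifier at all, so `m` is harmless in the pattern above but is probably not what makes the multi-line rule match.

```php
<?php
$css = ".topNav {\n  font-family: courier;\n  font-size: 12px;\n}";

// No modifier at all: [^}]+ is a negated class, so it crosses newlines.
$plain = preg_match('/[#.]topNav\s*\{([^}]+)/', $css, $m1);

// "." is what stops at a newline; the "s" modifier lifts that limit.
preg_match('/\{\s*(.+)/', $css, $m2);   // capture ends at the first line break
preg_match('/\{\s*(.+)/s', $css, $m3);  // capture runs to the end of the string

// "m" only changes where ^ and $ may anchor.
$noM   = preg_match('/^\s*font-size/', $css);   // ^ is start of string only
$withM = preg_match('/^\s*font-size/m', $css);  // ^ also matches after each newline

var_dump($plain, $m2[1], $m3[1], $noM, $withM);
```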

m0nkeymafia
« Reply #4 on: June 27, 2007, 01:27:10 AM »
excellent perk thanks, just about to try it 

m0nkeymafia
« Reply #5 on: June 27, 2007, 02:03:07 AM »
sorry to be lame but you omitted the closing ) where should it go?

Bompa
« Reply #6 on: June 27, 2007, 02:14:00 AM »
Quote: sorry to be lame but you omitted the closing ) where should it go?
My guess is at the end, but before the semicolon.
Bompa

m0nkeymafia
« Reply #7 on: June 27, 2007, 02:21:11 AM »
ahh amazin that worked - i think lol
cheers for the help guys, hopefully have this finished and tweaked by tonight \o/

perkiset
« Reply #8 on: June 27, 2007, 09:04:22 AM »
Right on... BTW ... what's with the missing paren? There's no open parens in my post 

m0nkeymafia
« Reply #9 on: June 28, 2007, 09:17:30 AM »
off the end apparently
ok so i now have the regex to pull classes and id's out of a webpage:
'/(class|id) *= *\"([a-zA-z0-9 ]*)\"/i'
i then explode the results to pull out all classes even if two+ are specified. then i use perks code to grab the classes out of the style sheet:
'/([#\.]'.$val.'.*[\s]*{[^}]+)/m'
which all works great. only thing left to do is grab non-class selectors out of the style sheet, i.e. body {} and so forth. I originally tried something like this:
'/([^#\.]'.$val.'.*[\s]*{[^}]+)/m'
but it didnt work particularly well, as it seemed to pull all the classes off :/
anyone have any further ideas?
cheers
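The first step above (collect the names, then split multi-class attributes) could be sketched like this. The helper name `collectNames` is invented for the example, and it assumes double-quoted attribute values only. One detail worth knowing: the range `A-z` in the posted pattern also covers the characters `[ \ ] ^ _` and the backtick, which sit between `Z` and `a` in ASCII, so the sketch uses `A-Z` and adds `_` and `-` explicitly, since both are common in class names.

```php
<?php
// Hypothetical helper: collect every class and id name used in an HTML string.
function collectNames(string $html): array
{
    // Only double-quoted attribute values are handled in this sketch.
    preg_match_all('/\b(?:class|id)\s*=\s*"([A-Za-z0-9_ -]*)"/i', $html, $matches);

    $names = [];
    foreach ($matches[1] as $value) {
        // One attribute may list several classes separated by whitespace.
        foreach (preg_split('/\s+/', trim($value), -1, PREG_SPLIT_NO_EMPTY) as $name) {
            $names[$name] = true;   // array keys de-duplicate the names
        }
    }
    return array_keys($names);
}

$html = '<div id="topNav" class="menu wide"><span class="menu">x</span></div>';
print_r(collectNames($html));
```

Each returned name can then be fed to the stylesheet lookup, ideally after `preg_quote()` so that a `-` or other special character in a name cannot change the meaning of the pattern.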

perkiset
« Reply #10 on: June 28, 2007, 10:34:43 AM »
On a quick note, I'm assuming that the same code you used for style definitions could be used, except simply negate the period or pound sign in front and replace the $val with letters/numbers...
/([^#\.][A-Za-z0-9_]+[\s]*{[^}]+)/m
... I think that'd do it, although you might possibly get some stuff on a page that you didn't want... that one might take a bit more testing...

m0nkeymafia
« Reply #11 on: June 28, 2007, 03:21:50 PM »
Yeah perk i tried that [although i now realise i copied the wrong syntax in my previous post]. for some reason it pulls up class definitions even though they have a preceding period! very odd?
p.s. i actually get the syntax u provided previously, when u first posted it i was like WTF !! but now it makes sense, i really hope my regex skills keep improving lol [slowly]
so any ideas why it may not work? even though we say we want text that is NOT directly preceded by a # or . ??

Bompa
« Reply #12 on: June 28, 2007, 11:39:28 PM »
The dot is a literal within square brackets. (Don't escape it.)
Bompa
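A quick way to check this point: inside a character class the dot is already literal, and escaping it is harmless, so `[#\.]` and `[#.]` should behave identically. The sample strings below are made up for the test.

```php
<?php
// Inside a character class the dot is already literal,
// so the escaped and unescaped spellings behave the same.
$same = true;
foreach (['.topNav', '#topNav', 'xtopNav', 'topNav'] as $s) {
    if (preg_match('/[#\.]topNav/', $s) !== preg_match('/[#.]topNav/', $s)) {
        $same = false;
    }
}
var_dump($same); // bool(true)
```

Dropping the backslash tidies the pattern, but on its own it does not change which text is matched.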

thedarkness
« Reply #13 on: June 29, 2007, 05:39:25 AM »
Nice pickup Bomps.
Cheers, td
"I want to be the guy my dog thinks I am." - Unknown

m0nkeymafia
« Reply #14 on: June 30, 2007, 05:55:36 AM »
nice one bompa. it still doesnt work though?
/([^.#][a-zA-Z0-9_].*[\s]*{[^}])/m
so if we expand it into constituent parts we have
Match a string that follows these rules:
- Starts NOT with a . or #
- Immediately followed by any number of letters or numbers or underscores
- Then match any amount of whitespace
- Then it needs to find an open brace
- It then matches until it hits a closing brace, at which point it stops
- Works over multiple lines
I cannot see how this matches ALL classes within my stylesheet? The critical part, the NOT . or # reads fine to me? What am i missing?
Cheers
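One possible reading of why classes still show up: `[^.#]` has to consume one character, but nothing forces the match to begin at the start of the selector. For `.topNav {` the engine skips the dot, lets `[^.#]` match the `t`, and carries on with `opNav {...`, so every class is returned minus its first character or two. A sketch of one way around it, assuming the goal is bare element selectors such as `body`: a negative lookbehind that refuses to start right after a name character, `.`, `#`, `:` or `-`. The helper name `findElementRules` is made up, and the sketch ignores compound selectors, comments, and `@media` blocks.

```php
<?php
// Hypothetical helper: map bare element selectors to their declarations.
function findElementRules(string $css): array
{
    // (?<![\w.#:-])          must not start mid-name or after ".", "#", ":" or "-"
    // ([A-Za-z][A-Za-z0-9]*) the element name
    // \s*\{([^}]*)\}         optional whitespace, then the declaration block
    preg_match_all(
        '/(?<![\w.#:-])([A-Za-z][A-Za-z0-9]*)\s*\{([^}]*)\}/',
        $css,
        $m,
        PREG_SET_ORDER
    );
    $rules = [];
    foreach ($m as $match) {
        $rules[$match[1]] = trim($match[2]); // a later rule overwrites an earlier one
    }
    return $rules;
}

$css = "body { margin: 0; }\n.topNav { padding-top: 5px; }\n"
     . "#footer { color: gray; }\nh1 { font-size: 2em; }";

// The pattern from reply #10: also matches "topNav {..." and "footer {...".
preg_match_all('/([^.#][a-zA-Z0-9_]+[\s]*{[^}]+)/m', $css, $old);
print_r($old[1]);

// The lookbehind version keeps only body and h1.
print_r(findElementRules($css));
```

A selector list such as `ul li { ... }` would still be reported under `li` only, so a real stylesheet will likely need more than a single pattern.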