Thread: heeeeeeeeeeeeelp
m0nkeymafia

any of u regex nutters got a few mins to help a monkey out?

ive got a script to grab all the classes and id's used on a webpage
i then want to pull off any matching css code for each class out of the css files, but i cant figure out a regex to do it

i.e. if i have found a class called "topNav"
i want to search through the css files [already in a buffer] and find the corresponding definitions

i.e.

.topNav {  padding-top: 5px; }

would return just "padding-top: 5px;" obviously with any other lines

this way if i rob someones web layout / design and modify it, i can run it through this and discard any useless classes
can anyone gimmie a hand on what to do? im stuck!!

cheers

perkiset

First - are you doing this in PHP?

(for the rest of this I'll assume so)

Second - you'll want to load all referenced CSS files (and their referenced files) just in case the definitions are not in the immediate page. Best way to find it then would be to create one big mess of all the linked files and do the regex on that string.

Then, assuming you got a class reference, like 'topNav' and have it in the variable $className, you could do something like this:

preg_match('/([#.]$className\s*{[^}]+/', $inputBuff, $matches);

... and then the $matches array would have anything that it found. Important to look for things that assign class by id too, rather than just class name - by this I mean:

<style>
#topNav { font-family: Arial; }
</style>
<div id="topNav">Here is the html</div>
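
For anyone following along outside PHP, here is a rough Python sketch of the same idea (Python's re syntax is close to PCRE for a pattern like this). The function name find_rules and the sample stylesheet are made up for illustration, and the lookahead that stops "topNav" from also matching "topNavLeft" is an addition, not part of the pattern above.

```python
import re

def find_rules(css, name):
    """Return the declaration bodies of every rule whose selector is .name or #name."""
    # [#.] matches either prefix; re.escape guards against odd characters in the name;
    # (?![\w-]) rejects longer names that merely start with `name`;
    # \s*\{ allows optional whitespace before the brace; ([^}]+) grabs up to the closing brace.
    pattern = re.compile(r'[#.]' + re.escape(name) + r'(?![\w-])\s*\{([^}]+)\}')
    return [body.strip() for body in pattern.findall(css)]

# Hypothetical sample data, not taken from a real site.
sample_css = """
.topNav {  padding-top: 5px; }
#topNav { font-family: Arial; }
.topNavLeft { float: left; }
"""

print(find_rules(sample_css, "topNav"))
```

This only catches simple selectors; something like ".topNav a { ... }" or ".topNav, .other { ... }" would need a looser pattern.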

Good luck!
/p

m0nkeymafia

cheers perk thats very similar to what i had but wasnt working, the bit at the end is different though.
will let ya know how it goes

and yeah its php
cheers matey

perkiset

Hey MM - I forgot something really important:

You really need to add an "m" as a modifier to the regex, i.e.,

preg_match('/([#.]$className\s*{[^}]+/m', $inputBuff, $matches);

... this dicked with me for a long time. PHP won't let the pattern span multiple lines without it... and if someone defines a class like this:

.topNav {
font-family: courier;
font-size: 12px;
}

.. then the first regex I posted won't work.
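
For what it's worth, a quick way to check what the modifier changes: in PCRE-style engines (Python's re behaves the same way on this point) "m" only affects where ^ and $ are allowed to match, and a negated class like [^}] already matches newlines by itself. The snippet below is a Python sketch with made-up sample data, so it is worth re-testing in PHP before relying on it.

```python
import re

# Hypothetical stylesheet with a rule spread over several lines.
css = ".topNav {\nfont-family: courier;\nfont-size: 12px;\n}\n#main { padding: 10px; }\n"

# A negated class such as [^}] matches newlines too, so this spans lines with no flags at all.
no_flag = re.search(r'[#.]topNav\s*\{([^}]+)', css)

# Without MULTILINE, ^ only matches at the very start of the string ...
starts_plain = re.findall(r'^[#.]\w+', css)
# ... with MULTILINE it matches at the start of every line.
starts_multi = re.findall(r'^[#.]\w+', css, re.MULTILINE)

print(repr(no_flag.group(1)))
print(starts_plain, starts_multi)
```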

Hope that helps,
/p

m0nkeymafia

excellent perk thanks, just about to try it

m0nkeymafia

sorry to be lame but you omitted the closing )
where should it go?

Bompa

Quote from: m0nkeymafia

sorry to be lame but you omitted the closing )
where should it go?


My guess is at the end, but before the semicolon.


Bompa

m0nkeymafia

ahh amazin that worked - i think lol
cheers for the help guys
hopefully have this finished and tweaked by tonight o/

perkiset

Right on... BTW ... what's with the missing paren? There's no open parens in my post

m0nkeymafia

off the end apparently

ok so i now have the regex to pull classes and id's out of a webpage
'/(class|id) *= *"([a-zA-Z0-9 ]*)"/i'

i then explode the results to pull out all classes even if two+ are specified

then i use perks code to grab the classes out of the style sheet
'/([#.]'.$val.'.*\s*{[^}]+)/m'

which all works great
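
A rough Python sketch of that first step, for comparison. The function name used_names and the sample markup are invented for the example, and the character class is widened to allow hyphens and underscores, which the pattern above does not.

```python
import re

def used_names(html):
    """Collect every class and id name that appears in class="..." or id="..." attributes."""
    # Same shape as the pattern above, with hyphens and underscores allowed in names.
    attr = re.compile(r'(class|id)\s*=\s*"([A-Za-z0-9_\- ]*)"', re.IGNORECASE)
    names = set()
    for _kind, value in attr.findall(html):
        # split() with no argument handles several space-separated classes in one attribute
        names.update(value.split())
    return sorted(names)

# Hypothetical sample markup.
sample_html = '<div id="topNav" class="bold s12">hi</div><p CLASS="bold">x</p>'
print(used_names(sample_html))
```

Attributes written with single quotes or no quotes at all would slip past this, same as with the original pattern.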

only thing left to do is grab non-class selectors out of the style sheet, i.e. body {} and so forth
I originally tried something like this:

'/([^#.]'.$val.'.*\s*{[^}]+)/m'

but didnt particularly work very well, as it seemed to pull all the classes off :/
anyone have any further ideas?

cheers

perkiset

On quick note, I'm assuming that the same code you used for style definitions could be used, except simply negate the period or pound sign in front and replace the $val with letters/numbers...

/([^#.][A-Za-z0-9_]+\s*{[^}]+)/m

... I think that'd do it, although you might possibly get some stuff on a page that you didn't want... that one might take a bit more testing...

m0nkeymafia

Yeah perk i tried that [although i now realise i copied the wrong syntax in my previous post]

for some reason it pulls up class definitions even though they have a preceding period! very odd?
p.s. i actually get the syntax u provided previously, when u first posted it i was like WTF !! but now it makes sense, i really hope my regex skills keep improving lol [slowly]

so any ideas why it may not work? even though we say we want text that is NOT directly preceded by a # or . ??

Bompa

The dot is a literal within square brackets.

(Don't escape it).
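
A small Python check of that point (the sample string is made up; character classes work the same way in PCRE and Perl):

```python
import re

text = "a.b#c"

# Inside a character class the dot is literal: [.#] matches only "." and "#".
in_class = re.findall(r'[.#]', text)
# Escaping it there is harmless but unnecessary: [\.#] matches the same characters.
in_class_escaped = re.findall(r'[\.#]', text)
# Outside a class an unescaped dot matches any character except a newline.
bare_dot = re.findall(r'.', text)

print(in_class, in_class_escaped, bare_dot)
```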




Bompa

thedarkness

Nice pickup Bomps.

Cheers,
td

m0nkeymafia

nice one bompa.
it still doesnt work though?

/([^.#][a-zA-Z0-9_].*\s*{[^}])/m

so if we expand it into constituent parts we have

Match a string that follows these rules:
- Starts NOT with a . or #
- Immediately followed by any number of letters or numbers or underscores
- Then match any amount of whitespace
- Then it needs to find an open parenthesis
- It then matches until it hits a closing parenthesis, at which point it stops
-
* Works over multiple lines

I cannot see how this matches ALL classes within my stylesheet?
The critical part, the NOT . or # reads fine to me? What am i missing?

Cheers

Bompa

I would like to do some experimenting, but since I am a css idiot, maybe you
could give me a sample of a page or a url to any page where this should work?


later,
Bompa

Bompa

Quote from: m0nkeymafia

nice one bompa.
it still doesnt work though?

/([^.#][a-zA-Z0-9_].*\s*{[^}])/m

so if we expand it into constituent parts we have

Match a string that follows these rules:
- Starts NOT with a . or #
- Immediately followed by any number of letters or numbers or underscores
- Then match any amount of whitespace
- Then it needs to find an open parenthesis
- It then matches until it hits a closing parenthesis, at which point it stops
-
* Works over multiple lines

I cannot see how this matches ALL classes within my stylesheet?
The critical part, the NOT . or # reads fine to me? What am i missing?

Cheers


I think you're verbalizing it wrong, but anyways, 

perk's code  /([^#.][A-Za-z0-9_]+\s*{[^}]+)/m
your code  /([^.#][a-zA-Z0-9_].*\s*{[^}])/m

perk has the + sign, you don't, instead you have .*
he also has a + near the end.


still if i had a sample of text to parse i could experiment

Bompa

Bompa

Ok, you all must be sleeping like babies.

I dug up a sample of css.


This perl works for me:

while( $text =~ /(^[^#.]\w+{.*?})/msg ) { print "$1 "; }

This says
The character immediately after a newline can not be a # nor .
Then followed by one or more alphanumeric characters
Then followed by a left curly brace
Then followed by anything, (including a new line)
Stop matching at the first right curly brace


While the above worked for me, it's likely that I do not have a complete sample of
the text you'll be parsing.  Let me know.


There seems to be a lot of confusion with the ^ and $ metacharacters, as well as with the /m and /s flags.
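
A possible explanation for the earlier "it still pulls all the classes" puzzle, sketched in Python with a made-up two-rule stylesheet: without a ^ anchor the engine is free to start the match one character in, so the "NOT # or ." test is satisfied by the first letter of the class name instead of by the dot.

```python
import re

# Hypothetical sample: one class rule and one element rule.
css = ".topNav { color: red; }\nbody { margin: 0; }\n"

# Unanchored: the match can begin at "t", which satisfies [^#.],
# so the class rule is still picked up (minus its leading dot).
unanchored = re.findall(r'[^#.]\w+\s*\{[^}]+\}', css)

# Anchored with ^ under MULTILINE: the first character of the line itself
# must not be # or ., so only the element rule is returned.
anchored = re.findall(r'^[^#.]\w+\s*\{[^}]+\}', css, re.MULTILINE)

print(unanchored)
print(anchored)
```

Note that the unanchored version also lets [^#.] swallow the newline in front of "body", which is another reason to anchor.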


m0nkeymafia

Hey Bompa
Cheers for goin out your way matey!
I think i was sleeping / making up time with my gf lol

I managed to get a working version, some limitations, but works fairly well.
Not had much chance to test it though, i got it working on my sample set then finished it off lol
Too much regex makes m0nkey a dull boy

/^\s*([\w,:]*)\s*{([^}]*)}/m

When I get chance I'll have a play with ur code see if it works better

Cheers tho dude, you posted like 3 times in a row for me

thedarkness

Yeah Bomps,

You're like a true humanitarian......

Cheers,
td

m0nkeymafia

hahahaha
well were not curing world hunger, but a thankya was needed to y'all

perkiset

Hey all - sorry about not responding, turns out that even though I was mobile I was just too frigging happy being offline to make much use of my new broadband card to hook up.

Bomps - here is a bit of production CSS to look at (taken from a bunch of different places in my code; this wouldn't necessarily all be on one page, but it gives a nice idea of what it all looks like):


<style>
<style type="text/css">
.arial { font-family: "Trebuchet MS", Verdana, Arial, Helvetica, sans-serif; line-height: 120%; }
.normal { font-weight: normal; }
.bold { font-weight: bold; }
.wt { color: #FFFFFF; }
.pad { padding: 1px 5px 1px 5px; }
A.wt { text-decoration: none; }
A.wt:HOVER { background-color: #671218; }
.bk { color: #220004; }
A.bk { text-decoration: none; }
A.bk:HOVER { background-color: #c78289; color: #ffffff; }
.s9 { font-size: 9px; }
.s10 { font-size: 10px; }
.s11 { font-size: 11px; }
.s12 { font-size: 12px; }
.s14 { font-size: 14px; }
.s16 { font-size: 16px; }
.s18 { font-size: 18px; }
.s20 { font-size: 20px; }
.s24 { font-size:24px; }
.left { text-align: left; }
.center { text-align: center; }
.right { text-align: right; }

body {
margin: 0px 0px 0px 0px;
background-image: url('/graphics/bgtile.gif');
background-repeat: repeat;
}

#main {
padding: 10px 10px 10px 10px;
}
.placard {
background-color: #ffffff;
border-style: solid;
border-color: #000000;
border-width: 1px 3px 3px 1px;
padding: 5px 10px 20px 10px;
}

#clientArea {
padding: 0px 30px 40px 40px;
}

</style>

</style>


I am pretty consistent about my spacing, but there is no really strong standard out there for spacing / line breaking - I just write it so that it looks right.

These examples are all valid:


A.bk:HOVER { background-color: #c78289; color: #ffffff; }

A.bk:HOVER {
background-color: #c78289; color: #ffffff; }

A.bk:HOVER
{
background-color: #c78289;
color: #ffffff;
}
A.bk:HOVER{background-color:#c78289;color:#ffffff;}


I think your example of
/(^[^#.]\w+{.*?})/msg

is really close: I'd add that there *may* be white space between the class name and the first squiggly brace, but that's about it I think.
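
A Python sketch of that tweak, run against a few of the layouts listed above. The helper name element_rules and the sample stylesheet are invented for illustration. Two small departures from the Perl pattern, both assumptions on my part: whitespace is excluded from the first character so a blank line cannot start a match, and \w* is used instead of \w+ so one-letter selectors like "p" are found.

```python
import re

# ^ with MULTILINE anchors at each line start; DOTALL lets .*? cross newlines.
RULE = re.compile(r'^([^#.\s]\w*)\s*\{(.*?)\}', re.MULTILINE | re.DOTALL)

def element_rules(css):
    """Map each plain element selector (no leading . or #) to its declarations."""
    return {selector: body.strip() for selector, body in RULE.findall(css)}

# Hypothetical sample mixing the spacing styles shown above.
sample_css = """body {
margin: 0px;
}

#main {
padding: 10px;
}
.placard { border-width: 1px; }
h1{font-size:24px;}
p
{
color: #220004;
}
"""

print(element_rules(sample_css))
```

Selectors such as "A.bk:HOVER" are still skipped by this, since \w does not cover the dot or the colon.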

/p

m0nkeymafia

yeah perk i think ur right dude

mobile? where are ya?

perkiset

Back now - I was at Lake Mead for the last 4 days - just bought a Sprint broadband (EVDO) card - it rocked. I was on the net and working for both the 4 hour drive up and back... but while I was there I just relaxed. Ahhh.

But now back to reality... it's accounting day so I am still officially MIA although available here and there.

/p

m0nkeymafia

accounting day? ouch
in britain the company u work for takes care of that

perkiset

I haven't worked for someone else since my teens. Employment sucks, but so does accounting

To each his own I guess...

/p

