The Cache: Technology Expert's Forum
 
Author Topic: Regex breakdown  (Read 5625 times)
Phaėton
« on: April 11, 2010, 09:54:13 PM »

Okay, I'm trying to understand this...

 [^0-9].+?

In English, is it: not zero through 9, then any character except newline, zero or one times, one or more times?

Is this translation correct?

Then why is it getting two characters after the slashes? What am I missing here?
I'm trying to grab everything from each http:// up until a #32.


Code:
<?php

$re = '/(http:\/\/[^0-9].+?)/';

preg_match_all($re,
" Vermont was the first state to pass more aggressive privacy policies
 with respect to credit reports in 1992. Maine and California are typically
 early adopters when caring for their state citizen’s privacy. 
http://www.privacyrights.org/fs/fs6a-facta.htm http://uspirg.org/uspirg.asp?id2=13649 
http://en.wikipedia.org/wiki/Fair_and_Accurate_Credit_Transactions_Act

",
$out, PREG_PATTERN_ORDER);

var_dump($out);

?>



Code:
array
  0 =>
    array
      0 => string 'http://ww' (length=9)
      1 => string 'http://us' (length=9)
      2 => string 'http://en' (length=9)
  1 =>
    array
      0 => string 'http://ww' (length=9)
      1 => string 'http://us' (length=9)
      2 => string 'http://en' (length=9)





When I was your age we used to walk to the TV to change the channel....  _̴ı̴̴̡̡̡ ̡͌l̡̡̡ ̡͌l̡*̡̡ ̴̡ı̴̴̡ ̡̡͡|̲̲̲͡͡͡ ̲▫̲͡ ̲̲̲͡͡π̲̲͡͡ ̲̲͡▫̲̲͡͡ ̲|̡̡̡ ̡ ̴̡ı̴̡̡
Bompa
« Reply #1 on: April 11, 2010, 11:13:31 PM »

Close.

$re = '/(http:\/\/[^0-9].+?)/';

That will match http:// followed by a single character that is not 0-9, followed
by one or more of anything (except newline). The ? right after the + makes the match
lazy, and since nothing comes after it in the pattern, .+? stops after a single
character. That is why you only get two characters after the slashes.

As quantifiers, the + means "one or more". On its own the ? means 0 or 1 time,
but placed right after another quantifier it stops the greediness of matching,
like in my example below.
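
A small sketch of the lazy vs. greedy difference, in PHP since that is what the question used. The sample sentence and the variable names are made up for illustration; only the patterns come from the thread.

```php
<?php
// Hypothetical sample text, shortened from the one in the question.
$text = "see http://www.privacyrights.org/fs/fs6a-facta.htm for details";

// Lazy: .+? takes as few characters as it can. Nothing follows it in the
// pattern, so one character is enough and the match stops there.
preg_match('/http:\/\/[^0-9].+?/', $text, $lazy);

// Greedy: .+ takes as many characters as it can (everything up to a newline).
preg_match('/http:\/\/[^0-9].+/', $text, $greedy);

// Lazy with something after it: .+? has to keep going until \s can match.
preg_match('/http:\/\/[^0-9].+?\s/', $text, $bounded);

var_dump($lazy[0]);    // the two-characters-after-the-slashes result
var_dump($greedy[0]);  // runs to the end of the line
var_dump($bounded[0]); // the URL plus the one trailing space
```

So the lazy ? only does something useful when the pattern has something after it to stop at.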


The block of text that you're parsing is a tough one cuz it doesn't
have clear cut endings for the match.

What I would try is to match everything from http:// to the first white space.

This Perl worked in a quick test:

$string =<<here;
Vermont was the first state to pass more aggressive privacy policies
 with respect to credit reports in 1992. Maine and California are typically
 early adopters when caring for their state citizen’s privacy.
http://www.privacyrights.org/fs/fs6a-facta.htm http://uspirg.org/uspirg.asp?id2=13649
http://en.wikipedia.org/wiki/Fair_and_Accurate_Credit_Transactions_Act
here

Code:
while($string =~ /(http:\/\/.*?\s)/msg) {
  $url = $1;
  print "$url\n";
}

The * means zero or more times, but the ? immediately following makes it lazy, meaning
"up to the first" of whatever comes next, and what comes next is \s for any white space.

Ooops, actually, mine gets all the urls, but I didn't follow why up to a #32
cuz I did not see a #32.

If you're not confused yet, let me know, I can type more nonsense.  :D

Bompa
« Last Edit: April 11, 2010, 11:15:19 PM by Bompa »

"The most beautiful and profound emotion we can experience is the sensation of the mystical..." - Albert Einstein
Phaėton
« Reply #2 on: April 12, 2010, 01:23:01 AM »

I think I get it, I just don't get preg_match_all all the way.

Why does it return two arrays?


print_r($out);
Code:
array
  0 =>
    array
      0 => string 'http://www.privacyrights.org/fs/fs6a-facta.htm ' (length=47)
      1 => string 'http://uspirg.org/uspirg.asp?id2=13649
' (length=39)
      2 => string 'http://en.wikipedia.org/wiki/Fair_and_Accurate_Credit_Transactions_Act
' (length=71)
  1 =>
    array
      0 => string 'http://www.privacyrights.org/fs/fs6a-facta.htm ' (length=47)
      1 => string 'http://uspirg.org/uspirg.asp?id2=13649
' (length=39)
      2 => string 'http://en.wikipedia.org/wiki/Fair_and_Accurate_Credit_Transactions_Act
' (length=71)


Bompa
« Reply #3 on: April 12, 2010, 03:32:16 AM »

For that, you'll have to wait for a PHP guy to come by.

perkiset
« Reply #4 on: April 12, 2010, 07:18:38 AM »

You could have more than one set of parens (captures).

preg_match_all returns the first array, array[0], as what matched the entire pattern; then array[1] will be the things that matched the first set of parens, array[2] will be the second set of parens, etc.
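
A minimal sketch of that layout, assuming a made-up two-URL input (the example.* hosts and variable names are placeholders, not from the thread). It shows why wrapping the whole pattern in one pair of parens gives two identical arrays.

```php
<?php
// Hypothetical two-URL input, just for illustration.
$text = "http://a.example/x http://b.example/y ";

// One pair of parens wrapped around the entire pattern.
preg_match_all('/(http:\/\/.*?\s)/', $text, $out, PREG_PATTERN_ORDER);

// $out[0]: every full match of the pattern.
// $out[1]: what the first (and only) pair of parens captured.
// The parens cover the whole pattern, so the two lists come out identical.
var_dump(count($out));
var_dump($out[0] === $out[1]);
print_r($out[0]);
```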

Really nice post, Bomps.

It is now believed, that after having lived in one compound with 3 wives and never leaving the house for 5 years, Bin Laden called the U.S. Navy Seals himself.
Phaėton
« Reply #5 on: April 12, 2010, 08:54:03 AM »

Only one set of parens... I can get the answer I need if I just
grab the elements from array 0, but just so I can understand,
what am I putting in here that makes it two arrays? Here's the code:

Code:
<?php
$re = '/(http:\/\/.*?\s)/';
preg_match_all($re,
" Vermont was the first state to pass more aggressive privacy policies 
with respect to credit reports in 1992. Maine and California are typically
 early adopters when caring for their state citizen’s privacy.
       http://www.privacyrights.org/fs/fs6a-facta.htm 
    http://uspirg.org/uspirg.asp?id2=13649
     http://en.wikipedia.org/wiki/Fair_and_Accurate_Credit_Transactions_Act

",
$out, PREG_PATTERN_ORDER);
var_dump($out);
//print_r($out);
// Walk the full matches only ($out[0]).
foreach ($out[0] as $k => $v) {
    echo "k::$k::v::$v::<br><hr><br>";
}
?>


Only one set of parens..

the output:

Code:
array
  0 =>
    array
      0 => string 'http://www.privacyrights.org/fs/fs6a-facta.htm ' (length=47)
      1 => string 'http://uspirg.org/uspirg.asp?id2=13649
' (length=39)
      2 => string 'http://en.wikipedia.org/wiki/Fair_and_Accurate_Credit_Transactions_Act
' (length=71)
  1 =>
    array
      0 => string 'http://www.privacyrights.org/fs/fs6a-facta.htm ' (length=47)
      1 => string 'http://uspirg.org/uspirg.asp?id2=13649
' (length=39)
      2 => string 'http://en.wikipedia.org/wiki/Fair_and_Accurate_Credit_Transactions_Act
' (length=71)

k::0::v::http://www.privacyrights.org/fs/fs6a-facta.htm ::

--------------------------------------------------------------------------------

k::1::v::http://uspirg.org/uspirg.asp?id2=13649 ::

--------------------------------------------------------------------------------

k::2::v::http://en.wikipedia.org/wiki/Fair_and_Accurate_Credit_Transactions_Act ::

--------------------------------------------------------------------------------

perkiset
« Reply #6 on: April 12, 2010, 10:42:00 AM »

Yes ... if you had no parens that REGEX would work the same. You do not need parens around the whole thing to capture it.

If you ditch the parens, you'll still get an array of arrays ... but array[0][0..n] will contain your answers. Consider this:

preg_match_all('/([A-Z0-9]{1,6})-([A-Z0-9]{1,6})/i', $input, $parts);

The array would come back with 3 entries: the full matches first, then one sub-array for each set of parentheses, holding what matched inside it.

The deal is that you're thinking from a perspective of your targeted search, not all searches: it is entirely possible that the regex above could return MANY entries. I'm going to try to type out a quick example for demonstration:

Input value:
T12142-12345, T1334-111, T55555-12345

Return array:

$arr(0)
   (0) - T12142-12345
   (1) - T1334-111
   (2) - T55555-12345
$arr(1)
   (0) - T12142
   (1) - T1334
   (2) - T55555
$arr(2)
   (0) - 12345
   (1) - 111
   (2) - 12345


Note how the first array entry holds every place the entire pattern was found; the successive arrays contain the captures I defined with parentheses.
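
The same example as a runnable sketch. The quantifier is {1,6} because six-character prefixes like T12142 need it; anything narrower would leave the leading T out of the match. The variable names are placeholders, and PREG_SET_ORDER is shown only as a contrast (it was not part of the question).

```php
<?php
$input = 'T12142-12345, T1334-111, T55555-12345';

// Two capture groups separated by a dash, case-insensitive.
$re = '/([A-Z0-9]{1,6})-([A-Z0-9]{1,6})/i';

// Default ordering (PREG_PATTERN_ORDER): one list per group.
preg_match_all($re, $input, $byGroup, PREG_PATTERN_ORDER);
print_r($byGroup[0]); // full matches
print_r($byGroup[1]); // first set of parens
print_r($byGroup[2]); // second set of parens

// PREG_SET_ORDER flips it: one list per match, each holding
// [full match, first parens, second parens].
preg_match_all($re, $input, $byMatch, PREG_SET_ORDER);
print_r($byMatch[1]); // everything about the second match
```

PREG_PATTERN_ORDER is handy when you want "all the first captures"; PREG_SET_ORDER is usually easier when you loop over matches one at a time.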

Hope that helps :)
vsloathe
« Reply #7 on: April 12, 2010, 01:53:47 PM »

Yes this is all supremely logical. Just be logical.

hai
Bompa
« Reply #8 on: April 12, 2010, 06:57:06 PM »

 http://php.net/manual/en/function.preg-match-all.php
PREG_PATTERN_ORDER

    Orders results so that $matches[0] is an array of full pattern matches, $matches[1] is an array of strings matched by the first parenthesized subpattern, and so on.

I think perk said that in his overly complex way. ;)

$matches[0] < That's your first array output, no parentheses needed.
$matches[1] < Second array is caused by parentheses.

Ditch the parens, I think that is your answer.
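
A quick sketch of what "ditch the parens" does to the result, assuming a made-up two-URL input (placeholder hosts and variable names):

```php
<?php
// Hypothetical input for illustration.
$text = "http://a.example/x http://b.example/y ";

// The thread's pattern, minus the parens.
preg_match_all('/http:\/\/.*?\s/', $text, $matches);

// Still an array of arrays, but with a single entry: the full matches.
var_dump(count($matches));
print_r($matches[0]);
```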

Bompa
perkiset
« Reply #9 on: April 12, 2010, 06:58:14 PM »

 ROFLMAO
Phaėton
« Reply #10 on: April 12, 2010, 10:33:38 PM »

Thanks for your responses! I appreciate your help!

I think I get it.. I did read around before I posted....

However, I'm much more of a haptic learner so now having DONE
this I can see how cool this can be... so I could use the
parentheses to strip out the http:// and other stuff like this and go with array[2]

Nice.  Bug style parsing with collection on steroids.

$re = '/(http:\/\/(.*?)\s)/';

Code:
array
  0 =>
    array
      0 => string 'http://www.privacyrights.org/fs/fs6a-facta.htm ' (length=47)
      1 => string 'http://uspirg.org/uspirg.asp?id2=13649 ' (length=39)
      2 => string 'http://en.wikipedia.org/wiki/Fair_and_Accurate_Credit_Transactions_Act
' (length=71)
  1 =>
    array
      0 => string 'http://www.privacyrights.org/fs/fs6a-facta.htm ' (length=47)
      1 => string 'http://uspirg.org/uspirg.asp?id2=13649 ' (length=39)
      2 => string 'http://en.wikipedia.org/wiki/Fair_and_Accurate_Credit_Transactions_Act
' (length=71)
  2 =>
    array
      0 => string 'www.privacyrights.org/fs/fs6a-facta.htm' (length=39)
      1 => string 'uspirg.org/uspirg.asp?id2=13649' (length=31)
      2 => string 'en.wikipedia.org/wiki/Fair_and_Accurate_Credit_Transactions_Act' (length=63)
« Last Edit: April 12, 2010, 10:39:34 PM by Phaėton »

perkiset
« Reply #11 on: April 12, 2010, 10:39:57 PM »

The outermost parens are redundant. You'll catch everything in arr[0] just because it matches your pattern.

The inner parens will then show up in arr[1].

Also, I'm not sure if the Regex parse is just forgiving you, I didn't think you could nest captures...
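
For what it's worth, PCRE (the engine behind PHP's preg_* functions) does accept nested groups and numbers them by the position of each opening parenthesis, left to right. A small sketch, assuming a made-up input with placeholder hosts:

```php
<?php
// Hypothetical input for illustration.
$text = "http://a.example/x http://b.example/y ";

// Two pairs of parens, one nested inside the other.
// Group 1 opens first (the outer pair), group 2 opens second (the inner pair).
preg_match_all('/(http:\/\/(.*?))\s/', $text, $m);

print_r($m[0]); // full matches, trailing whitespace included
print_r($m[1]); // outer group: URL without the trailing whitespace
print_r($m[2]); // inner group: URL without the http:// prefix
```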
Phaėton
« Reply #12 on: April 12, 2010, 10:58:43 PM »

Ah, gotcha. That's where I was initially confused with
the double arrays... so here's what I meant to post
earlier:

$re = '/(http:\/\/(.*?))\s/';

Thank you again!  That's a dyslexic nightmare... :D

Code:
array
  0 =>
    array
      0 => string 'http://www.privacyrights.org/fs/fs6a-facta.htm ' (length=47)
      1 => string 'http://uspirg.org/uspirg.asp?id2=13649 ' (length=39)
      2 => string 'http://en.wikipedia.org/wiki/Fair_and_Accurate_Credit_Transactions_Act
' (length=71)
  1 =>
    array
      0 => string 'http://www.privacyrights.org/fs/fs6a-facta.htm' (length=46)
      1 => string 'http://uspirg.org/uspirg.asp?id2=13649' (length=38)
      2 => string 'http://en.wikipedia.org/wiki/Fair_and_Accurate_Credit_Transactions_Act' (length=70)
  2 =>
    array
      0 => string 'www.privacyrights.org/fs/fs6a-facta.htm' (length=39)
      1 => string 'uspirg.org/uspirg.asp?id2=13649' (length=31)
      2 => string 'en.wikipedia.org/wiki/Fair_and_Accurate_Credit_Transactions_Act' (length=63)

Phaėton
« Reply #13 on: April 12, 2010, 11:11:08 PM »

Is there a tighter way to write this regex? The target result is
array[1] directly above this post. I guess not, since it's gonna
spit out array[0] no matter what, right?
perkiset
« Reply #14 on: April 12, 2010, 11:16:24 PM »

No meng, you're cool. Just:

$re = '/http:\/\/(.*?)\s/';


... no outside parens.
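
One more option that nobody in the thread brought up, so treat it as a suggestion rather than anything the posters meant: if the goal is the full URL with no trailing whitespace, \S+ (one or more non-whitespace characters) puts it straight into the first array with no parens at all. The input string and variable names below are made up for illustration.

```php
<?php
// Hypothetical input for illustration.
$text = "http://a.example/x http://b.example/y\nhttp://c.example/z";

// \S+ = one or more non-whitespace characters, so each match ends
// right before the first space or newline (or at the end of the string).
// Using # as the delimiter means the slashes need no escaping.
preg_match_all('#http://\S+#', $text, $urls);

print_r($urls[0]);
```

Unlike .*?\s, this also picks up a URL that sits at the very end of the text with no whitespace after it.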