The Cache: Technology Expert's Forum
 
Author Topic: Regex breakdown  (Read 1281 times)
Phaėton
« Reply #15 on: April 12, 2010, 11:40:40 PM »

Yeah, but that version includes the trailing \s in the match.

I was wondering what the best way is to get that trailing \s trimmed out. I stumbled upon a result with http:// included and the \s trimmed out. I was sort of thinking out loud while using it: I was thinking "nested" and got there by accident, without really getting it just yet. Regex is so flexible it bends my brain a little trying to solve the puzzle, because the tools are there to tear a pattern out of infinite possibilities.

$re = '/http:\/\/(.*?)\s/';

Code:
array
  0 =>
    array
      0 => string 'http://www.privacyrights.org/fs/fs6a-facta.htm ' (length=47)
      1 => string 'http://uspirg.org/uspirg.asp?id2=13649 ' (length=39)
      2 => string 'http://en.wikipedia.org/wiki/Fair_and_Accurate_Credit_Transactions_Act
' (length=71)
  1 =>
    array
      0 => string 'www.privacyrights.org/fs/fs6a-facta.htm' (length=39)
      1 => string 'uspirg.org/uspirg.asp?id2=13649' (length=31)
      2 => string 'en.wikipedia.org/wiki/Fair_and_Accurate_Credit_Transactions_Act' (length=63)
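
One possible way to keep http:// and drop the trailing whitespace without adding another group is a lookahead, which checks for the whitespace but leaves it out of the match. This is only a minimal sketch: the sample text in $text is made up, since the input being scraped is not shown in this thread.

```php
<?php
// Hypothetical sample text; the real input from the thread is not shown.
$text = "Read http://www.privacyrights.org/fs/fs6a-facta.htm then http://uspirg.org/uspirg.asp?id2=13649\nand come back.";

// (?=\s) is a lookahead: it requires whitespace right after the URL
// but does not consume it, so $matches[0] has no trailing whitespace.
$re = '/http:\/\/.*?(?=\s)/';
preg_match_all($re, $text, $matches);

print_r($matches[0]);
```

With this approach the clean URLs sit in $matches[0] directly, so no capture group is needed at all.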


When I was your age we used to walk to the TV to change the channel....  _̴ı̴̴̡̡̡ ̡͌l̡̡̡ ̡͌l̡*̡̡ ̴̡ı̴̴̡ ̡̡͡|̲̲̲͡͡͡ ̲▫̲͡ ̲̲̲͡͡π̲̲͡͡ ̲̲͡▫̲̲͡͡ ̲|̡̡̡ ̡ ̴̡ı̴̡̡
Phaėton
« Reply #16 on: April 12, 2010, 11:43:04 PM »

But with the outside parens I accidentally get array[1], which has the result I wanted all cleaned up: the full URL with http:// and without the trailing whitespace.

Code:
array
  0 =>
    array
      0 => string 'http://www.privacyrights.org/fs/fs6a-facta.htm ' (length=47)
      1 => string 'http://uspirg.org/uspirg.asp?id2=13649 ' (length=39)
      2 => string 'http://en.wikipedia.org/wiki/Fair_and_Accurate_Credit_Transactions_Act
' (length=71)
  1 =>
    array
      0 => string 'http://www.privacyrights.org/fs/fs6a-facta.htm' (length=46)
      1 => string 'http://uspirg.org/uspirg.asp?id2=13649' (length=38)
      2 => string 'http://en.wikipedia.org/wiki/Fair_and_Accurate_Credit_Transactions_Act' (length=70)
  2 =>
    array
      0 => string 'www.privacyrights.org/fs/fs6a-facta.htm' (length=39)
      1 => string 'uspirg.org/uspirg.asp?id2=13649' (length=31)
      2 => string 'en.wikipedia.org/wiki/Fair_and_Accurate_Credit_Transactions_Act' (length=63)
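
The pattern that produced this dump was not posted. The three-level result is consistent with an outer group wrapped around the original expression, so the pattern below is an assumption, and the sample text is hypothetical too. It is a sketch of how nested groups are numbered, nothing more.

```php
<?php
// Hypothetical input; the original text being scraped is not shown in the thread.
$text = "Sources: http://www.privacyrights.org/fs/fs6a-facta.htm and http://uspirg.org/uspirg.asp?id2=13649 \n";

// Assumed pattern: outer group = full URL, inner group = URL minus the scheme.
$re = '/(http:\/\/(.*?))\s/';
preg_match_all($re, $text, $matches);

// Groups are numbered by their opening parenthesis, left to right:
// $matches[0] = whole match (includes the whitespace consumed by \s)
// $matches[1] = outer group (full URL, no trailing whitespace)
// $matches[2] = inner group (URL without http://)
print_r($matches);
```

Because the \s sits outside both pairs of parentheses, it is part of the whole match in $matches[0] but never part of a captured group.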

perkiset
« Reply #17 on: April 13, 2010, 12:01:38 AM »

Um, what's the difference between array[0] and array[1]? I thought you were trying to scrape for what's seen in arr[2]...?

It is now believed, that after having lived in one compound with 3 wives and never leaving the house for 5 years, Bin Laden called the U.S. Navy Seals himself.
Bompa
« Reply #18 on: April 13, 2010, 05:37:47 AM »

Quote from: perkiset
Um, what's the difference between array[0] and array[1]? I thought you were trying to scrape for what's seen in arr[2]...?

array[0] includes the trailing whitespace matched by \s; array[1] does not.

"Everything that can be counted does not necessarily count; everything that counts cannot necessarily be counted." -- Albert Einstein
vsloathe
« Reply #19 on: April 28, 2010, 12:22:00 PM »

Code:
<?php
$re = '/http:\/\/(.*?)\s/';

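Kinda clunky: the ? makes .* lazy rather than greedy, but . still matches far more than it needs to and creeps forward one character at a time until the \s turns up. I'd try to tighten it up a bit.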

Code:
<?php
$re = '/http:\/\/(\S+)\s/';
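
A quick sketch of how the tightened pattern behaves, run against hypothetical sample text in which the last URL has nothing after it. The second pattern, without the closing \s, is only there for comparison and is not from the thread.

```php
<?php
// Hypothetical sample text; the second URL ends the string with no whitespace after it.
$text = "First http://uspirg.org/uspirg.asp?id2=13649 then http://en.wikipedia.org/wiki/Regex";

// Pattern from the post: \S+ must be followed by a whitespace character.
preg_match_all('/http:\/\/(\S+)\s/', $text, $matches);

// Comparison pattern (an assumption): \S+ already stops at whitespace,
// so dropping the closing \s also catches a URL at the very end of the text.
preg_match_all('/http:\/\/(\S+)/', $text, $all);

print_r($matches[1]); // only the first URL
print_r($all[1]);     // both URLs
```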

Keep in mind this only works if there is whitespace after each URL. It would fail at parsing the links out of a page of HTML because it doesn't check for quotes or anything. I use something like this:

Code:
<?php
$regex = '/href=[\'\"](\S+)[\'\"]/';

to pull all the links out of a page. \S is shorthand for "anything but whitespace". I also frequently just use a negated set with quotes in it, like [^\'\"], since a double quote can't appear unescaped in a URI (a single quote technically can, so watch out for single-quoted attributes).
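
To see why the negated set can be the safer choice, here is a small comparison sketch. The HTML snippet is hypothetical and picked to trip up \S+: the link text contains quotes and is not separated from the tag by whitespace.

```php
<?php
// Hypothetical HTML snippet; note the quoted link text with no whitespace before it.
$html = '<a href="/a">"quoted"</a> <a href=\'/b.php?id=2\'>B</a>';

// \S+ is greedy: it runs to the next whitespace, then backtracks to the LAST quote.
preg_match_all('/href=[\'"](\S+)[\'"]/', $html, $greedy);

// [^\'"]+ stops at the FIRST quote character it meets.
preg_match_all('/href=[\'"]([^\'"]+)[\'"]/', $html, $negated);

print_r($greedy[1]);  // first entry overshoots: /a">"quoted
print_r($negated[1]); // first entry is just: /a
```

Neither pattern checks that the closing quote is the same kind as the opening one, so treat both as rough scraping tools, not as an HTML parser.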

hai