The Cache: Technology Expert's Forum
 
Topic: pcre regex for http urls?
svakanda
« on: June 06, 2008, 03:20:24 PM »

Does anyone have a Perl/PHP-compatible regex for picking up http URLs? I'd love to get my hands on one. I've been able to code about half of it, but I keep getting lost.

a ship is safe in the harbor, but that's not what it's for.
nop_90
« Reply #1 on: June 06, 2008, 04:07:09 PM »

regex sucks shit for that sort of stuff
use the mechanize module with perl
http://search.cpan.org/dist/WWW-Mechanize/lib/WWW/Mechanize.pm
vsloathe
« Reply #2 on: June 06, 2008, 04:37:53 PM »

Yeah, sucks so much shit...LOL

For only links to external sites:
Code:
'/a href=\"?(http:\/\/[^\"\s>]+)\"?/'

For links both internally and externally:
Code:
'/a href=\"?([^\"\>\<]+)\"?/'

Tried to keep them as tight as I could for you.
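
For anyone wiring an href-based pattern like the ones above into PHP, here is a minimal sketch using preg_match_all. The helper names extract_links and extract_http_links and the sample HTML are made up for illustration, and the pattern is a slightly stricter variant of the ones above (it stops at whitespace as well as at quotes and angle brackets), not vsloathe's exact regex.

```php
<?php
// Hypothetical helper (not from the thread): collect href values from anchor tags.
function extract_links(string $html): array
{
    // "<a", any other attributes, "href=", an optional double quote, then
    // everything up to the next double quote, whitespace, or angle bracket.
    $pattern = '/<a\s[^>]*?href=\"?([^\"\s<>]+)/i';
    preg_match_all($pattern, $html, $matches);
    return $matches[1];
}

// Hypothetical helper: keep only absolute http/https links (the "external" case).
function extract_http_links(string $html): array
{
    $isHttp = function ($url) {
        return preg_match('/^https?:\/\//i', $url) === 1;
    };
    return array_values(array_filter(extract_links($html), $isHttp));
}

$html = '<p><a href="http://example.com/page">one</a> '
      . '<a class="x" href="/about">two</a></p>';

print_r(extract_links($html));      // both links
print_r(extract_http_links($html)); // only the absolute http link
```

Grabbing every href first and filtering afterwards keeps each regex short, which is the same simplification the href approach is meant to buy. Single-quoted href values are not handled by this sketch.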

hai
thedarkness
« Reply #3 on: June 07, 2008, 12:04:03 AM »

More fodder: I had these lying around in a regex test script where I leave old regexes in case I need 'em. Look over them carefully, coz they may not be (probably aren't?) suitable. They may serve as a teaching resource, they may not ;D

Code:
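// Anchor tags: captures the scheme, then the host (the lazy match stops at the first quote or slash).
//$regex = "@<\s*a\s+[^>]*href\s*=\s*[\"'](http|https|ftp)://(.*?)[\"'/]@is";
// Bare URLs of the form http://www.name.tld; expects a three-letter TLD.
//$regex = "@(http:\/\/w{3}\.[^.]+?\.[a-z]{3})@i";
// Scheme plus hostname only; the path is not captured.
//$regex = "@(https?:\/\/[A-Z0-9.-]+)@i";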

Cheers,
td
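
If the goal is bare URLs in plain text rather than href attributes, the last pattern above can be stretched to pick up the path as well. This is only a sketch: find_urls and the sample text are invented for illustration, and the character classes are an assumption about what should end a URL, not a full URL grammar.

```php
<?php
// Hypothetical helper: find bare http/https URLs in plain text (no href needed).
function find_urls(string $text): array
{
    // Scheme, hostname characters, then an optional path that runs until
    // whitespace, a quote, or an angle bracket.
    $pattern = '@https?://[a-z0-9.-]+(?:/[^\s"\'<>]*)?@i';
    preg_match_all($pattern, $text, $matches);
    return $matches[0];
}

$text = 'See http://example.com/docs?page=2 and https://www.example.org for details.';
print_r(find_urls($text));
```

Because the hostname class includes the dot, a URL that ends a sentence will drag the trailing full stop along with it, so the results may need trimming before use.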

"I want to be the guy my dog thinks I am."
 - Unknown
Bompa
« Reply #4 on: June 07, 2008, 06:00:34 AM »

Quote from: svakanda
Does anyone have a perl/php compatible regex for picking up http urls?  I'd love to get my hands on one...been able to code about half of it.  But i keep getting lost.

hey sva, any urls or just hyperlinks?


"The most beautiful and profound emotion we can experience is the sensation of the mystical..." - Albert Einstein
svakanda
« Reply #5 on: June 16, 2008, 02:42:24 PM »

Hey!!! Thank you guys for all the great input!

I don't think I can really use Mechanize, as I'm using PHP.

@vsloathe, those look great; I think they will work precisely for what I need. I spent so much time trying to define the URL starting with http that I never even considered using href as a base, and that means you can make MUCH simpler regexes. Wunderbar!

@Bompa, I'm not certain. Probably just hyperlinks. I don't think I need other URL/URI schemes like ftp or some such. It's just for a web spider.

Thank you, everyone!

/svakanda out
vsloathe
« Reply #6 on: June 17, 2008, 05:33:52 AM »

No worries. Your mileage may vary on the \"? part.

The original HTML spec says you don't absolutely have to put quotes around attribute values in anchor tags, but I'd say maybe 0.1% of the sites on the web leave them out.
svakanda
« Reply #7 on: June 17, 2008, 07:05:15 AM »

no worries, thanks for the input!
thedarkness
« Reply #8 on: June 17, 2008, 07:14:06 PM »

I'd prolly be inclined to use [\'\"] as in my first example, because href values can be wrapped in single quotes too.

Cheers,
td

edit: Maybe [\'\"\s] ?
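
One way to cover all three cases (double quotes, single quotes, no quotes) in a single PHP pattern is sketched below. The function name extract_hrefs_any_quote and the sample markup are hypothetical, and the pattern relies on PCRE's branch-reset group (?|...) so that the value always lands in capture group 1 whichever alternative matched.

```php
<?php
// Hypothetical helper: read href values whether they are double-quoted,
// single-quoted, or not quoted at all.
function extract_hrefs_any_quote(string $html): array
{
    // (?|...) is a branch-reset group: each alternative's capture is group 1.
    //   "..."   double-quoted value
    //   '...'   single-quoted value
    //   bare    unquoted value, ending at whitespace, a quote, or an angle bracket
    $pattern = '/<a\s[^>]*?href\s*=\s*(?|"([^"]*)"|\'([^\']*)\'|([^\s"\'<>]+))/i';
    preg_match_all($pattern, $html, $matches);
    return $matches[1];
}

$html = '<a href="http://a.example/x">dq</a> '
      . "<a href='/single'>sq</a> "
      . '<a href=/bare class=x>nq</a>';

print_r(extract_hrefs_any_quote($html));
```

Matching each quoting style with its own alternative avoids the trap of a value that starts with one kind of quote being cut off at the other kind, which a single character class like [\'\"] on both ends cannot rule out.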