The Cache: Technology Expert's Forum
 
Author Topic: regex white space limit problem  (Read 10571 times)
LondonSEO
Rookie
« on: January 29, 2011, 05:22:38 AM »

Hi,

I'm having problems with a regex. I need to grab some parts of a line of text, but I only want the match to run up to the white space... this is what I have:

$a = 'xxxx "Some Text Here":http://example.com/img/dsd yuyuyuyuyuy gjgjgjg ffffddddss';
$p = '%^(?i)(?:.*)"(.*)":(.*)\s%';


preg_match($p, $a, $matches);
print_r($matches);


echo '<a href="'.$matches[2].'">'.$matches[1].'</a>';


This is the output:

some text here blabh blah

Anchor text
Array
(
    [0] => xxxx "Anchor Text Here":http://example.com/img/dsd yuyuyuyuyuy
    [1] => Anchor Text Here
    [2] => http://example.com/img/dsd yuyuyuyuyuy
)
Anchor Text Here



So I would like to avoid capturing that last space and the 'yuyuyuyuyuy', so that the URL comes out OK...




Any ideas?



Thank you,


LS




Bompa
Administrator
Lifer
« Reply #1 on: January 29, 2011, 07:14:12 AM »

Hi,

You want to match from the beginning of the line to the last space?


Bompa
LondonSEO
Rookie
« Reply #2 on: January 29, 2011, 07:16:11 AM »

Hi Bompa,

Yeah, in this case I only need what is BEFORE the white space, which here is a URL... but I'm still getting whatever is after the space...


Thank you,

LS
LondonSEO
Rookie
« Reply #3 on: January 29, 2011, 07:31:29 AM »

To be more specific... the second part of the string that I need starts after

":

$a = 'xxxx "Some Text Here":http://example.com/img/dsd yuyuyuyuyuy gjgjgjg ffffddddss';

As you can see, when I build a normal HTML link it continues adding the space + yuyuyuyuyuy

The optimal output will be:

Code:
echo '<a href="'.$matches[2].'">'.$matches[1].'</a>';

Quote

BUT I'm Getting




Quote




I need to get rid of whatever comes after the white space... in this case it shouldn't display the yuyuyuyuyuy




Got it?





perkiset
Olde World Hacker
Administrator
Lifer
« Reply #4 on: January 29, 2011, 01:24:12 PM »

xxxx "Some Text Here":http://example.com/img/dsd yuyuyuyuyuy gjgjgjg ffffddddss

I'm not Bomps when it comes to regex, but I'm curious about the technique to grab what you want.

First, if I read you right, you definitely want the URL, yes? So let's build that piece:

Code:
(http://[\S]*)

This will grab whatever is non-whitespace after the http://. If the preceding text is really consistent, then grabbing it too and using the http:// as the anchor would work fine:

Code:
\"([^\"]*)\":(http://[\S]*)

Can't check it right now, but would that do what you are looking for?
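
A minimal sketch of how that pattern might be tried against the sample line from the first post. The `%` delimiters and the `i` flag are carried over from the original poster's code; the `htmlspecialchars()` calls are an added assumption, not something anyone in the thread asked for.

```php
<?php
// Sample line from the first post in this thread.
$a = 'xxxx "Some Text Here":http://example.com/img/dsd yuyuyuyuyuy gjgjgjg ffffddddss';

// "([^"]*)"     captures the quoted anchor text,
// :             is the literal separator,
// (http://\S*)  captures the URL up to, but not including, the first whitespace.
$p = '%"([^"]*)":(http://\S*)%i';

$link = '';
if (preg_match($p, $a, $matches) === 1) {
    // Escape both parts before placing them in HTML (an added precaution).
    $link = '<a href="' . htmlspecialchars($matches[2]) . '">'
          . htmlspecialchars($matches[1]) . '</a>';
}

echo $link, "\n";
```

Because `\S` cannot match a space, the second group ends at the URL no matter how much text follows it on the line.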
LondonSEO
Rookie
« Reply #5 on: January 29, 2011, 01:51:29 PM »

Hey Perkiset,

Thank you for your answer. In this case I want to grab the URL, but it could be anything, any string or numbers. My main concern is to grab only what is before the space... regardless of what it is... can't make it happen yet....

Thanks

LS
perkiset
Olde World Hacker
Administrator
Lifer
« Reply #6 on: January 29, 2011, 03:00:36 PM »

I guess I'm still a little unclear on what it all might look like - but the simplest way to say "I want to collect everything from (where I am) up to something that is white space" is just
Code:
([\S]*)

You could make it more readable against your criteria by saying "collect anything that is NOT a space..."

Code:
([^\s]*)

The way I look at text strings that I need to parse via Regex is to construct the parts to grab things I need, then string them together using anchors (my word here, things that tell me where I am in the string of characters) to locate myself and then collect. If you had a few more examples of some varied strings, particularly ones that are really giving you troubles that'd be very helpful.
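
The "anchor, then collect" idea can be sketched roughly as follows. The function name `grab_quoted_and_token` is hypothetical, and the sketch uses `\S+` rather than `\S*` so that an empty token does not count as a match; that choice is an assumption about what is wanted here.

```php
<?php
// Hypothetical helper: returns array(quoted text, token after the colon), or null.
function grab_quoted_and_token($line)
{
    // "([^"]*)"  collects the quoted part,
    // ":         is the anchor that tells us where we are,
    // (\S+)      collects everything up to the next whitespace.
    if (preg_match('%"([^"]*)":(\S+)%', $line, $m) === 1) {
        return array($m[1], $m[2]);
    }
    return null;
}

$samples = array(
    'xxxx "Some Text Here":http://example.com/img/dsd yuyuyuyuyuy gjgjgjg ffffddddss',
    'no quoted part here',
);

foreach ($samples as $line) {
    var_export(grab_quoted_and_token($line));
    echo "\n";
}
```

Nothing in the pattern mentions http, so the same helper should work whether the token after the colon is a URL, a word, or a number.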
nop_90
Global Moderator
Lifer
« Reply #7 on: January 29, 2011, 08:03:07 PM »

I don't really understand the question.
But for starters, don't reinvent the wheel.
Parse the URL with
http://php.net/manual/en/function.parse-url.php

Every URL can be broken into those parts (almost every mainstream language has something like parse_url).
Then get the part you want and run the regexp on that.

A regexp might not be the best tool in this case.
If I understand correctly, you want to keep (or not display) some part of "dsd yuyuyuyuyuy gjgjgjg ffffddddss".
You should be able to use explode http://php.net/manual/en/function.explode.php
or one of its variants listed there.
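
A rough sketch of the no-regex route, assuming the line always contains the `":` separator and that the wanted token ends at the first space. The variable names are illustrative only.

```php
<?php
$a = 'xxxx "Some Text Here":http://example.com/img/dsd yuyuyuyuyuy gjgjgjg ffffddddss';

// Split once on '":' so that $rest is everything after the quoted text.
// The limit of 2 keeps any later '":' inside $rest.
list($before, $rest) = explode('":', $a, 2);

// Anchor text: whatever follows the opening quote in $before.
$anchor = substr($before, strpos($before, '"') + 1);

// First space-separated token of $rest; the limit of 2 avoids splitting the tail.
$parts = explode(' ', $rest, 2);
$url = $parts[0];

// parse_url() breaks the URL into its components (scheme, host, path, ...).
$info = parse_url($url);

echo $anchor, ' -> ', $url, "\n";
echo $info['host'], ' ', $info['path'], "\n";
```

If the separator can be missing, the `list()` assignment would need a guard, since `explode()` then returns a single element.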

LondonSEO
Rookie
« Reply #8 on: January 30, 2011, 03:51:24 AM »

Hi,

I'll try to put it clearer:

Let's say I have the following line of text:

Cats and dogs are always "doing it":running or catching cats

My intention is to grab whatever is between the quotes (doing it)... and the word that comes after the colon (running)

then my output will be:

doing it running


NOTE: it is easy to grab whatever is between the quotes, but the next word, the one that ends with a space, is not easy.... That is my problem: I keep getting

doing it running or

I don't want the word OR or any other word after...

I only need

doing it running

This is what I have:
Code:
$p = '%^(?i)(?:.*)"(.*)":(.*)\s%';


I hope it is clearer now ;)

LS


Bompa
Administrator
Lifer
« Reply #9 on: January 31, 2011, 06:05:11 AM »

Hi London,

You see this part of your regex? 
Code:
(.*)\s

That says: match everything up to a whitespace. But quantifiers like * are greedy
by default, which is why they say that regex is greedy. It will match everything
up to the last whitespace that it can find.

Adding a ? after the quantifier removes the greediness...

I would try:
Code:
(.*?)\s

Which says: match everything up to the first whitespace.

But it's late here so i dunno,
Bompa
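
The difference between the greedy and the lazy quantifier can be seen side by side on the "Cats and dogs" line from Reply #8. This is a sketch with the pattern simplified: the leading `^(?i)(?:.*)` from the original is dropped, on the assumption that it is not needed for this input.

```php
<?php
$a = 'Cats and dogs are always "doing it":running or catching cats';

// Greedy: (.*) grabs as much as it can, then backs off to the LAST whitespace.
preg_match('%"(.*)":(.*)\s%', $a, $greedy);

// Lazy: (.*?) grabs as little as it can, so it stops at the FIRST whitespace.
preg_match('%"(.*)":(.*?)\s%', $a, $lazy);

echo $greedy[2], "\n";               // running or catching
echo $lazy[2], "\n";                 // running
echo $lazy[1], ' ', $lazy[2], "\n";  // doing it running
```

Both forms still require a whitespace character after the captured word, so a line that ends right after it would not match; `(\S+)` without the trailing `\s` avoids that.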
LondonSEO
Rookie
« Reply #10 on: January 31, 2011, 06:14:26 AM »

Brilliant Bompa! Spot on!

Thank you very much! ;)

LS