The Cache: Technology Expert's Forum
 
Topic: Regular expression problem
taltas
« on: August 27, 2010, 11:28:52 AM »

Hi all,

I am trying to do the following, but I am really struggling.
I am using Python 2.6, and I cannot import ngrams either.

My problem is that I have a string:
st = 'A quick brown fox jumped over 007 wink* wink*'

I am trying to get overlapping bigrams of word characters using regular expressions, i.e.:

'qu' 'ui' 'ic' 'ck' 'br' 'ro' 'ow' 'wn' 'fo' 'ox' 'ju' 'um' 'mp' 'pe' 'ed' 'ov' 've' 'er' '00' '07' 'wi' 'in' 'nk' 'wi' 'in' 'nk'

Basically, I thought of:

(\w{2})+

but this gave me:
'ic', 'ow', 'fo', 'ed', 'er', '00', 'nk', 'nk'

which is obviously not what I need...
Any suggestions?
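
For anyone reading along, the result above can be reproduced with the short sketch below. It assumes re.findall was the call used (the post does not say), and the function name is made up for illustration. Two things seem to be going on: a repeated group only keeps its last repetition, and successive matches never overlap, so each word contributes at most one pair.

```python
import re

def last_group_matches(text):
    # Hypothetical helper, assuming re.findall was the call used.
    # With a repeated capturing group, findall returns only the LAST
    # repetition of the group for each non-overlapping match.
    return re.findall(r'(\w{2})+', text)

st = 'A quick brown fox jumped over 007 wink* wink*'
print(last_group_matches(st))
```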




dirk
« Reply #1 on: August 27, 2010, 05:33:03 PM »

Hi taltas,

I don't know the Python code, but this works in Perl:

Code:
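$string = 'A quick brown fox jumped over 007 wink* wink*';

# with /g, each search resumes where the previous match ended
while ( $string =~ m{ (\w{2}) }xmsg ) {
    # step back one character so the next match overlaps this one
    pos $string = ( pos $string ) - 1;
    print "'$1' ";
}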

It's a bit tricky because you always have to go one step back.

"pos" returns the offset where the last m//g match ended. If you subtract 1,
the next search starts one character earlier, so the matches overlap.
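
A rough Python counterpart of the step-back idea, as a sketch only. It uses the standard re module; the function names are made up for illustration. The first function keeps its own position and restarts the search one character before the end of the previous match. The second is a different technique, a capturing group inside a zero-width lookahead, which gets overlapping matches in a single findall call.

```python
import re

def bigrams_step_back(text):
    # Mirror the Perl loop: find two word characters, then resume
    # the search one character before the end of that match.
    pattern = re.compile(r'\w{2}')
    result = []
    pos = 0
    while True:
        match = pattern.search(text, pos)
        if match is None:
            break
        result.append(match.group())
        pos = match.end() - 1  # the "one step back"
    return result

def bigrams_lookahead(text):
    # Alternative: the lookahead consumes nothing, so every position
    # followed by two word characters yields one captured pair.
    return re.findall(r'(?=(\w{2}))', text)

st = 'A quick brown fox jumped over 007 wink* wink*'
print(bigrams_step_back(st))
print(bigrams_lookahead(st))
```

Both calls should print the same list for this string, starting with 'qu', 'ui', 'ic', 'ck'.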

Bompa
« Reply #2 on: August 28, 2010, 02:46:38 AM »

That's a good one Dirk.


Code:
$st = 'A quick brown fox jumped over 007 wink* wink*';

@chars = split(//, $st);  # SPLIT STRING INTO A LIST

# stop one short of the last index so $chars[$x+1] always exists
for($x=0; $x<$#chars; ++$x) {
  if($chars[$x] =~ /\w/ and $chars[$x+1] =~ /\w/) {
    print "$chars[$x]$chars[$x+1]\n";
  }
}

Split the string into an array and iterate over it; if the current character
is a valid character and is followed by a valid character, print both.
In my demo code, I just used \w as the validation to keep it simple.
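
The same character-by-character idea might look like this in Python. It is a sketch, not tested on 2.6 itself, and the function name is made up for illustration. zip pairs each character with the one after it, so there is no index to run past the end.

```python
import re

def bigrams_by_pairs(text):
    # Walk over adjacent character pairs and keep those where both
    # characters match \w (same validation as the Perl demo).
    word_char = re.compile(r'\w')
    pairs = []
    for current, following in zip(text, text[1:]):
        if word_char.match(current) and word_char.match(following):
            pairs.append(current + following)
    return pairs

st = 'A quick brown fox jumped over 007 wink* wink*'
for pair in bigrams_by_pairs(st):
    print(pair)
```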

At TheCache, you get a choice of solutions. :D
Bompa


« Last Edit: August 28, 2010, 02:48:10 AM by Bompa »

"Everything that can be counted does not necessarily count; everything that counts cannot necessarily be counted." -- Albert Einstein
dirk
« Reply #3 on: August 28, 2010, 06:57:57 AM »

That's Perl ;-)

"There's more than one way to do it", commonly known as TMTOWTDI.

Or the second slogan which is often forgotten:

"Easy things should be easy and hard things should be possible".
vsloathe
« Reply #4 on: September 07, 2010, 06:51:11 AM »

Python's slogan is "Perl done right", but I'm more of a Ruby man these days.
dirk
« Reply #5 on: September 07, 2010, 07:37:23 AM »

Python is interesting; I will use it for our projects too.

Anyway, a slogan shouldn't look at other languages.

"Python is Python done right" would be a better slogan ;-)