The Cache: Technology Expert's Forum

Level 2 Cache: Speciality Items => Regex => Topic started by: tommytx on April 09, 2009, 02:11:19 PM



Title: How to break a ton of sentences into one single line with line feed.
Post by: tommytx on April 09, 2009, 02:11:19 PM
I know this is really a simple command, as I came across it just the other day.. I searched for "regular expressions one line" but nothing showed up.
It matched any one of . ? ! followed by a space and simply inserted a line feed.

Can someone help me on this?



Title: Re: How to break a ton of sentences into one single line with line feed.
Post by: perkiset on April 09, 2009, 02:25:16 PM
um, what language are you in? what OS? Doing this from the shell?

in PHP, if you wanted to change chr(13) or chr(10) into just a space to concatenate a bunch of CRLF / LF sentences into one big line you could do it much more quickly with a str replace:

$newBlock = str_replace(array(chr(13) . chr(10), chr(13), chr(10), '  '), ' ', $inputBuff);

This would change chr(13) + chr(10) (CRLF), chr(13), chr(10) or 2 sequential spaces into a single space.


Title: Re: How to break a ton of sentences into one single line with line feed.
Post by: tommytx on April 09, 2009, 03:56:44 PM
Thanks Perkiset, that looks good...
By the way I am doing php. Windows XP Pro.
Actually on Xampp but that should not matter..

But here is the problem with the code you sent... it looks for existing 10 and 13...
What I need is the old time honored convert every sentence no matter how long or short to a single line entry.
For example:

Spot is my dog. See spot run. Spot  can run faster than any other dog on the planet.  Now
if you want to see a fast dog let me show you Rambo. Rambo can run very fast.  You would not believe
his speed!  Wow! and that is an understatement. Do you have any fast dogs?

You see I start out with only 2 lines and one return or 2 line feed.
*************************************************
What I need is :
Spot is my dog.
See spot run.
Spot  can run faster than any other dog on the planet.
Now if you want to see a fast dog let me show you Rambo.
Rambo can run very fast.
You would not believe his speed!
Wow!
and that is an understatement.
Do you have any fast dogs?

Now I really didn't want to split the Wow! but it's ok, as the exclamation mark will almost never occur, same as the question mark.. very very infrequently.. so it really won't matter.  But I think I need to look for a space following the punctuation also, to keep it from treating the period in something like mysite.com or $345.98 as the end of a sentence. I don't want to require punctuation followed by a double space, as it's scanning html and many times the double space for a new sentence is not there.


And the only way I can see to do this is detect all periods, question marks and exclamation marks to start a new line.

Something like:
************
Meaning find character followed by a space

$find = ". " or "? " or "! "

$replace = "return or new line"
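
A minimal PHP sketch of the find-and-replace idea described above, assuming the text is already in a string. The helper name `split_sentences_to_lines` and the variable names are made up for illustration. It first collapses existing line breaks and runs of spaces, then puts a line feed after any `.`, `?` or `!` that is followed by a space, so `mysite.com` and `$345.98` are left alone:

```php
<?php
// Hypothetical helper: put each sentence on its own line.
function split_sentences_to_lines($text)
{
    // Collapse existing line breaks and repeated whitespace into single spaces.
    $oneLine = trim(preg_replace('/\s+/', ' ', $text));
    // Replace the space after ., ? or ! with a line feed, keeping the mark.
    return preg_replace('/([.!?]) /', '$1' . "\n", $oneLine);
}

$text = 'Spot is my dog. See spot run.  Now' . "\n"
      . 'if you want mysite.com for $345.98, ask! Ok?';
echo split_sentences_to_lines($text), "\n";
```

Abbreviations such as "Mr. Smith" would still be split; handling those needs more than a single pattern.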


Title: Re: How to break a ton of sentences into one single line with line feed.
Post by: tommytx on April 09, 2009, 05:19:18 PM
Perk.. are you aware the Rss for SEOIDIOTS is broken link?
feed://www.perkiset.org/forum/index.php?type=rss;action=.xml
just in case you are not aware..


Title: Re: How to break a ton of sentences into one single line with line feed.
Post by: vsloathe on April 09, 2009, 07:27:13 PM
Banged it out in this window, see if it works for you:

Code:
<?php
$content = 'READ YOUR TEXT IN OR C+P HERE';
$buff = array();
$buff = preg_split('/[.|!|?]/', $content);
foreach($buff as $sentence){
    echo $sentence . ".\n";
}
?>



Title: Re: How to break a ton of sentences into one single line with line feed.
Post by: Bompa on April 09, 2009, 08:14:42 PM
This thread should be "How to Parse Sentences from a large paragraph".  :D

I've done it like VS's code (but I don't use | inside square brackets), long ago when
injecting stuff into html paragraphs.




Title: Re: How to break a ton of sentences into one single line with line feed.
Post by: tommytx on April 09, 2009, 08:17:58 PM
Thank you very much.. just exactly what I was looking for.. I came across it the other day but could not find it again. I will put a copy of this snippet in safe keeping for use again at a later time... PERFECT...

Sorry I don't know how to mark it as code to put it in a box.
Here is my final version. I just added \r along with the \n to make it a proper line feed.
I also included the final project, so any noobie following along can see how to make the result usable and savable.
I added the "<br>" to make it display properly on the screen, as the line feeds don't show up in the echo output.

Code:
<?php
$file = fopen("content_one_line.txt", "w");
$content = file_get_contents("mycontent.txt");
$buff = array();
$buff = preg_split('/[.|!|?]/', $content);
foreach($buff as $sentence){
  $sentence = trim($sentence);
  echo $sentence . ".<br>\r\n";
  fwrite($file, $sentence . ".\r\n");
}
fclose($file);
?>



Title: Re: How to break a ton of sentences into one single line with line feed.
Post by: tommytx on April 09, 2009, 08:23:36 PM
Just in case anyone is asking what the hell is this with the period included..->   .\r\n
Because the sucker ate the punctuation, I just added the period back at the end of the line..
Actually the proper way to do it would be to capture the punctuation mark like (?) or (.) and add that particular mark at the end of the line...
I know it can be done, but you have helped so much i didn't have the heart to ask how.. and anyway a plain period will be the mark in 98 per cent of the cases anyway...

Thanks again..


Title: How to Parse Sentences from a large paragraph.
Post by: tommytx on April 09, 2009, 08:26:48 PM
Bompa great idea for a subject, I just didn't think of it..
It's nice when everyone uses the very best title so when searching it comes up more frequently..
I used the subject above... don't know if that becomes a search item or not..


Title: Re: How to break a ton of sentences into one single line with line feed.
Post by: nop_90 on April 09, 2009, 08:38:12 PM
" ".join("a\nb\nc\n".split("\n"))


Title: Re: How to break a ton of sentences into one single line with line feed.
Post by: Bompa on April 09, 2009, 08:40:59 PM
Cool.

You could put parentheses around the square brackets.  Then whatever is "captured"
by the regex will be in $1.

preg_split('/([.!?])/', $content);
$puncuation = $1;

Code:
foreach($buff as $sentence){
  $sentence = trim($sentence);
  echo $sentence . "$puncuation <br>\r\n";
}


Bompa


Title: Re: How to break a ton of sentences into one single line with line feed.
Post by: tommytx on April 09, 2009, 09:26:15 PM
EDIT: BAH! Sorry tommy. I went to hit "quote" to quote your post but I hit modify instead!  :doh: Sorry!

-V


Well, that's because Bompa doesn't write PHP :)
Code:
<?php
$content = "See my dog run. Can your dog run fast? Hell yes he can!";
$file = fopen("content_one_line.txt", "w");
$buff = array();
$buff = preg_split('/([.!?])/', $content, -1, PREG_SPLIT_DELIM_CAPTURE);
foreach($buff as $sentence){
  $sentence = trim($sentence);
  echo $sentence . "<br>\r\n";
  fwrite($file, $sentence . "\r\n");
}
fclose($file);

Now it should just retain the original punctuation mark. Or it might put it as a second dimension on the array, I'm not sure. Test it and let me know.
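
For reference, a small sketch of what `preg_split` returns with `PREG_SPLIT_DELIM_CAPTURE`: the captured punctuation comes back as separate elements in the same flat array, alternating with the sentence text, not as a second dimension. The names `$parts` and `$lines` are illustrative only, and the pairing loop assumes every sentence ends with one of the three marks:

```php
<?php
$content = "See my dog run. Can your dog run fast? Hell yes he can!";

// Flat result: text, mark, text, mark, ... plus a trailing empty string.
$parts = preg_split('/([.!?])/', $content, -1, PREG_SPLIT_DELIM_CAPTURE);
print_r($parts);

// Glue each piece of text back onto the mark that follows it.
$lines = array();
for ($i = 0; $i + 1 < count($parts); $i += 2) {
    $lines[] = trim($parts[$i]) . $parts[$i + 1];
}
echo implode("\n", $lines), "\n";
```

Trailing text with no closing punctuation would be dropped by this loop, so treat it as a starting point.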


Title: Re: How to break a ton of sentences into one single line with line feed.
Post by: perkiset on April 09, 2009, 10:10:16 PM
 :doh:

Quote
How to break a ton of sentences into one single line with line feed.
break, INTO one single line with line feed... I read remove 13/10s and put into one big buff. Sorry meng.



Title: Re: How to break a ton of sentences into one single line with line feed.
Post by: vsloathe on April 10, 2009, 06:37:43 AM
here:

Code:
<?php
$content = "See my dog run. Can your dog run fast? Hell yes he can!";
$buff = array();
$buff = preg_split('/([.!?])/', $content, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
foreach($buff as $key => $sentence){
        if($sentence != '.' && $sentence != '?' && $sentence != '!'){
                $sentence = trim($sentence);
                echo $sentence . $buff[$key + 1] . "\n";
        }
}


Title: Re: How to break a ton of sentences into one single line with line feed.
Post by: vsloathe on April 10, 2009, 06:40:24 AM
By the way, this has been quite helpful. I'd never thought of capturing the actual punctuation mark.


Title: Re: How to break a ton of sentences into one single line with line feed.
Post by: tommytx on April 10, 2009, 08:35:58 AM
Thanks for all the help from everyone.. and yes, Vsloathe, the last one did retain the punctuation mark.
Makes it more better....
Wonder if the more complex formula will cause a noticeable slowdown when using large files..
I may try it on a large text file and measure the process time, and if there is a noticeable difference, stick with adding a period after each one, as 99% will be a period anyway... So what is a few missing question marks among us scrapers. he he..
Thanks again for all the help.
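
A rough way to measure this, sketched under the assumption that a synthetic input built with `str_repeat` is representative enough. The variable names and the repeat count are arbitrary choices for illustration, and the timings will vary from machine to machine:

```php
<?php
// Build a large synthetic input so any difference is measurable.
$content = str_repeat("See my dog run. Can your dog run fast? Hell yes he can! ", 20000);

// Variant 1: plain split that throws the punctuation away.
$start = microtime(true);
$plain = preg_split('/[.!?]/', $content);
$plainSeconds = microtime(true) - $start;

// Variant 2: split that also captures the punctuation mark.
$start = microtime(true);
$captured = preg_split('/([.!?])/', $content, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
$capturedSeconds = microtime(true) - $start;

printf("plain:    %.4f s, %d pieces\n", $plainSeconds, count($plain));
printf("captured: %.4f s, %d pieces\n", $capturedSeconds, count($captured));
```

Running it a few times and comparing the two numbers on a real file gives a better answer than guessing.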


Title: Re: How to break a ton of sentences into one single line with line feed.
Post by: nutballs on April 10, 2009, 08:14:13 PM
Wonder if the more complex formula will cause a noticeable slowdown when using large files..

yes it will.

speed is a factor of file size and regex complexity.
realize though that regex complexity might not always be what you think.

but in the same vein, it's also probably the most efficient choice you have usually.


Title: Re: How to break a ton of sentences into one single line with line feed.
Post by: vsloathe on April 11, 2009, 07:54:11 AM
Any zero-width tuple representations in regex are pretty hefty. Also negative and positive lookarounds and lookbehinds make it CRAWL if you're doing enough of them.
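
For anyone following along, this is what a lookbehind-based split looks like. A sketch only, assuming sentences end in `.`, `!` or `?` followed by whitespace; `$sentences` is an illustrative name. The lookbehind `(?<=[.!?])` is zero-width, so the punctuation stays attached to the sentence and only the whitespace is consumed:

```php
<?php
$content = "See my dog run. Can your dog run fast? Hell yes he can! Visit mysite.com today.";

// Split on whitespace that is immediately preceded by ., ! or ?
$sentences = preg_split('/(?<=[.!?])\s+/', $content, -1, PREG_SPLIT_NO_EMPTY);

foreach ($sentences as $sentence) {
    echo $sentence . "\n";
}
```

Because the split needs whitespace after the mark, the period inside `mysite.com` does not start a new line.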


Title: Re: How to break a ton of sentences into one single line with line feed.
Post by: perkiset on April 12, 2009, 10:12:33 AM
At that point, it would be faster to convert all sentence breaking instances (". ", "! ", "? ") into a single special character and then explode on it, thus:

$newBuff = str_replace(array('. ', '? ', '! '), '###', $inputBuff);
$array = explode('###', $newBuff);
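
One thing to watch with the snippet above: the punctuation is replaced along with the space, so it is lost. A hedged variant, assuming `###` never appears in the input (the sample text is a placeholder), keeps each mark by putting it back in front of the marker:

```php
<?php
$inputBuff = "See my dog run. Can your dog run fast? Hell yes he can!";

// Replace each "mark + space" with "mark + marker" so the mark survives.
$newBuff = str_replace(
    array('. ', '? ', '! '),
    array('.###', '?###', '!###'),
    $inputBuff
);

// Split on the marker; each element is one sentence with its punctuation.
$array = explode('###', $newBuff);
print_r($array);
```

This stays regex-free, at the cost of relying on a marker string that must not occur in the text.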



Title: Re: How to break a ton of sentences into one single line with line feed.
Post by: vsloathe on April 12, 2009, 10:52:08 AM
But the regex I gave him should be fast enough. It doesn't use any advanced regex syntax at all, it's extremely basic.


Title: Re: How to break a ton of sentences into one single line with line feed.
Post by: Bompa on April 12, 2009, 05:27:14 PM
You geeks love to discuss processing speed.   ::)

I am pretty sure he was talking only of the additional code to capture the punctuation.

Quote
Wonder if the more complex formula will cause a noticeable slowdown when using large files

It's not going to be noticeable unless the file is FRIGGIN HUGE.

imo