The Cache: Technology Expert's Forum
 
Topic: PHP Text Parser: Break Long Text into Sentences
streetparade
« on: September 06, 2009, 12:37:07 PM »

Hello everyone,
It's good to be a member here ;-)
I'm writing a script that should break a text into sentences.
If a sentence ends with a period (.) followed by a space, the script splits the text there, but if no space follows the period, it does not.
I tried preg_match_all and it didn't work. preg_split worked.
Please look at the script and help.

<?php

   #doc
   #   classname:   Textsentenseparser
   #   Author:         Faruk
        #       
   #
   #/doc
   
class Textsentenseparser
{
      
      

      
      // Splits both texts into sentences and prints the similar_text()
      // percentage for each pair of sentences at the same position.
      public function parsesim($sentences, $sentences2)
      {
          $sentences = trim($sentences);
          $sentences2 = trim($sentences2);

          // The pattern ends with a literal space, so a boundary is only
          // recognized when a space follows the ., ! or ? terminator.
          $pattern = '#(.*?[^\.][a-z0-9][\.\!\?]) #is';
          $flags = PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY;
          $sentences = preg_split($pattern, $sentences, -1, $flags);
          $sentences2 = preg_split($pattern, $sentences2, -1, $flags);

          // Compare pairs by position; stop at the shorter list so that no
          // index runs past the end of either array.
          $count = min(count($sentences), count($sentences2));

          for ($a = 0; $a < $count; $a++)
          {
              // Skip fragments that are too short to compare meaningfully.
              if (strlen($sentences[$a]) > 10 && strlen($sentences2[$a]) > 10)
              {
                  // similar_text() stores the similarity percentage in $p.
                  similar_text($sentences[$a], $sentences2[$a], $p);
                  echo "\nMATCHING $sentences[$a]\n WITH $sentences2[$a]\n";
                  echo round($p, 2) . "%\n";
              }
          }
      }
         
}
      
$txt2 = <<< TEXT
A simple tool to store and display texts longer than a few lines.

The search button will highlight all the words matching the name of objects that are members of the classes listed in searchedClasses, itself a member of the KeySet class. The highlighted words are hypertext.

Edit invokes wscripts/acedb.editor, which by default launches emacs. Edit that file to start another editor in its place.

Save will recover from the emacs but will not destroy it.

Read will read a text file, so you could Search it.

general grep is a way to annotate a set of longtexts versus the searchedClasses. It outputs an ace file that you can then hand check and read back in acedb to create XREF from longTexts to genes etc.
TEXT;
$txt = <<< TEXT
A simple tool to store and display texts longer than a few lines.

The search button will highlight all teeeeehe words matching the name of objects that are members of the classes listed in searchedClasses, itself a member of the KeySet class. The highlighted words are hypertext.

Edit invokes wscripts/acedb.editor, which by default launches emacs. Edit that file to start another editor in its place.

Save will recover from the emacs but will not destroy it.

Read will read a text file, so you could Search it.

general grep is a way to annotate a set of longtexts versus the searchedClasses. It outputs an ace file that you can then hand check and read back in acedb to create XREF from longTexts to genes etc.
TEXT;
      $t = new Textsentenseparser();
      $t->parsesim($txt,$txt2);
      
   ###

?>
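
One possible way to handle a terminator that has no space after it is to split on a zero-width boundary instead of on a literal space. The following is only a sketch under stated assumptions, not the poster's code: `splitSentences` is a hypothetical helper, and the rule "a terminator followed by optional whitespace and an uppercase ASCII letter" is a heuristic chosen so that names such as wscripts/acedb.editor are not cut in half.

```php
<?php
// Hypothetical helper, not part of the original script.
function splitSentences(string $text): array
{
    // Collapse line breaks and runs of whitespace into single spaces.
    $text = trim(preg_replace('/\s+/', ' ', $text));

    // Split right after ., ! or ? when the next non-space character is an
    // uppercase ASCII letter, whether or not whitespace sits in between.
    $parts = preg_split('/(?<=[.!?])\s*(?=[A-Z])/', $text, -1, PREG_SPLIT_NO_EMPTY);

    return array_map('trim', $parts);
}

// Usage: the second sentence starts directly after the period.
print_r(splitSentences("First one.Second one. Is it three?Yes!"));
```

Because the boundary requires a capital letter, a title such as "Mr." followed by a name is still split, and a sentence that begins with a lowercase word (like "general grep" in the sample text) is not detected.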
Bompa
« Reply #1 on: September 06, 2009, 05:59:49 PM »

So,

Mr. Jones is a nice man. However, blah blah blah

would be split into two sentences?
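
The "Mr. Jones" case can be handled, at least roughly, by splitting first and then gluing back any piece that ends in a known abbreviation. This is a minimal sketch, assuming a small hand-written abbreviation list; the function name and the list are hypothetical and far from complete.

```php
<?php
// Hypothetical helper: the default abbreviation list is an assumption.
function splitSentencesKeepingAbbreviations(
    string $text,
    array $abbreviations = ['Mr.', 'Mrs.', 'Ms.', 'Dr.', 'Prof.']
): array {
    // Candidate boundaries: ., ! or ? followed by optional whitespace
    // and an uppercase ASCII letter.
    $pieces = preg_split('/(?<=[.!?])\s*(?=[A-Z])/', trim($text), -1, PREG_SPLIT_NO_EMPTY);

    $sentences = [];
    $buffer = '';
    foreach ($pieces as $piece) {
        // Pieces are rejoined with a single space, even if the input had none.
        $buffer = ($buffer === '') ? $piece : $buffer . ' ' . $piece;

        // Look at the last whitespace-delimited word collected so far.
        $words = preg_split('/\s+/', $buffer);
        $lastWord = end($words);

        // A trailing abbreviation means the sentence is not finished yet.
        if (in_array($lastWord, $abbreviations, true)) {
            continue;
        }
        $sentences[] = $buffer;
        $buffer = '';
    }
    // Keep whatever is left if the text ended on an abbreviation.
    if ($buffer !== '') {
        $sentences[] = $buffer;
    }
    return $sentences;
}

print_r(splitSentencesKeepingAbbreviations("Mr. Jones is a nice man. However, blah blah blah"));
```

With this approach the example above comes out as two sentences, with "Mr." kept attached to "Jones". Abbreviations missing from the list are still treated as sentence ends.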



streetparade
« Reply #2 on: September 08, 2009, 10:47:50 AM »

Hello, thanks for the reply.
Hmm... That was a good question.
The script is intelligent enough not to split it into two sentences.
However, for my case this is not effective.
I will implement something better: an algorithm, the Boyer-Moore algorithm.
I will implement it; the algorithm should take about a month.
The algorithm is more effective than similar_text.
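
For context, Boyer-Moore is an exact substring search, while similar_text() returns a similarity score, so the two answer different questions. The block below is a minimal sketch of the Horspool simplification of Boyer-Moore (bad-character shifts only, not the full algorithm). The function name `horspoolSearch` is hypothetical, and the code works on bytes, so multibyte text gets no special handling.

```php
<?php
// Boyer-Moore-Horspool sketch: returns the byte offset of the first
// occurrence of $needle in $haystack, or -1 if it does not occur.
function horspoolSearch(string $haystack, string $needle): int
{
    $n = strlen($haystack);
    $m = strlen($needle);
    if ($m === 0) {
        return 0;
    }
    if ($m > $n) {
        return -1;
    }

    // Shift table: for every byte of the needle except the last one,
    // the distance from its last occurrence to the end of the needle.
    $shift = [];
    for ($i = 0; $i < $m - 1; $i++) {
        $shift[$needle[$i]] = $m - 1 - $i;
    }

    $pos = 0;
    while ($pos <= $n - $m) {
        // Compare the needle against the current window, right to left.
        $j = $m - 1;
        while ($j >= 0 && $haystack[$pos + $j] === $needle[$j]) {
            $j--;
        }
        if ($j < 0) {
            return $pos;
        }
        // Shift by the table entry for the byte under the needle's last
        // position; bytes that are not in the table allow a full-length jump.
        $last = $haystack[$pos + $m - 1];
        $pos += $shift[$last] ?? $m;
    }
    return -1;
}

var_dump(horspoolSearch("The highlighted words are hypertext.", "words"));
```

For ordinary use, PHP's built-in strpos() already does this job; the sketch is only meant to show how the skip table lets the search jump ahead.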
perkiset
« Reply #3 on: September 08, 2009, 11:40:04 AM »

I'm a little confused... if all you want is to break text into sentences, then couldn't you do the same with:

$arr = explode('. ', str_replace(array(chr(10), chr(13)), ' ', $inputText));
foreach ($arr as $sentence) $final[] = trim($sentence);

...?