The Cache: Technology Expert's Forum
 
Author Topic: keyword fetching and cleaning function  (Read 3516 times)
deregular
Expert
« on: November 07, 2007, 06:56:07 AM »

I don't know why the hell this isn't working.

If I create the array manually, it spits out clean results.
If I pull the keywords from MySQL, it leaves the bad words in.

BTW, there are no funny characters in the keyword DB, and I have done a print_r() on the entire array. It all looks fine.

Anyone got any ideas?

Code:
function grabandcleankeywords($keyword){

    //GET KEYWORDS FROM DB AND PUT IN ARRAY LIST
    $sql=mysql_query("SELECT keyword FROM keywords WHERE keyword LIKE '%$keyword%'");
    while($rows=mysql_fetch_array($sql)){
        $list[]=trim($rows['keyword']);
    }

    //GET RID OF DUPLICATES
    $list = array_values(array_unique($list));

    //GET NAUGHTY WORDS AND PUT IN ARRAY NAUGHTYLIST
    $lines=file("data/naughty.txt");

    foreach($lines AS $line_num => $line){
        $naughtylist[]=trim($line);
    }

    //COUNT KEYWORD ARRAY
    $keywordCount=count($list);

    for($k=0;$k<$keywordCount;$k++){
        if(in_array($list[$k],$naughtylist)){
            unset($list[$k]);
        }
    }

    return $list;

}


...and obviously I'm calling it like this:

Code:
$keyword='sex';
$wordlist=grabandcleankeywords($keyword);

foreach($wordlist AS $word){
    echo "{$word}<br>";
}
perkiset
Olde World Hacker
Administrator
Lifer
« Reply #1 on: November 07, 2007, 11:38:16 AM »

You say "when you put them in manually..." - I assume you mean you replace the MySQL code that gets them from the DB, and then it works.

Well, I'd put a print_r($list) right after your keyword acquisition and compare it against your manually generated list - something is not copacetic. I'm curious whether you are including your de-duper in the acquisition phase or after it...

Purely from a speed perspective, you might think about simply saying "select distinct keyword from..." so that you only get one instance of each keyword to start with.

But if I may, you might consider a different approach which might be faster.

First: load the naughty words one time and use a global to access them. Or, and more to my liking, create a class that opens the naughty words on creation and use that.
Next: create the $list array while looking at the naughty words, rather than doing it in several passes. I'm just going to go quickly, but it might look something like this:

(BTW: I don't know why in the world you're using the LIKE operator, because a LIKE with a leading wildcard beats the hell out of your efficiency... if there's any way at all that you can do an '=' instead you'll get orders of magnitude of speed improvement - well, depending on the structure of the DB anyway.)

Code:
<?php
// Load the naughty words once; each word becomes an array KEY for fast lookup.
$tempArr = explode(chr(10), file_get_contents('./naughty.txt'));
foreach ($tempArr as $word)
    $GLOBALS['naughtyWords'][$word] = true;

function cleanList($keyword)
{
    $outList = array();
    $set = mysql_query("select distinct keyword from keywords where keyword like '%$keyword%'");
    while ($row = mysql_fetch_row($set))
    {
        // Keep the row only if there is no naughtyWords entry keyed by it.
        if (!$GLOBALS['naughtyWords'][$row[0]])
            $outList[] = $row[0];
    }
    return $outList;
}
?>


Note here that I am also taking advantage of a trick in PHP - rather than using the in_array() function, which will walk the whole array trying to find your answer, I've built an array where the naughty words are the KEYS of the array - now they are hashed and wicked fast. This entire mechanism can be summed up in this sentence:

"Build the outList array from the database keywords, provided there is NOT an entry in the naughtyWords global array with that keyword as its key."

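A minimal sketch of the class-based variant mentioned above, combined with the keyed-array lookup. The class name `NaughtyWords`, its `contains()` method and the lowercasing are assumptions made for illustration, not code from this thread. It uses isset() for the lookup, so a keyword that is not in the list does not raise a notice.

```php
<?php
// Hypothetical helper (not from the thread): loads the word list once and
// stores each word as an array KEY, so a lookup is a hash probe, not a scan.
class NaughtyWords
{
    private $words = array();

    public function __construct($path)
    {
        // These flags make file() drop line endings and empty lines.
        $lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
        if ($lines === false) {
            $lines = array(); // unreadable file: behave as an empty list
        }
        foreach ($lines as $line) {
            $word = strtolower(trim($line)); // trim() also removes a stray "\r"
            if ($word !== '') {
                $this->words[$word] = true;
            }
        }
    }

    // Exact-match test: true only if the WHOLE keyword is a listed word.
    public function contains($keyword)
    {
        return isset($this->words[strtolower(trim($keyword))]);
    }
}
```

Usage would be along the lines of `$naughty = new NaughtyWords('./naughty.txt');` followed by `$naughty->contains($row[0])` inside the fetch loop. Note that this is still an exact match on the whole keyword, so a phrase that merely contains a listed word passes through.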
Hope this all helps,
/p
deregular
Expert
« Reply #2 on: November 07, 2007, 07:13:28 PM »

Thanks for the reply, perk. As you can tell, this had me perplexed as to why it wasn't working.

Quote
You say "when you put them in manually..." - I assume you are saying you replace the MySQL code that gets them from a DB - and then it works.
Yeah, I meant that I commented out the MySQL call and the creation of the array from it, and simply replaced them with a shorter array written in the source, e.g. $list=array('dog','cat','monkey','sex')
Then the nasty words don't get displayed, but when I use the MySQL call, it doesn't work. Really weird.

Anywho, I'll take on your suggestions, perk. I may as well look at efficiency while I'm here. Always learning!!

Will let you guys know how I go and will post the script; someone may find a use for it.

cheers
d
deregular
Expert
« Reply #3 on: November 07, 2007, 08:50:16 PM »

OK, now I have...

Code:

function grabandcleankeywords($keyword){

    $tempArr = explode(chr(10), file_get_contents('./data/naughty.txt'));
    foreach($tempArr as $word){
        $GLOBALS['naughtyWords'][$word] = true;
    }

    $set = mysql_query("select distinct keyword from keywords where keyword like '%$keyword%'");
    while ($row = mysql_fetch_row($set)){

        if (!$GLOBALS['naughtyWords'][$row[0]]){          //THIS IS THE LINE WHERE THE ERROR OCCURS
            $outList[] = $row[0];
        }
    }
    return $outList;

}


But I am getting an undefined index error:
Notice: Undefined index: sex education in d:\program files\easyphp1-8\www\bhcc\functions.php on line 32

Any ideas?
leadegr00t
n00b
« Reply #4 on: November 08, 2007, 12:09:05 AM »

I would try:
Code:
if (isset($GLOBALS['naughtyWords'][$row[0]])
             && true === $GLOBALS['naughtyWords'][$row[0]]){

The check against true is probably redundant, because if the key exists at all then its value will be true.

Code untested.
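A small sketch of how the check could be written so that it keeps the original "not naughty" meaning. As posted above, the condition is true only when the word IS in the list, so dropping it in unchanged would presumably need a `!` in front. The sample data here is made up for illustration.

```php
<?php
// Made-up sample data standing in for the real word list and query rows.
$naughtyWords = array('sex' => true, 'porn' => true);
$rows = array('sex', 'sex education', 'gardening');

$outList = array();
foreach ($rows as $keyword) {
    // isset() returns false for a missing key without raising an
    // "Undefined index" notice, so the negated test is safe for any keyword.
    if (!isset($naughtyWords[$keyword])) {
        $outList[] = $keyword;
    }
}

print_r($outList); // 'sex' is dropped, the other two keywords remain
```

This is still an exact match against the array keys, so "sex education" stays in the output.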
deregular
Expert
« Reply #5 on: November 08, 2007, 09:04:34 PM »

Thanks for that. It got rid of the errors, but now it returns a blank page.
I'll have a poke around at the function over the next few days.
It seems I was trying to match complete array elements, when what I
should be doing is matching any nasty word within the strings.

I.e., instead of using in_array() or something like that, I need to run through
the arrays and use stristr() instead.

Will post the result once I get it working.
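To make the difference concrete, here is a short sketch comparing the two kinds of match. The word list and keywords are made up; stripos() is used here in place of stristr() only because it gives a clean true/false when compared against false.

```php
<?php
$naughtylist = array('sex');

// Exact element match: the whole keyword must equal a listed word.
$exactShort = in_array('sex', $naughtylist);
$exactLong  = in_array('sex education', $naughtylist);

// Case-insensitive substring match. stripos() returns an offset or false,
// so compare with !== false (offset 0 is a valid hit).
$subLong  = stripos('sex education', 'sex') !== false;
$subEssex = stripos('Essex', 'sex') !== false; // innocent word, still a hit

var_dump($exactShort, $exactLong, $subLong, $subEssex);
```

The exact match catches 'sex' but misses 'sex education', which fits the symptom of a hand-written array looking clean while the database results do not. The substring match catches both, at the cost of also flagging harmless words that happen to contain a listed one.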
gnarlyhat
Journeyman
« Reply #6 on: November 08, 2007, 10:17:03 PM »

deregular: Could you be kind enough to show me how your keywords tables are designed? I was stuck earlier trying to figure this out, so I just used a flat file instead. This information might come in handy when I decide to move to a DB. TIA :)
deregular
Expert
« Reply #7 on: November 08, 2007, 10:39:59 PM »

No problem, gnarly.

Just as a start, I've got a very basic table set up with ALL keywords in it.
So far only about 150,000 of them, but I will expand it later.

One table called 'keywords'.
With two fields in it:
An 'id' field, which is int.
And a 'keyword' field, which is 'text' (could probably set this to something else for efficiency, I suppose).

Later on I will be looking at having multiple keyword categories, so I will probably add another field in there (i.e. category) to tie in with a categories table.

I have my naughty words, as you can see, in a flat file, but will also be moving them over to the keywords table and setting them up in a naughty category.

So far I've gotten this function to work, comparing my MySQL results array with the array created from my naughty words flat file.

Obviously, if perk or anyone wants to chime in and help me with the efficiency of this, it'd be very much appreciated.
For the time being it works, and I'm running my control center scripts from my home PC using EasyPHP as the environment.

Code:

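function grabandcleankeywords($keyword){

    // Load the naughty words; trim() strips the line endings file() leaves on each line
    $nasty = array();
    foreach (file('data/naughty.txt') as $word){
        $word = trim($word); //had to do this because of some weird whitespace problems
        if ($word !== ''){
            $nasty[] = $word;
        }
    }

    // Escape the search term before putting it into the query
    $keyword = mysql_real_escape_string($keyword);
    $set = mysql_query("select distinct keyword from keywords where keyword like '%$keyword%'");
    $keywords = array();
    while ($row = mysql_fetch_row($set)){
        $keywords[] = $row[0];
    }

    $nastyCount = count($nasty);
    $keywordCount = count($keywords);

    for ($k=0; $k<$keywordCount; $k++){
        for ($l=0; $l<$nastyCount; $l++){
            // Case-insensitive substring match: drop the keyword on the first hit
            if (stristr($keywords[$k], $nasty[$l]) !== false){
                unset($keywords[$k]);
                break; // the element is gone, so stop checking it
            }
        }
    }
    return $keywords;
}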


It's very similar to the code that we put together some time back over at syndk8.

Hope this helps.
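For anyone wanting to try the filtering step without a database, here is a rough sketch of the same idea as a standalone function. The name `filterKeywords` and the sample data are made up for illustration; it takes plain arrays, so the MySQL and file-loading parts are left out.

```php
<?php
// Hypothetical helper: returns the keywords that contain none of the
// nasty words (case-insensitive substring test), re-indexed from 0.
function filterKeywords(array $keywords, array $nasty)
{
    $clean = array();
    foreach ($keywords as $keyword) {
        foreach ($nasty as $bad) {
            // Skip empty entries: an empty needle is not a meaningful
            // search term (and triggers a warning on older PHP versions).
            if ($bad !== '' && stripos($keyword, $bad) !== false) {
                continue 2; // first hit is enough, move to the next keyword
            }
        }
        $clean[] = $keyword;
    }
    return $clean;
}

$result = filterKeywords(
    array('sex education', 'garden tools', 'Free PORN', 'dog training'),
    array('sex', 'porn')
);
print_r($result); // only 'garden tools' and 'dog training' are left
```

Building a new array instead of calling unset() on the old one means the result has no gaps in its indexes, which matters if the caller later walks it with a counted for loop.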