deregular

Don't know why the hell this isn't working??

Seems if I create an array manually, it spits out clean results...
If I pull the keywords from MySQL, it leaves the bad words in..

BTW, no, there are no funny characters in the keyword DB, and I have done a print_r on the entire array.. all looks fine.

Anyone got any ideas??


function grabandcleankeywords($keyword){
   
    //GET KEYWORDS FROM DB AND PUT IN ARRAY LIST
    $sql=mysql_query("SELECT keyword FROM keywords WHERE keyword LIKE '%$keyword%'");
  while($rows=mysql_fetch_array($sql)){
      $list[]=trim($rows['keyword']);
    }
   
    //GET RID OF DUPLICATES
    $list = array_values(array_unique($list));
   
    //GET NAUGHTY WORDS AND PUT IN ARRAY NAUGHTYLIST
    $lines=file("data/naughty.txt");
   
    foreach($lines AS $line_num => $line){
        $naughtylist[]=trim($line);
    }
   
    //COUNT KEYWORD ARRAY
    $keywordCount=count($list);

    for($k=0;$k<$keywordCount;$k++){
      if(in_array($list[$k],$naughtylist)){
          unset($list[$k]);
      }
    }
   
    return $list;
   
}



.. and obviously I'm calling it like this..


$keyword='sex';
$wordlist=grabandcleankeywords($keyword);

foreach($wordlist AS $word){
    echo "{$word}<br>";
}

perkiset

You say "when you put them in manually..." - I assume you are saying you replace the MySQL code that gets them from a DB - and then it works.

Well, I'd put a print_r($list) right after your keyword acquisition and compare it against your manually generated list - something is not copacetic. I'm curious if you are including your de-duper in the acquisition phase or after it...

Purely from a speed perspective, you might think about simply saying "select distinct keyword from..." so that you only get one instance of the keywords to start with.

But if I may, you might consider a different approach which might be faster.

First: load the naughtywords one time and use a global to access them. Or, and more to my liking, create a class that opens the naughtywords on creation and use that.
Next: Create the $list array while looking at the naughty words, rather than doing it in several passes. I'm just going to go quickly, but it might look something like this:

(BTW: I don't know why in the world you're using the LIKE operator, because you just beat the hell out of your efficiency that way... if there's any way at all that you can do an '=' instead you'll get magnitudes of speed improvement - well, based on the structure of the DB anyway)


<?php
$tempArr = explode(chr(10), file_get_contents('./naughty.txt'));
foreach($tempArr as $word)
    $GLOBALS['naughtyWords'][$word] = true;

function cleanList($keyword)
{
    $set = mysql_query("select distinct keyword from keywords where keyword like '%$keyword%'");
    while ($row = mysql_fetch_row($set))
    {
        if (!$GLOBALS['naughtyWords'][$row[0]])
            $outList[] = $row[0];
    }
    return $outList;
}
?>


Note here that I am also taking advantage of a trick in PHP - rather than using the in_array function, which will walk the whole array trying to find your answer, I've built an array where the naughty words are the KEY in the array - now they are hashed and wicked fast. This entire mechanism can be summed up in this sentence:

"Build the outList array from the database keywords, provided there is NOT an entry in the naughtyWords global array with that keyword as its key."

Hope this all helps,
/p
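
To make the keyed-lookup idea a bit more concrete, here is a minimal sketch of the class-based variant mentioned above. The class name NaughtyFilter and its methods are made up for illustration, and the word list is passed in as an array (an assumption, so the example runs without naughty.txt); in practice the lines would presumably come from file().

```php
<?php
// Hypothetical helper: NaughtyFilter is an illustrative name, not code from this thread.
class NaughtyFilter
{
    /** @var array naughty words stored as array KEYS for hashed lookup */
    private $words = array();

    public function __construct(array $lines)
    {
        foreach ($lines as $line) {
            $word = strtolower(trim($line)); // trim() also strips a stray "\r"
            if ($word !== '') {
                $this->words[$word] = true;
            }
        }
    }

    // Exact-match test: true when the whole keyword is a naughty word.
    public function isNaughty($keyword)
    {
        return isset($this->words[strtolower(trim($keyword))]);
    }

    // Keep only the keywords that are not in the naughty list.
    public function clean(array $keywords)
    {
        $out = array();
        foreach ($keywords as $keyword) {
            if (!$this->isNaughty($keyword)) {
                $out[] = $keyword;
            }
        }
        return $out;
    }
}

$filter = new NaughtyFilter(array("sex\r", "porn", ""));
$clean  = $filter->clean(array('dog', 'cat', 'sex', 'monkey'));
print_r($clean);
```

Using isset() on the key avoids a notice when a keyword is not in the list. Note that this is an exact match on the whole keyword, so a phrase such as 'sex education' would still get through.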

deregular

Thanks for the reply perk. As you can tell, this had me perplexed as to why it wasn't working.

quote
You say "when you put them in manually..." - I assume you are saying you replace the MySQL code that gets them from a DB - and then it works.

Yeah, I meant that I commented out the mysql call and the creation of the array from it, and simply replaced it with a shorter array written in the source, e.g. $list=array('dog','cat','monkey','sex')
And the nasty words don't get displayed, but when I use the mysql call, it doesn't work. Really weird.

Anywho, I'll take on your suggestions perk, I may as well look at efficiency while I'm here. Always learning!!

Will let you guys know how I go and will post the script, someone may find a use for it.

cheers
d

deregular

ok now i have...



function grabandcleankeywords($keyword){

    $tempArr = explode(chr(10), file_get_contents('./data/naughty.txt'));
    foreach($tempArr as $word){
        $GLOBALS['naughtyWords'][$word] = true;
    }

    $set = mysql_query("select distinct keyword from keywords where keyword like '%$keyword%'");
    while ($row = mysql_fetch_row($set)){

        if (!$GLOBALS['naughtyWords'][$row[0]]){          //THIS IS THE LINE WHERE THE ERROR OCCURS
            $outList[] = $row[0];
        }
    }
    return $outList;

}



but i am getting an undefined index error..
Notice: Undefined index: sex education in d:program fileseasyphp1-8wwwhccfunctions.php on line 32

Any ideas?

leadegr00t

I would try:
if (isset($GLOBALS['naughtyWords'][$row[0]])
            && true === $GLOBALS['naughtyWords'][$row[0]]){


the check against true is probably redundant because if it exists at all then it will be true

code untested
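
One thing to watch with the check above: the line it replaces was negated (a keyword is kept when it is NOT in the naughty list), so the isset() test presumably wants a ! in front. A minimal sketch, with a hard-coded $rows array standing in for the MySQL result set (an assumption, so it runs on its own):

```php
<?php
// Stand-in data; in the thread these come from naughty.txt and mysql_fetch_row().
$GLOBALS['naughtyWords'] = array('sex' => true, 'porn' => true);
$rows = array(array('dog'), array('sex'), array('sex education'));

$outList = array();
foreach ($rows as $row) {
    // Keep the keyword only when there is NO naughty entry with that key.
    // isset() avoids the "Undefined index" notice for keys that are missing.
    if (!isset($GLOBALS['naughtyWords'][$row[0]])) {
        $outList[] = $row[0];
    }
}
print_r($outList);
```

Without the ! the loop would collect only the naughty words themselves, so the output could easily come up empty.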

deregular

Thanks for that, it got rid of the errors, but now returns a blank page.
I'll have a poke around at the function a little bit over the next few days.
Seems what I was doing was trying to match complete array elements,
when what I should be doing is matching any nasty word within
the strings.

I.e. instead of using in_array() or something like that, I need to run through
the arrays and use stristr() instead.

Will post the result once I get it working.

gnarlyhat

deregular: Could you be kind enough to show me how your keywords tables are designed? I was at a lock knot earlier trying to figure it out and I just used a flatfile instead. This information might come in handy when I decide to move to a DB. TIA

deregular

No problems gnarly

Just as a start I've got a very basic table setup with ALL keywords in it.
So far only about 150,000 of them, but will expand it later.

One table called 'keywords'.
With 2 fields in it.
An 'id' field which is int.
And a 'keyword' field which is 'text' (could probably set this to something else for efficiency, I suppose).
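
A rough sketch of that table and the lookup, assuming PDO is available. SQLite in memory is used here only so the example runs on its own; the thread uses MySQL, where the id column would normally be declared INT AUTO_INCREMENT and the connection would use a mysql: DSN. The VARCHAR(255) size and the index name are guesses, not details from the thread.

```php
<?php
// Assumption: in-memory SQLite stands in for the MySQL database used in the thread.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Two fields as described: an integer id and the keyword itself.
$db->exec('CREATE TABLE keywords (id INTEGER PRIMARY KEY, keyword VARCHAR(255) NOT NULL)');
$db->exec('CREATE INDEX idx_keyword ON keywords (keyword)');

$insert = $db->prepare('INSERT INTO keywords (keyword) VALUES (?)');
foreach (array('dog', 'sex', 'sex education', 'sex education') as $kw) {
    $insert->execute(array($kw));
}

// Bind the search term instead of interpolating it into the SQL string.
$stmt = $db->prepare('SELECT DISTINCT keyword FROM keywords WHERE keyword LIKE ? ORDER BY keyword');
$stmt->execute(array('%sex%'));
$found = $stmt->fetchAll(PDO::FETCH_COLUMN);
print_r($found);
```

An index like this helps '=' and prefix lookups; a LIKE pattern with a leading % generally cannot use it, so the query above still scans the table.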

Later on I will be looking at having multiple keyword categories, so I will probably be looking at adding another field in there (re. category) to coincide with a categories table.

I have my naughty words as you can see, in a flatfile, but will also be moving this over to the keywords table, and setting them up in a naughty category.

So far I've gotten this function to work, comparing my MySQL results array with the array created from my naughty words flatfile.

Obviously, if perk or anyone wants to chime in and help me with the efficiency of this, it'd be very much appreciated.
For the time being it works and I'm running my control center scripts from my home PC using easy-php as the environment.



function grabandcleankeywords($keyword){

    $tempArr=file('data/naughty.txt');

    foreach($tempArr as $word){
        $word=trim($word); //had to do this because of some weird whitespace problems
        $nasty[]=$word;
    }

    $set = mysql_query("select distinct keyword from keywords where keyword like '%$keyword%'");
    while ($row = mysql_fetch_row($set)){
        $keywords[]=$row[0];
    }

    $nastyCount = count($nasty);
    $keywordCount=count($keywords);

    for ($k=0; $k<$keywordCount; $k++){
        for ($l=0; $l<$nastyCount; $l++){
            if (@stristr($keywords[$k], $nasty[$l])){
                unset($keywords[$k]);
            }
        }
    }
    return $keywords;
}



It's very similar to the code that we put together some time back over at syndk8.

Hope this helps.
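
On the efficiency question, one possible tightening of the function above, offered as a sketch rather than a tested replacement: stop at the first naughty word that matches (the nested loop keeps testing a keyword after it has been unset, which is presumably why the @ is needed), use stripos() instead of stristr() since only a yes/no answer is wanted, and build a fresh array so the result has no gaps in its keys. The function name filterKeywords is made up, and both lists are passed in as arrays (an assumption) so it runs without the database or the flatfile.

```php
<?php
// Hypothetical helper: returns the keywords that contain none of the naughty words.
function filterKeywords(array $keywords, array $nasty)
{
    $clean = array();
    foreach ($keywords as $keyword) {
        $isNasty = false;
        foreach ($nasty as $word) {
            $word = trim($word);
            // Skip blank lines; stripos() !== false means $word occurs somewhere in $keyword.
            if ($word !== '' && stripos($keyword, $word) !== false) {
                $isNasty = true;
                break; // no need to test the remaining naughty words
            }
        }
        if (!$isNasty) {
            $clean[] = $keyword; // re-indexed 0..n-1, unlike unset()
        }
    }
    return $clean;
}

$result = filterKeywords(
    array('dog training', 'Sex Education', 'cat toys', 'essex hotels'),
    array("sex\r\n", "porn\r\n", "")
);
print_r($result);
```

Substring matching is blunt: 'essex hotels' is dropped here too because it contains 'sex', so a word-boundary check may be worth considering if that matters.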

