nutballs

So i have been rethinking a problem that I tackled a while ago, but now am wondering if i have just been missing something.

here is my table row structure
id | phrase | somemagicdata

simple right?

examples of the phrases might be
car
tug boat
lion
white tiger

now, the problem is that I want to match those phrases to an existing phrase I have, for example:
pretty white tiger in vegas
should match the "white tiger" entry in the database.

now if it was the otherway around, i would just use a LIKE query. but how do you do a "backwards like" query?
so in otherwords, normal LIKE queries, the phrase you supply can partially match the Longer phrase in the database. But i want the longer phrase to be what I supply to the query, and the database contains the partials.

The way I have been doing this is by splitting the phrase I supply into single words, and doing straight up word matching against the database. The problem with this is that it is very limiting.

so i have been doing:
select * from table where phrase='pretty' or phrase='white' or phrase='tiger' or phrase='vegas'

I wish i could do this:
select * from table where phrase interiorlike 'pretty white tiger vegas'


make sense? hwo the hell do i do that? or is the dumb way i been doing it, the way its done?

perkiset

Nice puzzle... is this related to anything you and I have been talking about? If so, perhaps a database is the wrong plan - if you have a hash list or associative array where the names are the keywords you are looking for, then you could parse the input phrase and say something like:

( Assume $saladArr['car'] = true, $saladArr['whitetiger'] = true etc )

$words = explode(' ', $inputSentence);
foreach($words as $thisWord)
{
  if ($saladArr[$thisWord]) { // BANG! Gotcha... }
}

... perhaps this spins a gear or so...? You could grab all the "primary words" from the database once a night, create an array and serialize it so that it loads and runs quickly... or cache it or something... this of course depends on the salad not being too big...

hydra

There's a MYSQL comparison operator (WHERE

REGEX

 P)

perkiset

wait... speedwise you just kicked my ASS on

regex

 es in that other thread... please don't tell me you use them here to go faster than me and array references or I'm just gonna shoot myself!  Applause

hydra

No shooting yourself  just yet Applause

Haven't tested this method (speed) <edit>or functionality - mysql5?</edit> but sounds like it could work.

Let me know?

nutballs

yes perk its related to what you think, but only partly. unfortunatly the wordsalad into an array wont work, since it is actually going to end up being 1000's of comparison phrases.

regex

 . hmm. i was thinking at first that

regex

  wouldnt work because after all you would be using

regex

  to find matching results from the database that match a pattern. in the end that would be similar to me doing a bunch of ORs.

the problem comes in with multi word matching of phrases.
I may have the phrase "white tiger" in my database, but want it to match on "white tiger vegas" "vegas tiger white" "where are the white tigerS"
I guess a better way to put it as SQL would be this:
select * from table where phrase KindaLike 'White Tigers in Las Vegas'
and it would bring up the result in the database that has the phrase 'White Tiger'

im wondering if full text search in MSsql might work for this, Kaptain?
its outside the normal intended use of FTS but maybe this is the route?

nutballs

hehe cool

full text search does it. even handles plurals.

so for example, i have "white tiger" in my database and throw "which hotel has the white tigers in vegas" it will match.

now the question is if there is any way to handle misspellings with that. gonna read a bit, but so far it doesnt seem like it.

perkiset

Here I can help... you can do what I mentioned for the quick-indexed "partials" - use a
typo
  generator for each word in the blob. Until you're in the millions and millions of words, the extra index entries will be trivial.

nutballs

oh ic. if there is a word I identify then run the
typo
 s, string them together in one entry in the database basically.

the problem is, i dont know what those words are until its too late Applause that would work for some, but not many.

So aside from that the fulltextsearch is supposed to return a rank, but no matter what i try, everything returns with rank of 0. So now i gotta figure that out... lol

making progress though. i think this will definatly be the direction i should go.

perkiset

quote author=nutballs link=topic=278.msg1859#msg1859 date=1180500123

oh ic. if there is a word I identify then run the
typo
 s, string them together in one entry in the database basically.

the problem is, i dont know what those words are until its too late Applause that would work for some, but not many.


by "some" I assume you mean a project where you knew the words in advance  Applause in which case it is trivial... parse the list into single words, run
typo
 s against it and post that back into the blob and you;ve got it. I don't quite understand how the remains of that sentence works within the context of your earlier posts in this thread though...

nutballs

in general i dont really know what the hell people are coming in on.
The phrases are all over the place, and i want to direct them to the most appropriate destination. obviously not gonna be perfect but this seems to be somewhat closer.

if i manually set up each phrase that I get traffic for, i would never be able to leave the house, and probably would never catch up.

so this seems to get me most of the way there.
typo
 s might not even be worth my effort. since that would end up being basically the entire dictionary, misspelled...

bah, im multitasking like 5 projects, so i probably am not really thinking well. i gotta figure it out.

perkiset

Ping as needed!

Perk, out

m0nkeymafia

http://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html

Itll match relevant text in your query to anything in the database.
You can also order it by relevancy so if you have 2 phrases
tiger, and white tiger

it'll match white tiger because its more relevant.
Pretty quick too [dont know stats hydra Applause]

nutballs

yea MM thats what i using, MSsql version though, same principle. the problem is that since currently my result set is so small it doesnt return a rank, because the results are "too similar". bah. At least this is capable of doing what I want, just not perfectly. I think i can get it perfect though... i just gotta pen-and-paper it a few times to see.

thanks for the help guys.


Perkiset's Place Home   Politics @ Perkiset's