
![]() |
nutballs
So i have been rethinking a problem that I tackled a while ago, but now am wondering if i have just been missing something.
here is my table row structure id | phrase | somemagicdata simple right? examples of the phrases might be car tug boat lion white tiger now, the problem is that I want to match those phrases to an existing phrase I have, for example: pretty white tiger in vegas should match the "white tiger" entry in the database. now if it was the otherway around, i would just use a LIKE query. but how do you do a "backwards like" query? so in otherwords, normal LIKE queries, the phrase you supply can partially match the Longer phrase in the database. But i want the longer phrase to be what I supply to the query, and the database contains the partials. The way I have been doing this is by splitting the phrase I supply into single words, and doing straight up word matching against the database. The problem with this is that it is very limiting. so i have been doing: select * from table where phrase='pretty' or phrase='white' or phrase='tiger' or phrase='vegas' I wish i could do this: select * from table where phrase interiorlike 'pretty white tiger vegas' make sense? hwo the hell do i do that? or is the dumb way i been doing it, the way its done? perkiset
Nice puzzle... is this related to anything you and I have been talking about? If so, perhaps a database is the wrong plan - if you have a hash list or associative array where the names are the keywords you are looking for, then you could parse the input phrase and say something like:
( Assume $saladArr['car'] = true, $saladArr['whitetiger'] = true etc ) $words = explode(' ', $inputSentence); foreach($words as $thisWord) { if ($saladArr[$thisWord]) { // BANG! Gotcha... } } ... perhaps this spins a gear or so...? You could grab all the "primary words" from the database once a night, create an array and serialize it so that it loads and runs quickly... or cache it or something... this of course depends on the salad not being too big... hydra
There's a MYSQL comparison operator (WHERE
REGEXP)perkiset
wait... speedwise you just kicked my ASS on
regexes in that other thread... please don't tell me you use them here to go faster than me and array references or I'm just gonna shoot myself!![]() hydra
No shooting yourself just yet
![]() Haven't tested this method (speed) <edit>or functionality - mysql5?</edit> but sounds like it could work. Let me know? nutballs
yes perk its related to what you think, but only partly. unfortunatly the wordsalad into an array wont work, since it is actually going to end up being 1000's of comparison phrases.
regex. hmm. i was thinking at first thatregexwouldnt work because after all you would be usingregexto find matching results from the database that match a pattern. in the end that would be similar to me doing a bunch of ORs.the problem comes in with multi word matching of phrases. I may have the phrase "white tiger" in my database, but want it to match on "white tiger vegas" "vegas tiger white" "where are the white tigerS" I guess a better way to put it as SQL would be this: select * from table where phrase KindaLike 'White Tigers in Las Vegas' and it would bring up the result in the database that has the phrase 'White Tiger' im wondering if full text search in MSsql might work for this, Kaptain? its outside the normal intended use of FTS but maybe this is the route? nutballs
hehe cool
full text search does it. even handles plurals. so for example, i have "white tiger" in my database and throw "which hotel has the white tigers in vegas" it will match. now the question is if there is any way to handle misspellings with that. gonna read a bit, but so far it doesnt seem like it. perkiset
Here I can help... you can do what I mentioned for the quick-indexed "partials" - use a
typogenerator for each word in the blob. Until you're in the millions and millions of words, the extra index entries will be trivial.nutballs
oh ic. if there is a word I identify then run the
typos, string them together in one entry in the database basically.the problem is, i dont know what those words are until its too late ![]() So aside from that the fulltextsearch is supposed to return a rank, but no matter what i try, everything returns with rank of 0. So now i gotta figure that out... lol making progress though. i think this will definatly be the direction i should go. perkiset
quote author=nutballs link=topic=278.msg1859#msg1859 date=1180500123 oh ic. if there is a word I identify then run the typos, string them together in one entry in the database basically.the problem is, i dont know what those words are until its too late ![]() by "some" I assume you mean a project where you knew the words in advance ![]() typos against it and post that back into the blob and you;ve got it. I don't quite understand how the remains of that sentence works within the context of your earlier posts in this thread though...nutballs
in general i dont really know what the hell people are coming in on.
The phrases are all over the place, and i want to direct them to the most appropriate destination. obviously not gonna be perfect but this seems to be somewhat closer. if i manually set up each phrase that I get traffic for, i would never be able to leave the house, and probably would never catch up. so this seems to get me most of the way there. typos might not even be worth my effort. since that would end up being basically the entire dictionary, misspelled...bah, im multitasking like 5 projects, so i probably am not really thinking well. i gotta figure it out. perkiset
Ping as needed!
Perk, out m0nkeymafia
http://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html
Itll match relevant text in your query to anything in the database. You can also order it by relevancy so if you have 2 phrases tiger, and white tiger it'll match white tiger because its more relevant. Pretty quick too [dont know stats hydra ![]() nutballs
yea MM thats what i using, MSsql version though, same principle. the problem is that since currently my result set is so small it doesnt return a rank, because the results are "too similar". bah. At least this is capable of doing what I want, just not perfectly. I think i can get it perfect though... i just gotta pen-and-paper it a few times to see.
thanks for the help guys. |

Thread Categories

![]() |
![]() |
Best of The Cache Home |
![]() |
![]() |
Search The Cache |
- Ajax
- Apache & mod_rewrite
- BlackHat SEO & Web Stuff
- C/++/#, Pascal etc.
- Database Stuff
- General & Non-Technical Discussion
- General programming, learning to code
- Javascript Discussions & Code
- Linux Related
- Mac, iPhone & OS-X Stuff
- Miscellaneous
- MS Windows Related
- PERL & Python Related
- PHP: Questions & Discussion
- PHP: Techniques, Classes & Examples
- Regular Expressions
- Uncategorized Threads