The Cache: Technology Expert's Forum
 
*
Welcome, Guest. Please login or register. September 22, 2019, 05:31:51 PM

Login with username, password and session length


Pages: [1] 2
  Print  
Author Topic: Bestest, Fastest, Most reliable text search (mysql)  (Read 12953 times)
KaptainKrayola
Keeper of Pie
Global Moderator
Lifer
*****
Offline Offline

Posts: 994



View Profile WWW
« on: November 25, 2009, 09:40:47 AM »

So, i have a ton of text that i need to locate specific phrases in.  I have experimented with both fulltext search and lame old LIKE but I am wondering what you guys suggest for finding specific whole phrases in text.

my fulltext query is so far almost exactly twice as fast as the LIKE query (to be expected) but figured id pick teh brains of the smart guys here to see what they suggested.

keep in mind i need to find specific phrases.  i'm not searching for relevance but for actual one or more word phrases. Cheesy

thanks yo!
Logged

We can't stop here, this is bat country.
jammaster82
Lifer
*****
Offline Offline

Posts: 666


Thats craigs list for ya


View Profile
« Reply #1 on: November 25, 2009, 10:01:15 AM »

Bat country.. lol

MY GOD SOMEONES giving these creatures booze!

Ive done this in the past by storing the unique words
all in a table and giving them a unique id.
Then making a table of sentences with an indexed
integer field of the wordids and sequence they are in
the sentence...

WORDTABLE
--------------
wordid
wordtext

SENTENCES
-------------
sentence id
wordid
wordinsentencesequencenumber


WordTable:
1, See
2, Jane
3, Run
4, Jim

See Jane Run
SentenceTable:
1 , 1, 1
1, 2, 2
1, 3, 3

See Jim Run
SentenceTable:
2, 1, 1
2, 4, 2
2, 3, 3

.and so forth... takes a little bit of stored
procedure to get your sentences in and
 out but was a faster search through huge
bits of data....... I actually went nuts and
broke it down into characters that made words
so i could do even faster 'LIKE' queries.
Then i did the 'characters that could be mistyped'
table...   later i was gonna add a 'phenome' table
in case my creation ever 'came to life'  Grin

i went even deeper than that and stored parts
of speech data about them for switching out
adjectives and adverbs (attempt at some loebner
stuff )
Logged

The watched pot, never boils... But if you walk away from it , the soup burns.  What gives?
nutballs
Administrator
Lifer
*****
Offline Offline

Posts: 5627


Back in my day we had 9 planets


View Profile
« Reply #2 on: November 25, 2009, 11:28:53 AM »

NGrams.

Basically what Jamm is doing.
The problem is, it can become massive.

So, what I have done in the past was...
A table with single words and IDs. Easy nuff.
A table with the NGram data, but only as IDs.
My table looked like:
RootID
BeforeID
AfterID

Then you can chain them together for as long of a retarded phrase as you want.
If you need to make an entry where the Root Word is the first or last word in the ngram, just use an ID of 0.

(then use that with sentence templates... and ooooooooh aaaaaaaaah, let the shitty sentences fly!)
Logged

I could eat a bowl of Alphabet Soup and shit a better argument than that.
KaptainKrayola
Keeper of Pie
Global Moderator
Lifer
*****
Offline Offline

Posts: 994



View Profile WWW
« Reply #3 on: November 25, 2009, 11:42:02 AM »

hrm.

seems a little different than what i'm trying to do (then again i may just nto be following the logic totally)  i already have a big table full of text and need to find rows with phrases in them like "apple pie" and return the results with "apple pie" in them.

sounds like breaking it all up into words and sentences and such is more work than i'm looking to do for this id rather just be able to search inside the text i already have.  maybe fulltext and LIKE are really the only two viable options at this point.

Logged

We can't stop here, this is bat country.
nutballs
Administrator
Lifer
*****
Offline Offline

Posts: 5627


Back in my day we had 9 planets


View Profile
« Reply #4 on: November 25, 2009, 02:17:54 PM »

ah ok. yep those are your choices. lol

I was thinking sentence generation.
Logged

I could eat a bowl of Alphabet Soup and shit a better argument than that.
perkiset
Olde World Hacker
Administrator
Lifer
*****
Offline Offline

Posts: 10096



View Profile
« Reply #5 on: November 29, 2009, 03:24:11 PM »

there are a couple proprietary options, like http://www.sphinxsearch.com/ , http://lucene.apache.org/java/docs/ and of course the comment section of this http://dev.mysql.com/doc/refman/5.1/en/fulltext-search.html has interesting notions as well, including people still pointing to Senna.

But essentially Nuts and Jam are right - that's your easiest front-line options. After those, it becomes pretty complex.
Logged

It is now believed, that after having lived in one compound with 3 wives and never leaving the house for 5 years, Bin Laden called the U.S. Navy Seals himself.
serialnoob
Journeyman
***
Offline Offline

Posts: 88


View Profile
« Reply #6 on: December 07, 2009, 10:04:20 PM »

I had my go at this as well, definitely fulltext for me

This might be other the top but, to the question what is not a relevant keyword or sentence, I have used  this

http://en.wikipedia.org/wiki/Most_common_words_in_English

More precisely, a reference to the 100 most used words on TV which are there somewhere, to screen and reduce dramatically the size of text while maintaining relevance

 
Logged

Success consists of going from failure to failure without loss of enthusiasm - Winston Churchill
ehlo
Journeyman
***
Offline Offline

Posts: 50


View Profile
« Reply #7 on: December 10, 2009, 04:00:30 PM »

I've been experimenting with Lucene lately using Solr, it is pretty impressive stuff. I'm still getting to grips with the query language but I have managed to get it to index a rather large MySQL database and return data for some simple searches. First impressions are very good and I'll bet that you can do some pretty complex searches with it too.
Logged
KaptainKrayola
Keeper of Pie
Global Moderator
Lifer
*****
Offline Offline

Posts: 994



View Profile WWW
« Reply #8 on: December 11, 2009, 01:18:41 PM »

Thanks everyone.  I got a little more familiar with some of my options for FT search in MySQL and have the particular search i needed working wonderfully.  Results are fast and accurate so i guess FullText FTW!

Smiley
Logged

We can't stop here, this is bat country.
perkiset
Olde World Hacker
Administrator
Lifer
*****
Offline Offline

Posts: 10096



View Profile
« Reply #9 on: December 11, 2009, 02:08:08 PM »

Just a spoonful of knowledge makes the database... go round...  (Where's my Julie Andrews smiley when I need it?)
Logged

It is now believed, that after having lived in one compound with 3 wives and never leaving the house for 5 years, Bin Laden called the U.S. Navy Seals himself.
Phaėton
Lifer
*****
Offline Offline

Posts: 555


⎝⏠⏝⏠⎠


View Profile
« Reply #10 on: February 26, 2010, 01:07:13 AM »

I've been experimenting with Lucene lately using Solr, it is pretty impressive stuff. I'm still getting to grips with the query language but I have managed to get it to index a rather large MySQL database and return data for some simple searches. First impressions are very good and I'll bet that you can do some pretty complex searches with it too.

Just went  into the time warp again looking at Lucene....
Taking it to the next level... thats really interesting...

------

oh, and @nutballs, I think someone has NGrammed the entire internet since 2006,
this is pretty awesome:

http://www.gluetext.com/about.htm

http://www.youtube.com/watch?v=X5FXj37ZKXk



Logged

When I was your age we used to walk to the TV to change the channel....  _̴ı̴̴̡̡̡ ̡͌l̡̡̡ ̡͌l̡*̡̡ ̴̡ı̴̴̡ ̡̡͡|̲̲̲͡͡͡ ̲▫̲͡ ̲̲̲͡͡π̲̲͡͡ ̲̲͡▫̲̲͡͡ ̲|̡̡̡ ̡ ̴̡ı̴̡̡
nutballs
Administrator
Lifer
*****
Offline Offline

Posts: 5627


Back in my day we had 9 planets


View Profile
« Reply #11 on: February 26, 2010, 08:39:42 PM »

very interesting... i gotta play with that.
Logged

I could eat a bowl of Alphabet Soup and shit a better argument than that.
jammaster82
Lifer
*****
Offline Offline

Posts: 666


Thats craigs list for ya


View Profile
« Reply #12 on: February 27, 2010, 12:22:59 AM »

Yeah, it looks like you can actually search for an actual
combination of letters and it does a %like% sort of view
of the results...

Lets just say you wanted to crawl the entire internet
lol.

How would you create a list of all the sites that be?

 
Logged

The watched pot, never boils... But if you walk away from it , the soup burns.  What gives?
aliraza167
n00b
*
Offline Offline

Posts: 4


View Profile
« Reply #13 on: May 27, 2011, 03:22:30 PM »

There are so many search engine now a day providing there services online and most biggest search engine are Goolge, Yahoo, Bing etc and there are the best to search fastest, and reliable Was a link: Nail Polish[/color
« Last Edit: May 27, 2011, 04:02:15 PM by perkiset » Logged
perkiset
Olde World Hacker
Administrator
Lifer
*****
Offline Offline

Posts: 10096



View Profile
« Reply #14 on: May 27, 2011, 04:01:48 PM »

LOL Clever  'Bot indeed.
Logged

It is now believed, that after having lived in one compound with 3 wives and never leaving the house for 5 years, Bin Laden called the U.S. Navy Seals himself.
Pages: [1] 2
  Print  
 
Jump to:  

Perkiset's Place Home   Best of The Cache   phpMyIDE: MySQL Stored Procedures, Functions & Triggers
Politics @ Perkiset's   Pinkhat's Perspective   
cache
mart
coder
programmers
ajax
php
javascript
Powered by MySQL Powered by PHP Powered by SMF 1.1.2 | SMF © 2006-2007, Simple Machines LLC
Seo4Smf v0.2 © Webmaster's Talks


Valid XHTML 1.0! Valid CSS!