|
KaptainKrayola
|
 |
« on: November 25, 2009, 09:40:47 AM » |
|
So, i have a ton of text that i need to locate specific phrases in. I have experimented with both fulltext search and lame old LIKE but I am wondering what you guys suggest for finding specific whole phrases in text. my fulltext query is so far almost exactly twice as fast as the LIKE query (to be expected) but figured id pick teh brains of the smart guys here to see what they suggested. keep in mind i need to find specific phrases. i'm not searching for relevance but for actual one or more word phrases.  thanks yo!
|
|
|
|
|
Logged
|
We can't stop here, this is bat country. 
|
|
|
|
jammaster82
|
 |
« Reply #1 on: November 25, 2009, 10:01:15 AM » |
|
Bat country.. lol MY GOD SOMEONES giving these creatures booze! Ive done this in the past by storing the unique words all in a table and giving them a unique id. Then making a table of sentences with an indexed integer field of the wordids and sequence they are in the sentence... WORDTABLE -------------- wordid wordtext SENTENCES ------------- sentence id wordid wordinsentencesequencenumber WordTable: 1, See 2, Jane 3, Run 4, Jim See Jane Run SentenceTable: 1 , 1, 1 1, 2, 2 1, 3, 3 See Jim Run SentenceTable: 2, 1, 1 2, 4, 2 2, 3, 3 .and so forth... takes a little bit of stored procedure to get your sentences in and out but was a faster search through huge bits of data....... I actually went nuts and broke it down into characters that made words so i could do even faster 'LIKE' queries. Then i did the 'characters that could be mistyped' table... later i was gonna add a 'phenome' table in case my creation ever 'came to life'  i went even deeper than that and stored parts of speech data about them for switching out adjectives and adverbs (attempt at some loebner stuff )
|
|
|
|
|
Logged
|
The watched pot, never boils... But if you walk away from it , the soup burns. What gives?
|
|
|
|
nutballs
|
 |
« Reply #2 on: November 25, 2009, 11:28:53 AM » |
|
NGrams.
Basically what Jamm is doing. The problem is, it can become massive.
So, what I have done in the past was... A table with single words and IDs. Easy nuff. A table with the NGram data, but only as IDs. My table looked like: RootID BeforeID AfterID
Then you can chain them together for as long of a retarded phrase as you want. If you need to make an entry where the Root Word is the first or last word in the ngram, just use an ID of 0.
(then use that with sentence templates... and ooooooooh aaaaaaaaah, let the shitty sentences fly!)
|
|
|
|
|
Logged
|
I could eat a bowl of Alphabet Soup and shit a better argument than that.
|
|
|
|
KaptainKrayola
|
 |
« Reply #3 on: November 25, 2009, 11:42:02 AM » |
|
hrm.
seems a little different than what i'm trying to do (then again i may just nto be following the logic totally) i already have a big table full of text and need to find rows with phrases in them like "apple pie" and return the results with "apple pie" in them.
sounds like breaking it all up into words and sentences and such is more work than i'm looking to do for this id rather just be able to search inside the text i already have. maybe fulltext and LIKE are really the only two viable options at this point.
|
|
|
|
|
Logged
|
We can't stop here, this is bat country. 
|
|
|
|
nutballs
|
 |
« Reply #4 on: November 25, 2009, 02:17:54 PM » |
|
ah ok. yep those are your choices. lol
I was thinking sentence generation.
|
|
|
|
|
Logged
|
I could eat a bowl of Alphabet Soup and shit a better argument than that.
|
|
|
|
|
|
serialnoob
|
 |
« Reply #6 on: December 07, 2009, 10:04:20 PM » |
|
I had my go at this as well, definitely fulltext for me This might be other the top but, to the question what is not a relevant keyword or sentence, I have used this http://en.wikipedia.org/wiki/Most_common_words_in_EnglishMore precisely, a reference to the 100 most used words on TV which are there somewhere, to screen and reduce dramatically the size of text while maintaining relevance 
|
|
|
|
|
Logged
|
Success consists of going from failure to failure without loss of enthusiasm - Winston Churchill
|
|
|
|
ehlo
|
 |
« Reply #7 on: December 10, 2009, 04:00:30 PM » |
|
I've been experimenting with Lucene lately using Solr, it is pretty impressive stuff. I'm still getting to grips with the query language but I have managed to get it to index a rather large MySQL database and return data for some simple searches. First impressions are very good and I'll bet that you can do some pretty complex searches with it too.
|
|
|
|
|
Logged
|
|
|
|
|
KaptainKrayola
|
 |
« Reply #8 on: December 11, 2009, 01:18:41 PM » |
|
Thanks everyone. I got a little more familiar with some of my options for FT search in MySQL and have the particular search i needed working wonderfully. Results are fast and accurate so i guess FullText FTW! 
|
|
|
|
|
Logged
|
We can't stop here, this is bat country. 
|
|
|
|
perkiset
|
 |
« Reply #9 on: December 11, 2009, 02:08:08 PM » |
|
Just a spoonful of knowledge makes the database... go round...  (Where's my Julie Andrews smiley when I need it?)
|
|
|
|
|
Logged
|
It is now believed, that after having lived in one compound with 3 wives and never leaving the house for 5 years, Bin Laden called the U.S. Navy Seals himself.
|
|
|
|
Phaėton
|
 |
« Reply #10 on: February 26, 2010, 01:07:13 AM » |
|
I've been experimenting with Lucene lately using Solr, it is pretty impressive stuff. I'm still getting to grips with the query language but I have managed to get it to index a rather large MySQL database and return data for some simple searches. First impressions are very good and I'll bet that you can do some pretty complex searches with it too.
Just went into the time warp again looking at Lucene.... Taking it to the next level... thats really interesting... ------ oh, and @nutballs, I think someone has NGrammed the entire internet since 2006, this is pretty awesome: http://www.gluetext.com/about.htmhttp://www.youtube.com/watch?v=X5FXj37ZKXk
|
|
|
|
|
Logged
|
When I was your age we used to walk to the TV to change the channel.... _̴ı̴̴̡̡̡ ̡͌l̡̡̡ ̡͌l̡*̡̡ ̴̡ı̴̴̡ ̡̡͡|̲̲̲͡͡͡ ̲▫̲͡ ̲̲̲͡͡π̲̲͡͡ ̲̲͡▫̲̲͡͡ ̲|̡̡̡ ̡ ̴̡ı̴̡̡
|
|
|
|
nutballs
|
 |
« Reply #11 on: February 26, 2010, 08:39:42 PM » |
|
very interesting... i gotta play with that.
|
|
|
|
|
Logged
|
I could eat a bowl of Alphabet Soup and shit a better argument than that.
|
|
|
|
jammaster82
|
 |
« Reply #12 on: February 27, 2010, 12:22:59 AM » |
|
Yeah, it looks like you can actually search for an actual combination of letters and it does a %like% sort of view of the results...
Lets just say you wanted to crawl the entire internet lol.
How would you create a list of all the sites that be?
|
|
|
|
|
Logged
|
The watched pot, never boils... But if you walk away from it , the soup burns. What gives?
|
|
|
|
aliraza167
|
 |
« Reply #13 on: May 27, 2011, 03:22:30 PM » |
|
There are so many search engine now a day providing there services online and most biggest search engine are Goolge, Yahoo, Bing etc and there are the best to search fastest, and reliable Was a link: Nail Polish[/color
|
|
|
|
« Last Edit: May 27, 2011, 04:02:15 PM by perkiset »
|
Logged
|
|
|
|
|
perkiset
|
 |
« Reply #14 on: May 27, 2011, 04:01:48 PM » |
|
LOL Clever  indeed.
|
|
|
|
|
Logged
|
It is now believed, that after having lived in one compound with 3 wives and never leaving the house for 5 years, Bin Laden called the U.S. Navy Seals himself.
|
|
|
|