|
koro
|
 |
« on: November 17, 2009, 07:53:17 PM » |
|
Hi
I made simple harvester that puts keywords to db. When i reached 200K, system started to slow down.
How to design this db to make it faster? Im using UNIQUE on 'Keyword' field so i wont have doubles in list.
Im aiming in few millions of keywords to be harvested. Please help me with this.
|
|
|
|
|
Logged
|
No links in signatures please
|
|
|
|
nutballs
|
 |
« Reply #1 on: November 17, 2009, 08:03:01 PM » |
|
Hmmmm that's weird. I have a sentence database with millions of rows, fulltext, unique. And it hauls ass on insert ignore. So my gut says something is wrong. Try nuking the index and recreating it? I assume it's varchar and not Text. Is anything being done prior to the insert? Like are you processing the phrase first? Because then the problem might be there? In otherwords are you positive it's the insert being slow?
|
|
|
|
|
Logged
|
I could eat a bowl of Alphabet Soup and shit a better argument than that.
|
|
|
|
koro
|
 |
« Reply #2 on: November 17, 2009, 08:18:02 PM » |
|
I was using ON DUPLICATE UPDATE to avoid mysql error. I'll try INSERT IGNORE
|
|
|
|
|
Logged
|
No links in signatures please
|
|
|
|
nutballs
|
 |
« Reply #3 on: November 17, 2009, 09:16:51 PM » |
|
ah. i wonder if that will fix it? If so, go to know not to use that. lol
|
|
|
|
|
Logged
|
I could eat a bowl of Alphabet Soup and shit a better argument than that.
|
|
|
|
koro
|
 |
« Reply #4 on: November 17, 2009, 09:31:32 PM » |
|
Thank you.I tried INSERT IGNORE, and aditionally i removed NOW() from inserting keywords. Which seemd to speed things up.
But i think this is till slow. Starting on empty base, i got about 0.25 second for 20 runs. Now im at 100K keywords and script slows down to 2.45 second for 20 runs, and gradualy slows down.
Curl or bandwidth ,if think ,is not bottleneck becouse it run fast on first few thousends keywords.
By the time i'd hit million it would take i guess 30-40 seconds to run one batch of queries so i'm doing something wrong.
I attach the script i use. Please give it a try.
It uses VertrigoServ, a WAMP stack. You can use any other ofcourse. The config file for db is in dbleg.php
|
|
|
« Last Edit: November 17, 2009, 09:47:25 PM by koro »
|
Logged
|
No links in signatures please
|
|
|
|
lamontagne
|
 |
« Reply #5 on: November 17, 2009, 09:48:46 PM » |
|
If u could post your MySQL configuration file and table structure code I can help u.
|
|
|
|
|
Logged
|
"Long time no see. I only pray the caliber of your questions has improved." - Kevin Smith
|
|
|
|
koro
|
 |
« Reply #6 on: November 17, 2009, 10:03:24 PM » |
|
If u could post your MySQL configuration file and table structure code I can help u.
Thank you. Herese the my.ini file: [mysqld] basedir=E:\Program Files\VertrigoServ\Mysql\ datadir=E:\Program Files\VertrigoServ\Mysql\data\ port =3306 key_buffer =16M max_allowed_packet =1M table_cache =32 sort_buffer_size =512K net_buffer_length =8K read_buffer_size =256K read_rnd_buffer_size =512K myisam_sort_buffer_size =8M
[mysqldump] quick max_allowed_packet =16M
[mysql] no-auto-rehash # Remove the next comment character if you are not familiar with SQL #safe-updates
[isamchk] key_buffer =20M sort_buffer_size =20M read_buffer =2M write_buffer =2M
[myisamchk] key_buffer =20M sort_buffer_size =20M read_buffer =2M write_buffer =2M
[mysqlhotcopy] interactive-timeout
The table as i use it is: CREATE TABLE IF NOT EXISTS `keywords` ( `KeywordId` int(10) NOT NULL auto_increment, `Keyword` char(100) NOT NULL, `Exploded` tinyint(1) NOT NULL, `Date` datetime NOT NULL, `Category` char(60) NOT NULL, PRIMARY KEY (`KeywordId`), UNIQUE KEY `Keyword` (`Keyword`) )
|
|
|
|
« Last Edit: November 17, 2009, 10:05:48 PM by koro »
|
Logged
|
No links in signatures please
|
|
|
|
lamontagne
|
 |
« Reply #7 on: November 18, 2009, 08:51:01 AM » |
|
is it an innodb or myisam database?
|
|
|
|
|
Logged
|
"Long time no see. I only pray the caliber of your questions has improved." - Kevin Smith
|
|
|
|
lamontagne
|
 |
« Reply #8 on: November 18, 2009, 09:18:20 AM » |
|
Here's a quick edit to the my.ini ... give me server specs when you get a chance so I know how much memory, cpu i'm working with here...but try this one with your app (make sure to create a backup of the existing one first) ... [mysqld] basedir=E:\Program Files\VertrigoServ\Mysql\ datadir=E:\Program Files\VertrigoServ\Mysql\data\ port =3306 key_buffer_size =50M max_allowed_packet =1M table_cache =32 sort_buffer_size =512K net_buffer_length =8K read_buffer_size =256K read_rnd_buffer_size =512K myisam_sort_buffer_size =5M query_cache_size =30M thread_cache=40
[mysqldump] quick max_allowed_packet =16M
[mysql] no-auto-rehash # Remove the next comment character if you are not familiar with SQL #safe-updates
[isamchk] key_buffer =20M sort_buffer_size =20M read_buffer =5M write_buffer =5M
[myisamchk] key_buffer =20M sort_buffer_size =20M read_buffer =2M write_buffer =2M
[mysqlhotcopy] interactive-timeout
|
|
|
|
|
Logged
|
"Long time no see. I only pray the caliber of your questions has improved." - Kevin Smith
|
|
|
|
koro
|
 |
« Reply #9 on: November 18, 2009, 09:50:30 AM » |
|
is it an innodb or myisam database?
myisam Im using 2.66Ghz server with 1.66 RAM I'll apply the changes to my.ini and i'll check.
|
|
|
|
|
Logged
|
No links in signatures please
|
|
|
|
perkiset
|
 |
« Reply #10 on: November 18, 2009, 10:48:52 AM » |
|
I'd like to see the structure of the tables and indicies as well. Do you have any triggers attached to the table(s)?
|
|
|
|
|
Logged
|
It is now believed, that after having lived in one compound with 3 wives and never leaving the house for 5 years, Bin Laden called the U.S. Navy Seals himself.
|
|
|
|
lamontagne
|
 |
« Reply #11 on: November 18, 2009, 10:58:36 AM » |
|
Also, when you do the inserts are you doing them one at a time or in one query... i kinda assumed you were already but double checking, this is the most obvious method to speed it up and can make it much faster on its own,... see http://www.petefreitag.com/item/379.cfm if you want an example... if this wont work for your situation (the keywords need to be inserted real time and can't wait for batches) then you wanna optimize the db as much as possible... well actually you should probably do that anyway... also a note on the batch stuff, a time based batch inserts seem to work best for me... aka grab everything i need to insert and store in memory... every 5 minutes do a batch insert of everything in memory, start over...
|
|
|
|
|
Logged
|
"Long time no see. I only pray the caliber of your questions has improved." - Kevin Smith
|
|
|
|
perkiset
|
 |
« Reply #12 on: November 18, 2009, 11:44:11 AM » |
|
 ... exactly ... issue multiqueries rather than individuals and you'll be way happier.
|
|
|
|
|
Logged
|
It is now believed, that after having lived in one compound with 3 wives and never leaving the house for 5 years, Bin Laden called the U.S. Navy Seals himself.
|
|
|
|
nutballs
|
 |
« Reply #13 on: November 18, 2009, 12:20:28 PM » |
|
but even so, aside from the multi queries, i get the sense something is wrong.
I jam hundreds a minute into my table, and im at about 20million rows, all unique.
could the fact that the keyword field is CHAR instead of VARCHAR have anything to do with it? I obviously doubt it, but something is obviously wacky (at least from my pov).
|
|
|
|
|
Logged
|
I could eat a bowl of Alphabet Soup and shit a better argument than that.
|
|
|
|
vsloathe
|
 |
« Reply #14 on: November 18, 2009, 01:21:50 PM » |
|
You need to create an index on your key field. MySQL will thank you.
|
|
|
|
|
Logged
|
hai
|
|
|
|