The Cache: Technology Expert's Forum
 
*
Welcome, Guest. Please login or register. September 16, 2019, 09:31:00 PM

Login with username, password and session length


Pages: [1] 2 3
  Print  
Author Topic: taming an enormous database  (Read 13744 times)
Indica
Rookie
**
Offline Offline

Posts: 49


View Profile
« on: November 02, 2007, 10:35:53 PM »

i've now ran into my first major database roadblock D'oh! how to optimize mysql databases for very huge databases.

i've got a table with 18.8mil rows of user ids. i plan to check these user ids to see if they're online or not, and will be doing this from several servers and thus need a php script to send each server a few names to check. i suppose before i explain the script and the bottleneck i should give the structure:

Code:
CREATE TABLE `userids` (
  `id` int(10) unsigned NOT NULL auto_increment,
  `userid` varchar(16) NOT NULL,
  `last_check_date` datetime NOT NULL,
  `sent_date` datetime NOT NULL,
  `offline_count` tinyint(4) NOT NULL COMMENT 'number of times sn was offline',
  `status` varchar(1) NOT NULL default '0' COMMENT '0 = ready, 1 = sent for checking, 2 = checked, 3 = inactive',
  PRIMARY KEY  (`id`),
  UNIQUE KEY `sn` (`sn`),
  KEY `last_check_date` (`last_check_date`,`sent_date`,`offline_count`,`status`),
  FULLTEXT KEY `status` (`status`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=18806132 ;

to sum this up, last_check_date is the date the user's online status was last checked, and sent_time was the time that userid was sent to one of my servers. sent_time is used in combination with status (1) for pruning in the event that my server doesn't send back the online status for whatever reason. and this causes the problem. i ran my php script to send 5 user ids to my other server, so the first 5 rows in this database had status values 1 and had a sent_date set. now my prune time is set to 5 minutes, so it should reset user ids that haven't gotten a response in 5 minutes back to 0. below is the sql:

Code:
UPDATE `userids` SET status=0 WHERE status LIKE '%1%' AND sent_date <= '2007-11-03 01:34:05'

i've just run this command and the result: Affected rows: 5 (Query took 59.0590 sec) Vomit

for a php script that is to be ran every 5 minutes, this isn't pretty.

i know there's some sql gurus around here, can anyone throw me a few bones about how to optimize this db and/or the query? i think it's the sent_date causing things to slow down a ton, but i'm not 100% sure. any tips are appreciated  Smiley
Logged
thedarkness
Lifer
*****
Offline Offline

Posts: 585



View Profile
« Reply #1 on: November 03, 2007, 02:17:30 AM »

Something like;

Code:
EXPLAIN SELECT status FROM `usersids` WHERE status LIKE '%1%' AND sent_date <= '2007-11-03 01:34:05'

Might be a good place to start, post the results here to get the discussion rolling.

HTH,
td
Logged

"I want to be the guy my dog thinks I am."
 - Unknown
thedarkness
Lifer
*****
Offline Offline

Posts: 585



View Profile
« Reply #2 on: November 03, 2007, 02:25:30 AM »

Holy shit, just noticed status is a varchar(1)...... wtf for?

Change it to a tinyint(1) and use status = 1 and I would be surprised if you didn't see an immediate and considerable improvement in query performance.

Always favour queries based on numeric data types if you can.

You may also want to look at the INNODB engine as opposed to MyISAM. Test, test, test.

Post the results of the explain here anyway and don't assume I am the be all and end all of datbase optimizers  Wink

My 2c,
td

[edit]Sorry about the double post[/edit]
« Last Edit: November 03, 2007, 02:32:12 AM by thedarkness » Logged

"I want to be the guy my dog thinks I am."
 - Unknown
perkiset
Olde World Hacker
Administrator
Lifer
*****
Offline Offline

Posts: 10096



View Profile
« Reply #3 on: November 03, 2007, 08:45:39 AM »

Also, the status like '%1%' is just a huge no-no - you essentially killed any index that could have helped you.

First, TD is right - either make that a tiny int or a CHAR 1. Then build an index on status plus the id (PK). This will make the cardinality of the index perfect.

Then do your update as where status = '1' or = 1 and it will rock.

/p
Logged

It is now believed, that after having lived in one compound with 3 wives and never leaving the house for 5 years, Bin Laden called the U.S. Navy Seals himself.
nutballs
Administrator
Lifer
*****
Offline Offline

Posts: 5627


Back in my day we had 9 planets


View Profile
« Reply #4 on: November 03, 2007, 10:15:42 AM »

agreed with the others:
change status to tinyint
get rid of the like: WHERE status=1

that will eliminate the fulltext index on status
you then also dont need to key all those things since both the offlinecount and status are int. (mysql auto indexes numeric types right? MSSQL does, but I cant remember if mySQL.) I think the dates are also autoindexed since they are just numeric anyway technically. so I don't think you need to actually key/index those columns at all. Unless mySQL doesnt autoindex, then nevermind   
Logged

I could eat a bowl of Alphabet Soup and shit a better argument than that.
perkiset
Olde World Hacker
Administrator
Lifer
*****
Offline Offline

Posts: 10096



View Profile
« Reply #5 on: November 03, 2007, 10:57:09 AM »

MySQL does not autoindex anything as far as I know, you must explicitly create them. And actually it's not a fulltext index that is created but rather a regex of sorts that is applied to every row in the table for evaluation, which is why it's taking so freaking long. By using the status+id index it will be trivially fast.

Logged

It is now believed, that after having lived in one compound with 3 wives and never leaving the house for 5 years, Bin Laden called the U.S. Navy Seals himself.
Indica
Rookie
**
Offline Offline

Posts: 49


View Profile
« Reply #6 on: November 03, 2007, 01:16:45 PM »

late night (early morning?) coding will do this to you D'oh!

i've changed the type, and the indexes and speed seems to have gone up a little. i tested the db with 125,000 rows set to be pruned, the result:

Code:
Affected rows: 125025 (Query took 61.8150 sec)

i've refilled the db with the 125,000 test rows and here's the EXPLAIN results:

Quote
EXPLAIN SELECT STATUS FROM `userids`
WHERE STATUS =1
AND sent_date <= '2007-11-03 01:34:05'

id    select_type    table    type    possible_keys    key    key_len    ref    rows    Extra
1    SIMPLE    userids    index    NULL    last_check_date    18    NULL    18806131    Using where; Using index
Logged
nutballs
Administrator
Lifer
*****
Offline Offline

Posts: 5627


Back in my day we had 9 planets


View Profile
« Reply #7 on: November 03, 2007, 01:34:19 PM »

MySQL does not autoindex anything as far as I know, you must explicitly create them.
ah ok, good to know. lol
Logged

I could eat a bowl of Alphabet Soup and shit a better argument than that.
perkiset
Olde World Hacker
Administrator
Lifer
*****
Offline Offline

Posts: 10096



View Profile
« Reply #8 on: November 03, 2007, 02:11:21 PM »

... it took 61 seconds to do 125K updates? That still seems a bit high... that's only about 2K/sec... maybe I'm just delierous from doing the bills.
Logged

It is now believed, that after having lived in one compound with 3 wives and never leaving the house for 5 years, Bin Laden called the U.S. Navy Seals himself.
Indica
Rookie
**
Offline Offline

Posts: 49


View Profile
« Reply #9 on: November 03, 2007, 03:16:04 PM »

yeah, that's correct perk Need Help i decided to try again with more rows, 170k this time. i don't expect to have to prune many rows, but i'd like to be prepared for anything.

i can select the rows very quickly, but updating kills it. for some reason phpmyadmin only says it takes under a second to complete, but my browser certainly takes ~10 seconds to complete so i think phpmyadmin is lieing  Don't make me...

Code:
SELECT id FROM `userids` WHERE STATUS =1 AND sent_date <= '2007-11-03 17:58:44'
Showing rows 0 - 29 (170,750 total, Query took 0.0021 sec)

and now the update..

Code:
UPDATE `userids` SET status=0 WHERE status=1 AND sent_date <= '2007-11-03 17:58:44'
Affected rows: 170750 (Query took 59.5929 sec)

i do not plan to add any more rows to this database, and i plan to move the ones that were found to be online to another table to hopefully speed things up. i think the next test would be going to INNODB.

i can't imagine how slow it would take to data mine 200mil records of user information  maybe it's time i look into cubes
Logged
perkiset
Olde World Hacker
Administrator
Lifer
*****
Offline Offline

Posts: 10096



View Profile
« Reply #10 on: November 03, 2007, 03:38:35 PM »

Too bad the DBDog has not been around... he'd have a suggestion. Sorry man, I'm about fresh out of ideas on that one.
Logged

It is now believed, that after having lived in one compound with 3 wives and never leaving the house for 5 years, Bin Laden called the U.S. Navy Seals himself.
thedarkness
Lifer
*****
Offline Offline

Posts: 585



View Profile
« Reply #11 on: November 03, 2007, 04:56:39 PM »

Code:
id    select_type    table    type    possible_keys    key    key_len    ref    rows    Extra
1    SIMPLE    userids    index    NULL    last_check_date    18    NULL    18806131    Using where; Using index

Why is last_check_date even mentioned here?

Why is possible_keys NULL?

Why is ref NULL?

This has exceeded my knowledge but something looks fishy here, to me at least.

Indica, try creating an index on sent_date

Cheers,
td

[edit] I really need to learn more about explain. Let's start a thread purely for information on interpretting what explain is trying to tell you? [/edit]
« Last Edit: November 03, 2007, 05:00:06 PM by thedarkness » Logged

"I want to be the guy my dog thinks I am."
 - Unknown
thedarkness
Lifer
*****
Offline Offline

Posts: 585



View Profile
« Reply #12 on: November 03, 2007, 06:51:46 PM »

Sorry for being a serial double poster in this thread :-D

Indica: Try the select with "EXPLAIN EXTENDED" prepended and post the results here, I'm really interested in this so take advantage while I'm "hot"  ROFLMAO

Dangers of a short attention span  ROFLMAO

Cheers,
td

[edit]
After you issue the "EXPLAIN EXTENDED SELECT....." statement you need to issue a "SHOW WARNINGS" command to get the output?

From; http://dev.mysql.com/doc/refman/5.1/en/explain.html

"When the EXTENDED keyword is used, EXPLAIN produces extra information that can be viewed by issuing a SHOW WARNINGS  statement following the EXPLAIN statement. This information displays how the optimizer qualifies table and column names in the SELECT statement, what the SELECT looks like after the application of rewriting and optimization rules, and possibly other notes about the optimization process. EXPLAIN EXTENDED also displays the filtered  column as of MySQL 5.1.12."

you also may need to issue a "ANALYZE TABLE" to update cardinality, etc. if you make changes.

Interestingly the possible_keys section states;
" If this column is NULL, there are no relevant indexes. In this case, you may be able to improve the performance of your query by examining the WHERE clause to check whether it refers to some column or columns that would be suitable for indexing. If so, create an appropriate index and check the query with EXPLAIN again. See Section 12.1.4, “ALTER TABLE Syntax”.

To see what indexes a table has, use SHOW INDEX FROM tbl_name. "

Which would tend to indicate that your query can not find a suitable index to use and this I would say is critical........

HTH
[/edit]
« Last Edit: November 03, 2007, 07:07:22 PM by thedarkness » Logged

"I want to be the guy my dog thinks I am."
 - Unknown
thedarkness
Lifer
*****
Offline Offline

Posts: 585



View Profile
« Reply #13 on: November 03, 2007, 06:58:24 PM »

haha, i kill me  ROFLMAO

Also read, http://dev.mysql.com/tech-resources/articles/performance-partitioning.html and http://dev.mysql.com/tech-resources/articles/mysql_5.1_partitioning.html

Looks like partitioning may present a possible (substantial) performance increase if we can work out a good way to lay it out.

Post the "EXPLAIN EXTENDED" output first and let's get that as optimized as possible before we go too far down the partition road.

Told you I was "hot"  ROFLMAO

Cheers,
td
Logged

"I want to be the guy my dog thinks I am."
 - Unknown
Indica
Rookie
**
Offline Offline

Posts: 49


View Profile
« Reply #14 on: November 04, 2007, 02:32:17 PM »

Why is last_check_date even mentioned here?
that's the name of the index, phpmyadmin did this

Why is possible_keys NULL?

Why is ref NULL?
good questions and i'm not sure. never had to tweak a db like this before, most behave the way they should

sent_date has an index, it's under last_check_date with last_check_date, sent_date, offline_count, and status. when i created the table i think phpmyadmin clumped them all together, if they're supposed to be like this or on their own i'm not sure.

here's explain extended:
Code:
EXPLAIN EXTENDED SELECT STATUS FROM `userids`
WHERE STATUS =1
AND sent_date <= '2007-11-03 01:34:05'

id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE userids index NULL last_check_date 18 NULL 18806131 Using where; Using index

SHOW WARNINGS showed nothing. so i then did the analyze

Code:
Table Op Msg_type Msg_text
oldids.userids analyze status OK

and show index:
Code:
Table Non_unique Key_name Seq_in_index Column_name Collation Cardinality Sub_part Packed Null Index_type Comment
userids 0 PRIMARY 1 id A 18806131 NULL NULL   BTREE
userids 0 sn 1 sn A 18806131 NULL NULL   BTREE
userids 1 last_check_date 1 last_check_date A 1 NULL NULL   BTREE
userids 1 last_check_date 2 sent_date A 934 NULL NULL   BTREE
userids 1 last_check_date 3 offline_count A 934 NULL NULL   BTREE
userids 1 last_check_date 4 status A 934 NULL NULL   BTREE
userids 1 id 1 id A 18806131 NULL NULL   BTREE
userids 1 id 2 status A 18806131 NULL NULL   BTREE

there's already an index on both status and sent_date?  Huh?

as for partitioning, yeah i've considered this. i was going to make a table for the a usernames, b usernames, etc. i would prefer not to as i'm going to use a similar system for another project, but if it's the way it has to be i guess that's it. would be nice not to have to reimport this entire thing, damn db took 4 days to import as it is

if none of this improves speed, i'll try switching to innodb. and if that is still shit, ill try partitioning. thanks for the help thus far  Praise
Logged
Pages: [1] 2 3
  Print  
 
Jump to:  

Perkiset's Place Home   Best of The Cache   phpMyIDE: MySQL Stored Procedures, Functions & Triggers
Politics @ Perkiset's   Pinkhat's Perspective   
cache
mart
coder
programmers
ajax
php
javascript
Powered by MySQL Powered by PHP Powered by SMF 1.1.2 | SMF © 2006-2007, Simple Machines LLC
Seo4Smf v0.2 © Webmaster's Talks


Valid XHTML 1.0! Valid CSS!