The Cache: Technology Expert's Forum
 
*
Welcome, Guest. Please login or register. January 08, 2009, 02:16:49 AM

Login with username, password and session length


Pages: 1 [2] 3
  Print  
Author Topic: taming an enormous database  (Read 2146 times)
thedarkness
Global Moderator
Lifer
*****
Offline Offline

Posts: 581



View Profile
« Reply #15 on: November 05, 2007, 03:13:09 AM »

Some observations and things to try (these examples are from a famous mysql test database called sakila);
Code:
CREATE TABLE `actor` (
  `actor_id` smallint(5) unsigned NOT NULL auto_increment,
  `first_name` varchar(45) NOT NULL,
  `last_name` varchar(45) NOT NULL,
  `last_update` timestamp NOT NULL default CURRENT_TIMESTAMP on update CURRENT_TIMESTAMP,
  PRIMARY KEY  (`actor_id`),
  KEY `idx_actor_last_name` (`last_name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8

Code:
mysql> show index from actor;
+-------+------------+---------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name            | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+---------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| actor |          0 | PRIMARY             |            1 | actor_id    | A         |         200 |     NULL | NULL   |      | BTREE      |         |
| actor |          1 | idx_actor_last_name |            1 | last_name   | A         |         200 |     NULL | NULL   |      | BTREE      |         |
+-------+------------+---------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
2 rows in set (0.00 sec)

Code:
mysql> explain select first_name from actor where actor_id < 11 and last_update < '2006-02-15 04:34:34';
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  | rows | Extra       |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
|  1 | SIMPLE      | actor | range | PRIMARY       | PRIMARY | 2       | NULL |   10 | Using where |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
1 row in set (0.00 sec)

These results look more like what I would expect than yours do.

Actually, just looking at this it looks to me like last_check_date is a multiple-column index using sent_date, offline_count and status, there's also one composed of id and status so I'd say that as explain was trying to tell us there was no suitable index for the query.

Try this;

create index stat on userids (status); // may use, may improve speed

create index sent on userids (sent_date); // may use, may improve speed, may use in conjunction with stat?

create index sent_stat on userids (status,sent_date); // Order important?

create index stat_sent on userids (sent_date,status); // Looks like the winner, test them all though

Now you may have to try each one uniquely (not sure here), so create one and delete the others then test with explain to see whether it's using an index and what your speed is like. If you throw all the indexes in there together it may be smart enough to pick the best one (prolly is) but it may just muddy the waters so I'd do each in isolation and then all together to see if it uses more than one etc.

Given that we are creating these indexes explicitly on the columns the queries are based on they should help a lot.

Oh and use analyze (help analyze table) and explain a lot and maybe consider optimize (help optimize table) if you suspect you have "holes" in your data.

Report back dude.

Cheers,
td
Logged

"I want to be the guy my dog thinks I am."
 - Unknown
thedarkness
Global Moderator
Lifer
*****
Offline Offline

Posts: 581



View Profile
« Reply #16 on: November 09, 2007, 06:08:59 AM »

Don't leave us hanging Indica, what's the scoop?

Cheers,
td
Logged

"I want to be the guy my dog thinks I am."
 - Unknown
Indica
Rookie
**
Offline Offline

Posts: 49


View Profile
« Reply #17 on: November 09, 2007, 06:01:22 PM »

i'm going to work on this database some more this weekend. very busy week, i haven't forgotten.

i got a feeling i will have to partition it, i'd rather not do this because of future projects that will be on a similar scale. they won't partition well due to the nature of the data. and if i can't get a 18mil record db going smooth(er), i don't want to think of having 200mil records with gbs of data  D'oh!
Logged
nop_90
Global Moderator
Lifer
*****
Offline Offline

Posts: 1142


View Profile
« Reply #18 on: November 09, 2007, 06:02:56 PM »

I had a similar problem when i made a program that handled newspaper subscribers there was like 100K of them.
Subscribers could stop and start daily
The program had to run on a 486.
The problem is the updates. That is what killed me.

The db schema for each user was like this.
route number,subscriber id, name etc
and i was all in one table.
there where approx 500 routes (i can not remember details).

i then made like 500 tables with same scheme as above but with each table only having users of same route.

I made a class that would handle all the housekeeping (proggie was in python and this was before sqlobject was invented).
that way when new user was added etc it would go to approriate route table.

Also the class on the surface made it appear as if all in 1 table.
Logged
Indica
Rookie
**
Offline Offline

Posts: 49


View Profile
« Reply #19 on: November 12, 2007, 07:54:16 PM »

well, i am now partitioning the large table up into tables for each letter. will give an update once things are finished importing.
Logged
thedarkness
Global Moderator
Lifer
*****
Offline Offline

Posts: 581



View Profile
« Reply #20 on: November 13, 2007, 07:46:19 PM »

Did you try creating those indexes and testing them as per my large post on this page Indica? All evidence would suggest that that query was running without using an index, if you can get it to use a suitable index I would expect it to make a hell of a difference........

Cheers,
td
Logged

"I want to be the guy my dog thinks I am."
 - Unknown
chide
Rookie
**
Offline Offline

Posts: 20


View Profile WWW
« Reply #21 on: November 15, 2007, 03:44:04 PM »

Don't bother with partitioning.. I've had databases with 300mil rows and it didn't make any substantial difference.

One major thing to look at though:  how big is your key buffer?  If you're playing with tables in the tens of millions you'll want at least 256mb.  If it's an InnoDB table (which I'd use for a table that big) make sure to set InnoDB's buffer pool to at least 256mb.  The server should have at least a gig of ram in it.

More memory makes everything faster.  Smiley
Logged
nutballs
Administrator
Lifer
*****
Online Online

Posts: 3525


View Profile
« Reply #22 on: November 15, 2007, 07:41:45 PM »

id agree. In MsSQL i did a database that processed 1-2 gigs of weblogs a day. No partitioning.

In my experience, proper organization of the tables and indexes can be just as effective as partitioning, without the coding legwork to deal with it.

Though of course in msSQL, there is auto partitioning(i cant remember what the hell its called for some reason), so you can do that if push comes to shove. I assume MySQL doesnt do that though.
Logged
Indica
Rookie
**
Offline Offline

Posts: 49


View Profile
« Reply #23 on: November 15, 2007, 09:49:53 PM »

Don't bother with partitioning.. I've had databases with 300mil rows and it didn't make any substantial difference.

One major thing to look at though:  how big is your key buffer?  If you're playing with tables in the tens of millions you'll want at least 256mb.  If it's an InnoDB table (which I'd use for a table that big) make sure to set InnoDB's buffer pool to at least 256mb.  The server should have at least a gig of ram in it.

More memory makes everything faster.  Smiley

the server this db is on hasn't got much ram. i've been testing on my own test server which does have 1gb ram. partitioning seems to have helped. a prune of 125k userids took 10 seconds, before it would take 60. i will try increasing the buffer and switching to innodb, if no increase is seen with this new change, well, 10 seconds is fine.

it seems my version of sql doesn't have innodb as a storage engine. i'll have to look into installing it.
Logged
chide
Rookie
**
Offline Offline

Posts: 20


View Profile WWW
« Reply #24 on: November 15, 2007, 11:42:37 PM »

What version of mysql are you using?  Version 5+ comes with InnoDB standard.  I wouldn't use older stuff anyway.. they've made a ton of improvements in the last few years.
Logged
thedarkness
Global Moderator
Lifer
*****
Offline Offline

Posts: 581



View Profile
« Reply #25 on: November 16, 2007, 04:01:30 AM »

Get your indexes right or it will come back to bite you.

Cheers,
td
Logged

"I want to be the guy my dog thinks I am."
 - Unknown
perkiset
Olde World Hacker
Administrator
Lifer
*****
Online Online

Posts: 5324


:sniffle: Humor was so much easier before.


View Profile
« Reply #26 on: November 16, 2007, 12:59:44 PM »

Get your indexes right or it will come back to bite you.

... and with the size of that database, it'll be a great white bite, not a mosquito Wink
Logged

If I can't be Mr. Root then I don't want to play.
Indica
Rookie
**
Offline Offline

Posts: 49


View Profile
« Reply #27 on: November 16, 2007, 06:22:54 PM »

am using mysql 5.0.20, xampp setup.

i'll fix the indexes, before they bite my leg off  ROFLMAO
Logged
nop_90
Global Moderator
Lifer
*****
Offline Offline

Posts: 1142


View Profile
« Reply #28 on: November 17, 2007, 02:19:46 AM »

The size of the db does not matter when it comes to queries, (or matter very little).
What kills u is when u have to update.
Or that was the case on mysql 3

When u update the table it has to lock and then unlock.
Logged
chide
Rookie
**
Offline Offline

Posts: 20


View Profile WWW
« Reply #29 on: November 17, 2007, 04:43:28 PM »

InnoDB does row level locking so it's not an issue anymore if you use that table format.  Smiley
Logged
Pages: 1 [2] 3
  Print  
 
Jump to:  

Perkiset's Place Home   Best of The Cache   phpMyIDE: MySQL Stored Procedures, Functions & Triggers
Politics @ Perkiset's   Pinkhat's Perspective   
cache
mart
coder
programmers
ajax
php
javascript
Powered by MySQL Powered by PHP Powered by SMF 1.1.2 | SMF © 2006-2007, Simple Machines LLC
Seo4Smf v0.2 © Webmaster's Talks


Valid XHTML 1.0! Valid CSS!