The Cache: Technology Expert's Forum
 
*
Welcome, Guest. Please login or register. September 18, 2019, 12:39:55 PM

Login with username, password and session length


Pages: [1] 2
  Print  
Author Topic: Database Optimization?  (Read 10401 times)
cdc
Expert
****
Offline Offline

Posts: 105


View Profile
« on: April 21, 2007, 01:32:36 PM »

Anybody know any good (or great) tutorials on database optimization? I remember the basics from my college classes, but I'd really like to take it a step further.

I'm looking for answers to my questions like:

1. When should I create a second index on my tables aside from the auto incrementing id field?

2. What should I use to store IP addresses that are looked up often, never changed, and only added to infrequently? I'm guessing a VARCHAR isn't best...

etc.
Logged

Will code for food.
perkiset
Olde World Hacker
Administrator
Lifer
*****
Offline Offline

Posts: 10096



View Profile
« Reply #1 on: April 21, 2007, 02:58:06 PM »

Unfortunately dbDog is on vacation in Hawaii or he'd be all over this.

From what I remember about academic "optimization studies" they were all crap. Had no solid connection to the real world.

Simple answers to your first questions:

1. When should I create a second index on my tables aside from the auto incrementing id field?
When you have more than 1000 records in the table and intend to do lookups in a particular way. You should never have indicies for things you won't be looking up frequently. Additionally, indicies may not be behaving the way you'd expect or even used. The Cardinality of the index is really important. In MySQL for example, if there is not a cardinality of at least 50% ie., no more than 1/2 the records are duplication, then the index will not even be used in a search. Consider: if you had a field called "fulfilled" and it is either on or off (you're selling something and you want to quickly get a list of all fulfilled or not) with 10000 records. The index will not com into play because there is not enough difference in the records to do so. If you simply make the index fulfilled + id (the unique number) it will be quick as you'd expect. When you intend to do a bunch of queries on a table, it helps to have MySQL explain the SQL it will be using to execute the query. This will help in your optimization.

2. What should I use to store IP addresses that are looked up often, never changed, and only added to infrequently? I'm guessing a VARCHAR isn't best...
It would seem - and years ago that was true. I remember dbDog drilling me 10 years ago if I chose a varchar over a char in Oracle because of speed differences... but in MySQL today (standard tables, not innodb and such) I think anything over 12 chars IS a varchar automatically.

I store my IPs in a 16 char varchar and it works great.

/p
Logged

It is now believed, that after having lived in one compound with 3 wives and never leaving the house for 5 years, Bin Laden called the U.S. Navy Seals himself.
cdc
Expert
****
Offline Offline

Posts: 105


View Profile
« Reply #2 on: April 21, 2007, 04:14:57 PM »

Ok, thanks perk.

Quote
When you intend to do a bunch of queries on a table, it helps to have MySQL explain the SQL it will be using to execute the query.

How do you get MySQL to tell you the SQL it will be using?

Quote
I store my IPs in a 16 char varchar and it works great.

Well, I'm happy to hear that because I'm doing the same thing. My server has been running at full speed recently and I thought my recent introduction of cloaking might be the reason, but I'm doing much more than that. It's good to know that I can cross that off the list of things to worry about optimizing.
Logged

Will code for food.
perkiset
Olde World Hacker
Administrator
Lifer
*****
Offline Offline

Posts: 10096



View Profile
« Reply #3 on: April 21, 2007, 04:31:22 PM »

How do you get MySQL to tell you the SQL it will be using?

EXPLAIN select * from mytable where this=that order by another

The resulting table will contain the hierarichy of how it will attack the puzzle of your query.

Put some tables together, put some queries up against them and post what you get - we can help you understand what it's saying to you.

/p
Logged

It is now believed, that after having lived in one compound with 3 wives and never leaving the house for 5 years, Bin Laden called the U.S. Navy Seals himself.
thedarkness
Lifer
*****
Offline Offline

Posts: 585



View Profile
« Reply #4 on: April 21, 2007, 04:33:40 PM »

Yeah, post what you get, EXPLAIN can be a bit daunting when you first attack it so it'll prolly cut the learning curve for a few souls.

Cheers,
td
Logged

"I want to be the guy my dog thinks I am."
 - Unknown
nutballs
Administrator
Lifer
*****
Offline Offline

Posts: 5627


Back in my day we had 9 planets


View Profile
« Reply #5 on: April 21, 2007, 10:18:53 PM »

the best way to store IPs is to convert them to an integer. Actually BigInt in MSsql and i am pretty sure in MySQL as well.

I posted an ASP version of an IP2num converter here:
http://www.perkiset.org/forum/asp_net/classicasp_ip_to_num_and_num_to_ip_for_database_storage-t72.0.html

Logged

I could eat a bowl of Alphabet Soup and shit a better argument than that.
perkiset
Olde World Hacker
Administrator
Lifer
*****
Offline Offline

Posts: 10096



View Profile
« Reply #6 on: April 22, 2007, 11:33:30 AM »

I like that and had not ever thought about it that way. When you need to list out all IPs that are in the DB, doesn't that add a huge amount of overhead to the process though?

When you assert it is the best way, may I ask why? I've stored them as varchars, 4 sets of 3 chars, but never a big int and I am struggling to see why that is superior?
Logged

It is now believed, that after having lived in one compound with 3 wives and never leaving the house for 5 years, Bin Laden called the U.S. Navy Seals himself.
nutballs
Administrator
Lifer
*****
Offline Offline

Posts: 5627


Back in my day we had 9 planets


View Profile
« Reply #7 on: April 23, 2007, 08:40:27 AM »

Yea listing out the IPs ads overhead, obviously because it has to translate from Int to IP. however, usually you would do something like that somewhere other that the SQL server. forexample, have the webserver do it, or the client side app.

the reason its better is that INT lookups are always faster. (unless someone changed the rules and i wasnt paying attention). they also use less memory (which also makes them faster as well). Thats the reason why you try to always have a numeric ID column to join tables on. have you ever tried joining on text? I'm sure you dont, unless you HAVE to, like  joining sales to a product title. and you wait...
Logged

I could eat a bowl of Alphabet Soup and shit a better argument than that.
perkiset
Olde World Hacker
Administrator
Lifer
*****
Offline Offline

Posts: 10096



View Profile
« Reply #8 on: April 23, 2007, 09:22:29 AM »

Gotcha that makes sense. And yup, of course I join with the cleverly named "id" field which is the PKey on all tables.

But an indexed lookup should encur the same overhead, unless I am missing something as well... no? Also, a BigInt in MySQL is 20 bytes and you can store an entire address in 16 - I did not look to see if your routine scrunched the number down below the stock 11 byte MySQL integer.

But if you're now talking hash tables or cached lookups in RAM, I totally agree.
Logged

It is now believed, that after having lived in one compound with 3 wives and never leaving the house for 5 years, Bin Laden called the U.S. Navy Seals himself.
nutballs
Administrator
Lifer
*****
Offline Offline

Posts: 5627


Back in my day we had 9 planets


View Profile
« Reply #9 on: April 23, 2007, 10:00:10 AM »

your right bigint is bigger duh. i was flipflopping between int and big.

but when you add in the storage of the index table, the 2 lookups to find 1 value, i still think the INT method works faster. its moot though, unless your talking millions of lookups per minute probably. I just have always done it that way because at the time i started doing it, every cycle counted on the SQL box, even at 10 hits a minute probably.
Logged

I could eat a bowl of Alphabet Soup and shit a better argument than that.
perkiset
Olde World Hacker
Administrator
Lifer
*****
Offline Offline

Posts: 10096



View Profile
« Reply #10 on: April 23, 2007, 10:05:26 AM »

Where this routine intrigues me the most is in my cached spider lookups.

I move addresses into the cache to watch for spiders, denieds and such and I like this a lot. Since I'd pay for the conversion of all records only 4 times a day (when I re-up my spider listings) and only on a single address when looking at < the current surfer > but the "walk the hash routine" is looking at ints rather than strings I think I'd get a big bang for buck out of it.

Now, looking at your code - it looks as though the number you create is essentially a base-256 version of the number... is that correct? (clever, that - nicely done)
Logged

It is now believed, that after having lived in one compound with 3 wives and never leaving the house for 5 years, Bin Laden called the U.S. Navy Seals himself.
nutballs
Administrator
Lifer
*****
Offline Offline

Posts: 5627


Back in my day we had 9 planets


View Profile
« Reply #11 on: April 23, 2007, 10:28:46 AM »

yes it is base 256. never actually realized that lol.

To Convert to INT just break the octets apart then SUM(256^n-1)
A.B.C.D
(A*256^3)+(B*256^2)+(C*256)+D


to convert back
((number/256^3) % 256).((number/256^2) % 256).((number/256) % 256).(number % 256).
where % is the modulus function of your language. so your just getting the remainder of each octet.

Mod is limited to INT in asp though so my code chunk does some funky math to compensate.
Logged

I could eat a bowl of Alphabet Soup and shit a better argument than that.
perkiset
Olde World Hacker
Administrator
Lifer
*****
Offline Offline

Posts: 10096



View Profile
« Reply #12 on: April 23, 2007, 11:29:07 AM »

I can't get shit done today so I thought I'd adapt this routine for php - it can be seen here:

http://www.perkiset.org/forum/php/converting_ip_adrresses_to_integers_and_back-t87.0.html

NBs - it's pretty tight and would deserve a look so that you can see ASP -> PHP equivalent code.

All that being said, I'd love to see a MySQL or Oracle guy weigh in and show the code to do this at the database level...

/p
Logged

It is now believed, that after having lived in one compound with 3 wives and never leaving the house for 5 years, Bin Laden called the U.S. Navy Seals himself.
thedarkness
Lifer
*****
Offline Offline

Posts: 585



View Profile
« Reply #13 on: April 23, 2007, 07:32:07 PM »

I wonder how it would go written as a UDF? http://en.wikipedia.org/wiki/User_Defined_Function

or, as a procedure http://dev.mysql.com/doc/refman/5.0/en/writing-a-procedure.html

What's the thoughts on overhead/speed advantages/disadvantages ?

It would constitute an interesting exercise and reminds me a lot of a checksumming algorithm I wrote many moons ago. If consensus is that it would indeed be faster/less overhead I would be happy to have a crack at it.

Cheers,
td
Logged

"I want to be the guy my dog thinks I am."
 - Unknown
nutballs
Administrator
Lifer
*****
Offline Offline

Posts: 5627


Back in my day we had 9 planets


View Profile
« Reply #14 on: April 23, 2007, 09:24:17 PM »

I personally try to keep as much data manipulation off the SQL just because i like my SQL to do the one thing its good at, lookups. But thats really just personal preference. obviously the less data you push across a network, the better. so it really depends on the situation. in this case, pulling the manipulation off the SQL box is probably better, if... they are on seperate boxes.
Logged

I could eat a bowl of Alphabet Soup and shit a better argument than that.
Pages: [1] 2
  Print  
 
Jump to:  

Perkiset's Place Home   Best of The Cache   phpMyIDE: MySQL Stored Procedures, Functions & Triggers
Politics @ Perkiset's   Pinkhat's Perspective   
cache
mart
coder
programmers
ajax
php
javascript
Powered by MySQL Powered by PHP Powered by SMF 1.1.2 | SMF © 2006-2007, Simple Machines LLC
Seo4Smf v0.2 © Webmaster's Talks


Valid XHTML 1.0! Valid CSS!