
![]() |
perkiset
Nutballs asserted on the
ASPboard that a numeric index would be faster than a string index – particularly for looking up IP addresses. His findings as relates to actual practice notwithstanding, I wanted to do a pure test of the indicies in MySQL. So I took my list of 500,000 pseudo addresses from myPHPExtension in C thread and added them to a table that looked like this:CREATE TABLE ipseektest' ( StephenBauer
What if you use the four byte sized columns as per NBs best performance spec. Primary keys in MySQL are clustered, no? A speed and DB sizing benefit in one. Throw in the binary shifting and you may have the best method possible. Cannot recall off the top of my head if PHPdoes binary shifting as an operator...probably does. I know C derivitive (sp?) languages support it.SB perkiset
quote author=StephenBauer link=topic=314.msg2114#msg2114 date=1181771985 What if you use the four byte sized columns as per NBs best performance spec. Primary keys in MySQL are clustered, no? A speed and DB sizing benefit in one. Was going to play with that one, but PinkHat is on me because I've had too much fun for the last 2 days on this schtuff and have managed to avoid *actual* work... ![]() quote author=StephenBauer link=topic=314.msg2114#msg2114 date=1181771985 Throw in the binary shifting and you may have the best method possible. Cannot recall off the top of my head if PHPdoes binary shifting as an operator...probably does. I know C derivitive (sp?) languages support it.PHPdoes have bitwise operators ie., $c = $a << $b would say $c gets the value of $a, shifted left $b bits. Great for multiplications and division... but I'm afraid I'm not getting you here... please clarify so I don't look like a dork...![]() StephenBauer
I wasn't thinking too "specific" when I was typing that last reply...the "shifting" was in relation to this part of NB's code (all the *256 stuff): num=cint(str1)*256*256*256+cint(str2)*256*256+cint(str3)*256+cint(str4) But that wouldn't play into the code we're talking about. ![]() SB perkiset
Gotcha, makes sense - thanks!
nutballs
read it at syndk8 first, but like SB suggests, when you have time, try out the 4 tinyint column method. I have a feeling that will smoke the ip2num method. especially because it is just a split instead of some semi-complex math.
perkiset
quote author=nutballs link=topic=314.msg2119#msg2119 date=1181776967 when you have time, try out the 4 tinyint column method. I have a feeling that will smoke the ip2num method. especially because it is just a split instead of some semi-complex math. ![]() I'll let you know when I get a moment on it... /p nutballs
I guess semi-complex was the wrong way to put it.
Multi-step math. At the database level, i also am guessing a tie, but... the real use of this would be real IPs that would have to be converted to be looked up, so whatever system you use to do the conversion, would be the bottleneck. Im guessing a string split with 4 resulting parts of a string max length of 15 characters is going to be faster than that multi-step math shit thats going on in the IP2num conversion. but i really dont know, other than my test which resulted in 4column smokin everything else. curious about the pure DB level though. perkiset
And the results are in!
In my final test, I changed the database thusly: CREATE TABLE `ipseektest` ( nutballs
you should be able to use tinyint in mysql, just set it to unsigned?
In MSsql, tiny int is unsigned only, 1 byte. So space wise it would be a total of 4, making it half of a single bigint. string would be 15bytes. padded number though is <>INT since 255255255255 is well under the 2.1 billion of INT. so thats 4 bytes also.> bigint since its 255billion, so thats 8 bytes. Obviously the indexing would add space as well, but im not sure how much for tinyint if it would even make any difference, which i dont htink it would. The other factor is readability. Although I know your trying to find the fastest result, regardless of all else, database readability suffers greatly with converted columns. And if the speed difference is 0, i always go with the more human-readable method. but again, thats personal methodology. just to make sure, you ran it a couple of times right? just to make sure there was no change in serverload. I ran mine all in 1 single pass, each type 1 right after the other, and got a few spikes. but thats because i was running it on a live server. just making sure. edit:was being retarded with my decimals... nutballs
BTW, regarding Indexing. 2 lookups are required for indexing obviously, so... im guessing Un-Indexed, tiny will win, possibly even win overall.
StephenBauer
I think mySQL primary keys are inherently (sp?) clustered, no? And that is default behaviour for SQL Server primary keys too. Although you can make it non-clustered and create a clustered index later that isn't necessarily the primary key (at least in SQL Server). Clustered means the data is stored right there in the clustered index so it makes for quick access of the indexed data and you do not get any extra overhead (cpu time or storage needs) for (seperate, non-clustered) index storage. The (referenced) data is the clustered index or vice versa.
SB Added: This knowledge isn't directed at perk or NB in particular as they probably know this but those following along that do not... perkiset
quote author=nutballs link=topic=314.msg2129#msg2129 date=1181783326 The other factor is readability. Although I know your trying to find the fastest result, regardless of all else, database readability suffers greatly with converted columns. And if the speed difference is 0, i always go with the more human-readable method. but again, thats personal methodology. just to make sure, you ran it a couple of times right? just to make sure there was no change in serverload. I ran mine all in 1 single pass, each type 1 right after the other, and got a few spikes. but thats because i was running it on a live server. just making sure. @ Readability - I suppose an important point to make here is that I am not yet suggesting a recommended protocol... I'm just on the hunt for speed right now cause you piqued my interested you bastard. I agree that looking at numbers in the 100 billion range begins to look a bit weird... but if I don't spend too much time looking at them and the speed difference is in the 10x zone, it may be worth it. @ multipass - yup - a whole bunch of times. About 10 times, the ones I posted were nicely representative of the average. galileo
May be of some help: http://dev.mysql.com/doc/refman/5.0/en/miscellaneous-functions.html # INET_ATON(expr)Given the dotted-quad representation of a network address as a string, returns an integer that represents the numeric value of the address. Addresses may be 4- or 8-byte addresses.mysql> SELECT INET_ATON('209.207.224.40');-> 3520061480 The generated number is always in network byte order. For the example just shown, the number is calculated as 209×2563 + 207×2562 + 224×256 + 40.INET_ATON() also understands short-form IP addresses:mysql> SELECT INET_ATON('127.0.0.1'),INET_ATON('127.1');-> 2130706433, 2130706433 Note: When storing values generated by INET_ATON(), it is recommended that you use an INT UNSIGNED column. If you use a (signed) INT column, values corresponding to IP addresses for which the first octet is greater than 127 cannot be stored correctly. See Section 11.2, “Numeric Types”.# INET_NTOA(expr)Given a numeric network address (4 or 8 byte), returns the dotted-quad representation of the address as a string.mysql> SELECT INET_NTOA(3520061480);-> '209.207.224.40' |

Thread Categories

![]() |
![]() |
Best of The Cache Home |
![]() |
![]() |
Search The Cache |
- Ajax
- Apache & mod_rewrite
- BlackHat SEO & Web Stuff
- C/++/#, Pascal etc.
- Database Stuff
- General & Non-Technical Discussion
- General programming, learning to code
- Javascript Discussions & Code
- Linux Related
- Mac, iPhone & OS-X Stuff
- Miscellaneous
- MS Windows Related
- PERL & Python Related
- PHP: Questions & Discussion
- PHP: Techniques, Classes & Examples
- Regular Expressions
- Uncategorized Threads