nutballs
Since the most effective way to store an IP in a database is actually as an Integer (or BigInt technically) this function pair is to convert an IP4 octet to a BigInt and vice versa.
function IP2Num(sip) dim str1,str2,str3,str4 dim num IP2Num=0 if isnumeric(left(sip,2)) then str1=left(sip,instr(sip,"."-1) sip=mid(sip,instr(sip,"."+1) str2=left(sip,instr(sip,"."-1) sip=mid(sip,instr(sip,"."+1) str3=left(sip,instr(sip,"."-1) str4=mid(sip,instr(sip,"."+1) num=cint(str1)*256*256*256+cint(str2)*256*256+cint(str3)*256+cint(str4) IP2Num = num end if end function function num2ip(ipn) 'converts from the integer representation of the IP to the quad-octet notation dim a,b,c,d,tmpnum,ipnum ipnum=cdbl(ipn) a=int((ipnum/(256^3))) mod 256 b=int((ipnum/(256^2))) mod 256 c=int((ipnum/256)) mod 256 'due to the limit of MOD in aspto 4byte integers we gotta do some funky math here for the last octettmpnum=int(ipnum/2) 'divide by 2 and toss the decimal tmpnum=tmpnum*2 'multiply back up by 2 to get the "even" value tmpnum=ipnum-tmpnum 'subtract new number from original to get adjuster: 0 or 1 d=((int((ipnum/2)) mod 12*2)+tmpnum 'funky math should only be: ipnum mod 256 num2ip=a&"."&b&"."&c&"."&d end function StephenBauer
I have been meaning to benchmark that (traditional) method against the other method of just left padding out each octet with zeros and then concatenating them all together to form a single number converting it to BigInt. You wouldn't have to pad out the first octet either. May be faster in the "converting to" side but probably slower when "converting from" due to the use of more string functions. Or am I on crack. SB nutballs
quote author=StephenBauer link=topic=72.msg2018#msg2018 date=1181260058 I have been meaning to benchmark that (traditional) method against the other method of just left padding out each octet with zeros and then concatenating them all together to form a single number converting it to BigInt. You wouldn't have to pad out the first octet either. May be faster in the "converting to" side but probably slower when "converting from" due to the use of more string functions. Or am I on crack. SB the only problem is the first octet, unless you add a preceding number like 1 then something like 12.34.5.123 could be 1012034005123 never thought of that... that might actually work. max would be 1.255 trillion. i would guess though that the math manipulations would be faster than the extra steps of converting to strings, concating together, the converting back to int. hmmm. might be worth a test. I have always wondered if there would be an advantage to storing each octet as a tiny int. 4 columns. it would be only 4 bytes, but the lookup would be 4 columns equal query. StephenBauer
quote author=nutballs link=topic=72.msg2024#msg2024 date=1181268511 quote author=StephenBauer link=topic=72.msg2018#msg2018 date=1181260058 I have been meaning to benchmark that (traditional) method against the other method of just left padding out each octet with zeros and then concatenating them all together to form a single number converting it to BigInt. You wouldn't have to pad out the first octet either. May be faster in the "converting to" side but probably slower when "converting from" due to the use of more string functions. Or am I on crack. SB the only problem is the first octet, unless you add a preceding number like 1 then something like 12.34.5.123 could be 1012034005123 never thought of that... that might actually work. max would be 1.255 trillion. i would guess though that the math manipulations would be faster than the extra steps of converting to strings, concating together, the converting back to int. hmmm. might be worth a test. I have always wondered if there would be an advantage to storing each octet as a tiny int. 4 columns. it would be only 4 bytes, but the lookup would be 4 columns equal query. Wouldn't need to add anything to the first octet as long as the remaining octets are padded out. But, yeah, decoding would probably take longer than the other method that is all math. Encoding would be a toss up with all the string functions in both methods. The four integers, composite key method is interesting too. It could have some performance improvements for particular octet, range oriented queries. SB StephenBauer
For the original conversion to integer method, all the multiplication could be replaced with left bit shifting to maybe speed things up a bit (by 24, 16, 8, and none per octet, left to right).
SB StephenBauer
Regex.Split could be useful to test to.No, not obsessed...will be writing some new logging code soon to mate up with some voting code. SB perkiset
Actually I think SB is onto something - but I don't know if you can do it in your language - it's certainly what I miss in
PHP.Just do it C-string style ie., create a string that is 13 characters long, like this: 1000000000000 (the 1 on the end handles the numeric problem you pointed out NBs) then drop the string values of each char of the octets into the appropriate positions in the new string with pointers. No string or numeric manipulation at all. When tearing it back apart, pull the components back out using direct string pointers and drop them into a new string. The loops would be small and about as tight as a loop can be... and since it'd be all pointer stuff with no calculation I should think it would scream. Bummer I can't try that in PHP- although I've been toying with writing some of my own language extensions forPHPin C++ ... (yeah right... like I have time for that now :- ) that'd make a killer fun exercise.../p StephenBauer
Unless I am brain dead, what is the "numeric problem nutballs pointed out" with the "1"? The first octet wouldn't need padding unless you employ the method of string pointer substitution you just brought up and then you need something as a placeholder for each possible digit position of each octet (i.e. twelve zeroes). The remaining three octets, always being padded, ensure the variable digit length of the first octet do not create duplicate numbers...unless I am oversimplifying in my head. SB perkiset
Nope you're right... if you pad the B, C & D octets then stringwise you wouldn't care what the A octet was... the bigint resulting number would store just fine in the DB. Wouldn't need the 1 after all...
nutballs
your right. i was over thinking it.
though I habitually add a 1 in front of any string based numbers that i want to guarantee the string length, just in case I am doing something that results in leading zeros. The situation where this always comes up for me is in reports. you would want all numbers in a column to display the same length, so as not to cause visual distractions in the column. Its more of a display related problem. I didnt think about the fact this will never be displayed, just queried. nutballs
so now. the logic would be.
split IP into and array of 4 octets check string length if short, add zeros to fill concat i really am wondering which would be the fastest? string 0 padding math conversion (though in ASPthis might now be the worst because of the INT limit of mod)4 keys in the database (possible because of being such short datatypes) string pointers, which i think might be essentially the same task as string padding using length checks, since you would still need to know character positions. My hunch is actually string padding, or 4 keys. since i do so many IP lookups, this is probably a good thing for refinement. perkiset
quote author=nutballs link=topic=72.msg2039#msg2039 date=1181318278 split IP into and array of 4 octets check string length if short, add zeros to fill concat No no, C string style ie., string arrays not strings - I am shit anymore with C++ on the fly, so here's some pseudo object pascal code so you can read the idea (in object pascal, you can address string positions as an array, much like you address chars in a string in C - although with C I'd use pointers which are faster than arrays) - but also, that was my question above - can you address strings in a C style in your language? // strings are zero indexed, just like C - so get the actual array position of the last char inBuff = '192.13.1.34'; inPtr = strlen(inBuff) - 1; // in C I'd need an actual Chr(0) at the end to terminate this string outBuff = '000000000000'; // Last char position in the outbuff... outPtr = 11; // I'll use this var * 3 in the routine to point to "jump spots" where I can // reposition the output pointer past any needed padding octet = 3; // Loop so long as there is something in the input buffer to copy... while (inPtr >= 0) { // Copy the char at position inPtr into outBuff at position outPtr outBuff[outPtr--] = inBuff[inPtr--]; if (inBuff[inPtr] == '.') { // We've hit a period in the input - reposition the output and jump over the period... outPtr = octet-- * 3; inPtr--; } } // At this point, outBuff contains my final string representation of the number... functionally, this is little more than a reverse StrCpy function, just skipping a bit for the pads and the periods. When you posted it to the database, any leading zeros left on the string would be dumped by the SQL compiler. Taking it apart would pretty much be just as easy. perkiset
Oh fishit it.
Just downloaded the entire tutorial on how to write custom extensions forPHPin C++... thanks boys, like I didn't have enough to doI'll post a sister thread in PHPreferencing this if I ever actually do something with itStephenBauer
Hehe. You are a true hack! As for the four columns in a database (one per octet), like I said, it may help in range reporting and it also halves your storage from 8 bytes for a long/bigint to 4 bytes for four bytes (duh ). I think all four methods, each in its own independent loop in the same program, would yield profile results each showing a percentage of run time...whichever is lowest is the quickest. Of course you will want to add a database store and retrieve as well. Report back, nutballs! SB nutballs
testing done.
4 loops. random IPs checked, each of which occur in the database, so no negative hits. each cycle uses the same ips for each type of query. 100 at a time 1 million in DB same table for all methods ipstring is a string lookup ipnum is my method ippadded is SBs method ipssplit is each octet in a separate column nothing done with query results. Average ipstring: 36 31 32 30 31 29 30 31 31.25 ipnum: 31 29 38 33 27 30 27 34 31.125 ippadded: 49 37 38 40 39 38 36 41 39.75 ipsplit: 26 28 26 26 24 24 23 25 25.25 So here is the surprise... As a string, or as a bigint number, the lookup is the same.... wtf? is the MOD math all that difficult? i guess so... padding is the slowest, sorry SB. but here is the other surprise. 4 column lookup is the fastest by about 20% over the ipnum method. Im gonna try making it 10million rows and lookup 1k at a time. see that the numbers stay the same. Each loop is exactly the same, except for the conversions. here is a code fragment of the important stuff for review. '======= ipstring t=now for i = 0 to ubound(arr) sql="select top 1 * from ips where ipstring='"&arr(i)&"'" db.execute sql next response.write "ipstring: "&datediff("s",t,now)&"<br>" '======= ipnum t=now for i = 0 to ubound(arr) sql="select top 1 * from ips where ipnum="&ip2num(arr(i)) db.execute sql next response.write "ipnum: "&datediff("s",t,now)&"<br>" '======= ipPadded t=now for i = 0 to ubound(arr) sql="select top 1 * from ips where ippadded="&padded(arr(i)) db.execute sql next response.write "ippadded: "&datediff("s",t,now)&"<br>" '======= ipsplit dim A t=now for i = 0 to ubound(arr) a=split(arr(i),"." sql="select top 1 * from ips where ip1="&a(0)&" and ip2="&a(1)&" and ip3="&a(2)&" and ip4="&a(3) db.execute sql next response.write "ipsplit: "&datediff("s",t,now)&"<br>" function padded(ip) dim iparr,ippadded,k iparr=split(ip,"." ippadded=Right("00" & CStr(iparr(0)),3)&Right("00" & CStr(iparr(0)),3)&Right("00" & CStr(iparr(0)),3)&Right("00" & CStr(iparr(0)),3) padded=ippadded end function function IP2Num(tsip) dim str1,str2,str3,str4,sip dim num 'IP2Num=0 sip=tsip if isnumeric(left(sip,2)) then str1=left(sip,instr(sip,"."-1) sip=mid(sip,instr(sip,"."+1) str2=left(sip,instr(sip,"."-1) sip=mid(sip,instr(sip,"."+1) str3=left(sip,instr(sip,"."-1) str4=mid(sip,instr(sip,"."+1) num=cint(str1)*256*256*256+cint(str2)*256*256+cint(str3)*256+cint(str4) IP2Num = num end if end function perkiset
NBs I guess I was a bit confused... I thought the test was more about the speed of the indexes than the code... in the example you give, the queries will be completely dependent on your function calls, which may be compiling into pretty expensive code. There's a lot of string code that's being called for every query, if I read that correctly.
Could you throw a test out that did something like this: Pre-randomize all of the IPs you are going to look for ie., create an array of either 1M strings or 1M numerics. Then simply walk that array and look them up, doing no functional code inbetween each lookup - then we'll know the pure difference between a string and a numeric based index - which is what I thought the original notion was. Then we'd know if it even makes sense to pursue better functions for converting that string into a number. nutballs
the test i ran though is relevant to how you would actually use these lookups. I just ran in loops of 1000 so that I could get larger discrepancies to measure. This also ONLY tests lookups, which is what 90% of the usage of this would be for anyway. storage is a heavy hit no matter how you slice it, heavier even with indexes. But in the end the relevance is DB querytime plus manipulation time in code to prepare for the lookup. This is only for Classic
ASPmindo you. It might be different for other langs. Another test to try would be the same exact thing but using stored procs. so something like a straight query such as "select top 1 * from table where ipnum=sp_ip2num('207.123.234.12')" which would give a more pure result, regardless of the interface language.the records in SQL are not indexed, though i was planning on running them indexed as well. For the Padded method, i tried a couple different manipulations, and the one i ran with performed the best it seemed. thats the right("00"&cstr(ip1),3) part obviously. i tried for loops, i tried a string of IF statements, though i dont know if i tried a select/case statement... hmm. i'll try that too. oh, and i guess i do have 2 of them, not being delivered via function. i will functionify those as well and post updated results. every test is exactly the same, except for the method of lookup t=now '===set a start time for i = 0 to ubound(arr) '===loop the entire array of IPs that I have pre-created for testing. sql="select top 1 * from ips where ipnum="&ip2num(arr(i)) '===build the SQL query db.execute sql '===run it, but return nothing next response.write "ipnum: "&datediff("s",t,now)&"<br>" '===write out the seconds of change. nutballs
ok, update. and this is now weird.
I have made it so each test is as close to the same as possible, with the only variance being the method of lookup. each is called from a function to make sure the function call wasnt adding extra baggage to the process. I also tested all 3 padding methods i could think of. I even made the query only request the number 1, so that row length doesnt even figure in. so here are the results. Averages of 5 runs with 50 lookup each. average seconds pct faster then next up split octets: 28 30% string: 36.4 4% numeric: 38 26% padded Caseloop: 48 1% padded ifloop: 48.4 7% padded right: 51.8 0% Split 4 column octets works the fastest it seems. but frankly im baffled. 2.4 million records, no indexing, and string based lookup is faster than numeric?!?!@?? functions for reference function ipsplit(ip) dim a,sql a=split(ip,"." sql="select top 1 '1' from ips where ip1="&a(0)&" and ip2="&a(1)&" and ip3="&a(2)&" and ip4="&a(3) ipsplit=sql end function function ipstring(ip) dim sql sql="select top 1 '1' from ips where ipstring='"&ip&"'" ipstring=sql end function function ippaddedright(ip) dim iparr,sql iparr=split(ip,"." sql="select top 1 '1' from ips where ippadded="&Right("00" & iparr(0),3)&Right("00" & iparr(0),3)&Right("00" & iparr(0),3)&Right("00" & iparr(0),3) ippaddedright=sql end function function ippaddedcase(ip) dim iparr,sql,octet,pad iparr=split(ip,"." pad="" for each octet in iparr select case len(octet) case 1 pad=pad&"00"&octet case 2 pad=pad&"0"&octet case 3 pad=pad&octet end select next sql="select top 1 '1' from ips where ippadded="&pad ippaddedcase=sql end function function ippaddedif(ip) dim iparr,sql,octet,pad iparr=split(ip,"." pad="" for each octet in iparr if len(octet)=1 then pad=pad&"00"&octet elseif len(octet)=2 then pad=pad&"0"&octet else pad=pad&octet end if next sql="select top 1 '1' from ips where ippadded="&pad ippaddedif=sql end function function IP2Num(ip) dim str1,str2,str3,str4,sip,sql sip=ip str1=left(sip,instr(sip,"."-1) sip=mid(sip,instr(sip,"."+1) str2=left(sip,instr(sip,"."-1) sip=mid(sip,instr(sip,"."+1) str3=left(sip,instr(sip,"."-1) str4=mid(sip,instr(sip,"."+1) sql = "select top 1 '1' from ips where ipnum="&str1*256*256*256+str2*256*256+str3*256+str4 ip2num=sql end function StephenBauer
Sweet dude. I didn't really mean you had to do it! But it is way cool that you did. This will come in handy a bit down the road and save me some time too. has been credited to your PayPal account. Did you try the math using the binary shift at all? SB nutballs
no i didnt try binary shifting since i couldnt figure out how to do it in
ASP(mostly cause i havent bit shifted since college), and didnt really spend much time on the whole thing. only took me about an hour in front of the TV, refining stuff down and such.im gonna guess though that 4 columns is going to be the best method, in any system, since i have thought about it a bit more. Converting to bigint is going to be the same math in any language, and would be the same steps. Padding also, though maybe certain languages may have faster implementation of IF/Case/For but i doubt it. Where as the 4 column method is a fixed array resulting from a split of the IP, which all languages should do about the same efficiency. The possible values are limited to 255 per column, meaning that little gain should be achieved from indexing i think. but im gonna try it anyway and post those results. esrun
Maybe consider test flat file and use is_file
|
Thread Categories
Best of The Cache Home | ||
Search The Cache |
- Ajax
- Apache & mod_rewrite
- BlackHat SEO & Web Stuff
- C/++/#, Pascal etc.
- Database Stuff
- General & Non-Technical Discussion
- General programming, learning to code
- Javascript Discussions & Code
- Linux Related
- Mac, iPhone & OS-X Stuff
- Miscellaneous
- MS Windows Related
- PERL & Python Related
- PHP: Questions & Discussion
- PHP: Techniques, Classes & Examples
- Regular Expressions
- Uncategorized Threads