The Cache: Technology Expert's Forum
 
*
Welcome, Guest. Please login or register. September 17, 2019, 06:36:44 PM

Login with username, password and session length


Pages: [1]
  Print  
Author Topic: Ultimate tight loop optimization  (Read 4727 times)
m0nkeymafia
Expert
****
Offline Offline

Posts: 240


Check it!


View Profile
« on: May 13, 2007, 02:47:46 AM »

Right guys this is a trick which you either know or dont know.
It is the freakin tits.

I started using this technique when I wrote a GLIB for work.
It has to be cross platform so I couldnt take advantage of any architecture specific niceness.

Anyway in my benchmark tests it trounced memcpy for filling memory and also works on more complicated matters.

The technique is called Duffs Device.  It uses case fallthrough to very quickly iterate through a list basically unrolling it.

Its in a macro so cant really be used for beefy loops but for tight ones its amazing.

Code:
#define DUFF_DEVICE_8(aCount, aAction) \
{ \
int count_ = (aCount); \
int times_ = (count_ + 7) >> 3; \
switch (count_ & 7){ \
case 0: do { aAction; \
case 7: aAction; \
case 6: aAction; \
case 5: aAction; \
case 4: aAction; \
case 3: aAction; \
case 2: aAction; \
case 1: aAction; \
} while (--times_ > 0); \
} \
}

If you know what loop unrolling is then youll know how to use it, if not reply and ill try answer any questions.

Logged

I am Tyler Durden
thedarkness
Lifer
*****
Offline Offline

Posts: 585



View Profile
« Reply #1 on: May 13, 2007, 03:18:31 AM »

Radical stuff monkey, wasn't aware of this, keep 'em coming  Wink

I'll be looking for a place to use this.

Cheers,
td
Logged

"I want to be the guy my dog thinks I am."
 - Unknown
m0nkeymafia
Expert
****
Offline Offline

Posts: 240


Check it!


View Profile
« Reply #2 on: May 13, 2007, 09:59:26 AM »

It can be used anywhere dude
Ill give u a rough example [excuse me if syntax is not 100% right]

const int BUFFER_SIZE = 1024*32;
DUFF_DEVICE(BUFFER_SIZE, *(dst++) = *(src++));

that should copy from src to dst buffers uber fast.
ill write a post in a sec on the notation used, not sure if many people use it
im an optimization FIEND lol
Logged

I am Tyler Durden
thedarkness
Lifer
*****
Offline Offline

Posts: 585



View Profile
« Reply #3 on: May 13, 2007, 06:28:31 PM »

yeah, got you incrementing pointers as per the Stroustrup example in the Duff's Device page on wikipedia. I follow, I'm a believer ;-)

Cheers,
td
Logged

"I want to be the guy my dog thinks I am."
 - Unknown
nutballs
Administrator
Lifer
*****
Offline Offline

Posts: 5627


Back in my day we had 9 planets


View Profile
« Reply #4 on: May 14, 2007, 04:35:03 PM »

for some reason i cant wrap my head around this one. conceptually im having a problem even getting it at all. is there a real example of how this would be used, that i could understand, maybe an analogy? (i think my ADD is getting ing the way). Obviously im an ASP guy so this wont work for me, though I am curious to try it for kicks, but im sure the interlacing of statements wont be allowed. but aside from that, i am a coder of past, but maybe im outta shape. lol

Logged

I could eat a bowl of Alphabet Soup and shit a better argument than that.
perkiset
Olde World Hacker
Administrator
Lifer
*****
Offline Offline

Posts: 10096



View Profile
« Reply #5 on: May 14, 2007, 04:43:30 PM »

Thank god.

I read that code about a dozen times and - although it's been more than a dozen years since I've been a c++ster, I didn't think I was THAT old and stoopid... looking forward to more understanding as well - sorry boys for the Codezheimers...
Logged

It is now believed, that after having lived in one compound with 3 wives and never leaving the house for 5 years, Bin Laden called the U.S. Navy Seals himself.
thedarkness
Lifer
*****
Offline Offline

Posts: 585



View Profile
« Reply #6 on: May 14, 2007, 07:46:27 PM »

You need to read this;
http://en.wikipedia.org/wiki/Duff's_device
http://en.wikipedia.org/wiki/Loop_unwinding

Loop unwinding is an ancient technique for optimization and, as monkey mentioned, should be done by the compiler but.......

Cheers,
td
Logged

"I want to be the guy my dog thinks I am."
 - Unknown
thedarkness
Lifer
*****
Offline Offline

Posts: 585



View Profile
« Reply #7 on: May 14, 2007, 07:47:54 PM »

Actually, I'm going to read all of this when I get the chance.

http://en.wikipedia.org/wiki/Category:Compiler_optimizations

Cheers,
td
Logged

"I want to be the guy my dog thinks I am."
 - Unknown
perkiset
Olde World Hacker
Administrator
Lifer
*****
Offline Offline

Posts: 10096



View Profile
« Reply #8 on: May 14, 2007, 08:10:25 PM »


Whew - not quite as crusty as a I feared. I've never used the term "Loop Unwinding" but it makes complete sense. Duffs as well - now I completely get where that's going.

Interesting point in @Wikip though - claimed that although this was pretty damn efficient, in the *to++ = *from++ form (modified Duffs) this is probably less efficient than what is optimized in the compiler - they've probably got it even more tightened down than that.

Interesting stuff - haven't thought that way in a while. Kinda nice to get there for a bit.

Thanks,
/p
Logged

It is now believed, that after having lived in one compound with 3 wives and never leaving the house for 5 years, Bin Laden called the U.S. Navy Seals himself.
Pages: [1]
  Print  
 
Jump to:  

Perkiset's Place Home   Best of The Cache   phpMyIDE: MySQL Stored Procedures, Functions & Triggers
Politics @ Perkiset's   Pinkhat's Perspective   
cache
mart
coder
programmers
ajax
php
javascript
Powered by MySQL Powered by PHP Powered by SMF 1.1.2 | SMF © 2006-2007, Simple Machines LLC
Seo4Smf v0.2 © Webmaster's Talks


Valid XHTML 1.0! Valid CSS!