Efficient code – a quiz

I was recently reading a set of “golden rules” for embedded programming. I am very skeptical about such proscriptive instructions as, for them to be valid, a great many assumptions must be made and clearly stated. These rules were supposed to promote the production of safe, efficient code. I am OK with “safe” – that simply means that the code does what it is supposed to do, without deviation as a result of unexpected data or unexpected side-effects. I am not so sure about “efficient”, as that depends entirely on the design parameters in force; it might mean fast or it might mean small, for example. And what about maintainability and portability being significant parameters?

So, instead of trying to outline rules myself, I thought that I might set out to find out what real developers think …

Just for fun [budgets to not stretch to offering a prize], I would like to pose a question and request answers by comment or email:

You are developing a system, where execution speed is important. At one point in the code, there is an unsigned integer variable x which needs to be divided by 8. I can immediately think of 4 ways to code this:

x /= 8;
x = x / 8;
x >>= 3;
x = x >> 3;

My question is: which of these options is the most efficient and why? Of course, if you have other suggestions about how to address the problem, I would be very interested to hear.

I will publish some results and my answer in a couple of weeks.

Post Author

Posted September 26th, 2011, by

Post Tags

, , , ,

Post Comments


About The Colin Walls Blog

This blog is a discussion of embedded software matters - news, comment, technical issues and ideas, along with other passing thoughts about anything that happens to be on my mind. The Colin Walls Blog

@colin_walls tweets

Follow colin_walls


10 comments on this post | ↓ Add Your Own

Commented on 26 September 2011 at 10:04
By Peter Bushell

The second is clearest, but the first is a well-known C idiom and is quite acceptable. The others are not acceptable because they hide the arithmetic nature of the job to be done. The mechanism by which division is performed is a matter for the processor, aided in this case, possibly, by the compiler, which should optimise the code to a simple shift if this results in quicker execution than just generating a divide instruction (assuming the processor provides one, of course)

There is another important point, too, related to the C standard, but you have neatly rendered its discussion “off-topic” by making x unsigned!

I take it, by the way, that you don’t want us to divert the discussion into arguments about the use of the explicit constants 8 and 3 in your examples.

Commented on 26 September 2011 at 16:00
By Colin Walls

Thanks Peter. I will comment in a follow up blog soon.

Commented on 26 September 2011 at 16:30
By Dan S

I agree with Peter.

I’ve heard the saying, “If you mean to shift, then shift; if you want to multiply or divide, then multiply or divide”.

Generally speaking, on most architectures, all 4 possibilities will result in the same assembly code being generated, at least with any sort of optimization turned on.

But the first 2 versions are more expressive – you want to divide, and you are dividing. Not to mention the fact that now if you want to divide by 9, it’s a single character being changed. Also, even though most of us see “>> 3” and “instantly” know this is a divide-by-8, some people might have trouble converting 2**3 to 8. I had to fix code once where the original coder meant to divide by 16, but everywhere in the code it was coded as “>> 5” – the dreaded “off by one [bit]” error.

And what if the value now changes to a floating point (float, double) type? How’s that bit shift working out now?

But most importantly, the first 2 versions do not rely on any specific architecture or internal representation. Although most architectures use 2s complement representation, this is not universal; binary representation is an implementation detail of the processor.

Bottom line: write your code for human beings, and let the compiler do its job.

Commented on 28 September 2011 at 15:41
By Ken Simone

You are developing a system, where execution speed is important.
Fewer Cycles = faster execution speed
Use of fewer registers is more efficient

I would code x = x >> 3; I may create an intermediate term xshiftn
The shift should take 2 cycles and use 1 register
The divide code would take 3,4, or 5 cycles to produce a result and probably use 2 or 3 registers.
Code for the machine. The code may become the machine!

    Commented on 28 September 2011 at 15:45
    By Colin Walls

    Thanks Ken. I will be commenting on the responses I have received in a blog post very soon.

Commented on 29 September 2011 at 23:17
By Krzysztof Wesołowski

Code should present idea, operation, not implementation details. If your compiler do not optimize dividing by 8 then there is one ultimate way to efficient program – change compiler.

    Commented on 30 September 2011 at 08:30
    By Colin Walls

    I agree entirely Krzysztof.

Commented on 30 September 2011 at 12:34
By Lee Riemenschneider

I also agree with Krzysztof. Back when we first moved from Assembly to C, we used to examine the Assembly output of the compiler, and we found that certain C constructs were more efficient than others. That gave the Assembly “old guard” reason to claim C was useless and to keep coding in Assembly. Shift rather than divide and if-then-else rather than switch-case were two hand optimizations we saw. After a few years, the C compilers solved these issues and it became no longer necessary to look at the compiler output in Assembly. I would hope that those years are long behind us.

Commented on 5 October 2011 at 04:42
By Shaun Shippey


I am a recent graduate so know that my reply is coming from a position of inexperience.

Without actually looking at the assembly code I would assume that method 1 and 2 are identical and would produce the same result. The second method may be a tad bit clearer but that’s the only difference I can spot.

A possible problem with using a right shift as a multiplier, even if it did result in better optimization, may in fact come from cross platform applications. In short, I believe that the code will have portability issues when dealing with data of different sizes and could possibly produce unreliable results in certain applications.

Just my thoughts,

– Shaun

Add Your Comment