For those of you working on GCC projects with an eye on code generation, you’ve probably wondered why GCC doesn’t always emit DBRA instructions for C loops.
In fact, it rarely emits them - the stars need to be aligned just right. And sometimes you’ll be faced with something so ugly and inexplicable for a loop that it might put you off using the compiler at all…
The first thing to realise here is that it’s not always ‘crap compiler’ but often ‘garbage in, garbage out’. And the C language is quite good at encouraging ‘garbage in’ for fuzzy areas like loop constraints…
Here I’m going to provide some tips on setting up those ‘stars’ properly so DBRA can work every time - for sensible loop code at least.
There is actually more than one reason for messed up DBRAs.
- the type size for the loop counter has to be 16 bits. ‘int’ is 32 bits by default, and will usually prevent the use of DBRA.
- the signedness of the termination condition expression must MATCH the loop counter type. no signed vs unsigned tests allowed.
- the termination condition must exactly match the behaviour of a DBRA, which stops only when the 16-bit counter reaches exactly -1. that means ‘c >= 0’ is not allowed: it terminates on any negative value, whereas DBRA keeps looping for every value except -1.
- the ‘for’ statement assumes 0-or-more iterations of the block without additional constraints. DBRA assumes 1-or-more iterations.
- referring to the loop counter inside the loop: the compiler might not be able to map those uses into ‘dbf space’ (DBF is the same instruction as DBRA).
- counting in the wrong direction (esp. if referring to its value).
- counting in the wrong direction AND using it to access memory it can’t see.
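To make the list a bit more concrete, here is a rough sketch of two loop shapes that break some of these rules. The function names and the summing workload are made up for illustration. Whether a given GCC version really misses the DBRA for each one depends on version and flags, so treat this as something to feed into the compiler and inspect, not as a guaranteed result.

```c
typedef signed short s16;

/* Hypothetical example: 32-bit 'int' counter, counting upwards, and a
 * 'for' that has to allow for zero iterations. */
long sum_int_for(const s16 *src, int count)
{
    long total = 0;
    for (int i = 0; i < count; i++)
        total += src[i];
    return total;
}

/* Hypothetical example: 16-bit counter counting down, but the 'c >= 0'
 * test ends the loop on ANY negative value, while DBRA only stops when
 * the counter is exactly -1. */
long sum_ge_zero(const s16 *src, s16 count)
{
    long total = 0;
    for (s16 c = count - 1; c >= 0; c--)
        total += *src++;
    return total;
}
```

Both functions are perfectly valid C and return the same sums. The point is only that neither one describes the loop the way a DBRA behaves.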
So why do you sometimes see DBRA anyway, even if breaking these rules? Because if compile-time constants are involved in the loop count and initial loop conditions, the compiler can solve and eliminate the barriers listed above. But usually, in real code, something gets in the way of this and you end up with some subq-cmp/tst-bpl junk instead.
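As a small illustration of the compile-time constant case, consider a sketch like the one below. ROW_WORDS and clear_row are hypothetical names. With a literal trip count the compiler can prove the loop runs at least once and that the count fits in 16 bits, so it is free to rewrite the loop however it likes, which may or may not end up as a DBRA (it may also just unroll it).

```c
typedef signed short s16;

enum { ROW_WORDS = 20 };   /* hypothetical compile-time constant */

/* The counter is a plain 'int' counting upwards in a 'for', which breaks
 * several of the rules above, but every bound is known at compile time. */
void clear_row(s16 *dst)
{
    for (int i = 0; i < ROW_WORDS; i++)
        *dst++ = 0;
}
```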
SOLUTIONS
There are at least two reliable solutions for GCC-generated DBRA loops without register starvation. One for signed, one for unsigned loop counters. Generally, you should prefer the signed version, for reasons other than the DBRA itself (e.g. if the counter is somehow needed to index something inside the loop, signed indexes usually match addressing modes without conversion).
typedef signed short s16;
typedef unsigned short u16;

// signed
{
    s16 loop = count-1;
    do
    {
        ...do_something;
    } while (--loop != -1);
}

// unsigned
{
    u16 loop = count-1;
    do
    {
        ...do_something;
    } while (--loop != 0xffff);
}
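For something you can paste straight into a compiler, here is a minimal sketch of both patterns wrapped in complete functions. The names sum_words and fill_words and their workloads are invented for the example. Note the precondition: a do/while always runs the body once, so the caller has to guarantee count >= 1 (the assert only documents that and disappears with NDEBUG).

```c
#include <assert.h>

typedef signed short s16;
typedef unsigned short u16;

/* Signed pattern: the loop ends when the 16-bit counter hits exactly -1. */
long sum_words(const s16 *src, s16 count)
{
    long total = 0;
    s16 loop = count - 1;

    assert(count >= 1);        /* do/while cannot express zero iterations */
    do
    {
        total += *src++;
    } while (--loop != -1);
    return total;
}

/* Unsigned pattern: same idea, the counter wraps from 0 to 0xffff. */
void fill_words(s16 *dst, u16 count, s16 value)
{
    u16 loop = count - 1;

    assert(count >= 1);
    do
    {
        *dst++ = value;
    } while (--loop != 0xffff);
}
```

If zero is a legal count in your code, test for it once outside the loop and skip the whole block, so the loop itself still only ever sees 1-or-more iterations.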
Final notes:
Complex loop code burning lots of registers might not issue a DBRA anyway if the loop counter is spilled to the stack - in which case you’re better off with a different loop construct. But if it can issue a loop register+DBRA, then the above code will make that happen.
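What a ‘different loop construct’ looks like is up to you. One common alternative, shown here purely as an assumption and not as a rule, is to drop the counter altogether and compare the moving pointer against a precomputed end address. The name sum_words_ptr is made up, and the same count >= 1 precondition applies.

```c
typedef signed short s16;

/* Hypothetical example: no loop counter at all, the loop ends when the
 * source pointer reaches the end address. Requires count >= 1. */
long sum_words_ptr(const s16 *src, s16 count)
{
    const s16 *end = src + count;
    long total = 0;

    do
    {
        total += *src++;
    } while (src != end);
    return total;
}
```

This swaps the counter for an end pointer, so it is not free either. Check the disassembly to see which form costs less in your particular loop.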
It is still possible to arrange solutions using ‘for’, but the compiler needs to be ‘hinted’ that 1-or-more iterations will execute. YMMV. It’s clearer to implement this construct with ‘do/while’.
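One way to give that hint, sketched here as an assumption and not as a guaranteed recipe, is the GCC builtin __builtin_unreachable(): it promises the compiler that the zero-iteration case never happens. The function name sum_words_for is made up, and whether a DBRA comes out the other end still depends on the GCC version and flags.

```c
typedef signed short s16;

/* Hypothetical example: a 'for' loop with an explicit 1-or-more hint.
 * Calling this with count < 1 is undefined behaviour. */
long sum_words_for(const s16 *src, s16 count)
{
    long total = 0;

    if (count < 1)
        __builtin_unreachable();   /* GCC/Clang builtin: count >= 1 promised */

    for (s16 loop = count - 1; loop != -1; --loop)
        total += *src++;
    return total;
}
```

The hint is a promise, not a check. If a zero count can ever reach this function, use a real early return instead.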
This is not a GCC-specific thing. It’s a problem with the C language and expressing constraints clearly enough for translation to specific assembly. The solution offered is however somewhat m68k-specific. It may work on other types of machine but YMMV - each requires a ‘fit’ for its own dec/loop instructions.
There are some deeper issues with GCC and loop codegen which can occur if DBRA is not emitted - resulting in some very ugly things in the disasm. This has to do with certain optimiser passes (e.g. tree-vrp) which are increasingly targeted at non-m68k machines, and can adversely affect m68k codegen. If you encourage DBRA to be emitted you won’t run into these effects so much - it’s another reason to take care…
If you want to try cooking other variations of the stuff above, have a play with the m68k compiler explorer at http://brownbot.mooo.com !