v1.02, Oct 23rd 2018 - Minor edits
A tiny guide by Excellence in Art, written to remind himself how all this stuff fits together in the context of demo/game development.
Many thanks to GGN/KÜA and dml/TPT for helping out with this document. Any errors should be attributed to Excellence in Art. Also thanks to Mr Black for pushing me to make this code (relatively) easy to use.
Some people know the blaster chip in the STE (and a few other 16/32 bit computers from Atari) as a “blitter”, but we really should call it a blaster. The reasons are as follows:
- Internal Atari, Inc. documents refer to it as such
- It’s my doc, I can call it what I want!!11!oneone
- It’s just much more fun this way
However, in order to piss people off, the noun used to describe a blaster operation will be “a blit”, and the verb for what the blaster does is “to blit”.
So, what does the blaster do? At the most basic level, it copies data from one place in memory to another. The big deal is not, as we may think, that it can do it a lot faster than the CPU - it can’t (both are limited by the system’s bandwidth). The big deal is that there are a number of operations it can do while copying, like shifting, masking etc. Basically a whole bunch of logic operations.
The blitter also has a neat little “halftone RAM” buffer of 32 bytes, which we can use for certain types of masking operations, but that’s out of the scope of this document.
The blaster blits from source to destination. The source is an address in RAM, as is the destination.
The blaster was clearly designed as a graphics tool. The parameters we write to it have names like “source x increment”, “destination y increment”, “x count” etc, and the blaster handles input as words arranged into lines.
To perform a blit, we write the appropriate values into the blaster’s registers (located at $ffff8a00 - $ffff8a3d). The last value we write is the “start” command, after which the blaster starts blitting away.
IMPORTANT: To start the blaster, we only need to set bit 7 of register $ffff8a3c to 1, so it might be tempting to do that with a bset instruction. Don’t. For extremely technical reasons, starting the blaster with a bset instruction (which is a read-modify-write instruction) can cause the blaster to do weird things. Start the blaster with a plain write instead, for example a move.b instruction. Note that this does not apply to a restart/resume of a previously started blit.
SHARED mode or HOG mode?
The blitter has two modes of operation. In the original Atari documentation they’re called “blit mode” and “hog mode”, but for sanity’s sake I will refer to them as “shared mode” and “hog mode” in this document.
In hog mode, the blitter takes over the bus. The CPU is still running, but since it can’t access RAM, not even interrupts will execute until the blaster is done. For this reason, hog mode is the faster of the two modes.
Shared mode solves many of the problems of hog mode by passing control between the CPU and the blaster in 256-cycle chunks. This has two major side effects:
- The blit will take twice as long to finish.
- Our code will have to be prepared for the fact that the CPU regains control of the system very soon after the start of the blit, and if we have tasks that depend on the blaster being done, we have to poll the blaster to find out if it’s done.
There is a decent cure for the speed decrease, however; if we start the blaster in shared mode, but then set the CPU up to manually restart the blaster over and over, we end up with something like 90% of the speed we get in hog mode, but things like raster interrupts are still possible (but with lower resolution, because if you’re unlucky, the blitter will delay it 256 cycles).
For more on this subject, see “Example #4: Clearing in shared mode” later in this document.
As mentioned earlier in this document, we won’t be going into details regarding the blaster’s halftone RAM. However, there is one setting we need to go over:
$ffff8a3a - Halftone operation (HOP)
This register controls how the halftone RAM and the source work together:
0 = all bits are 1
1 = all bits taken from the halftone RAM's pattern
2 = all bits taken from source
3 = source and halftone RAM are combined with an AND operation
For the remainder of this document, when I refer to “source”, I technically mean the result of these halftone operations. If you set the HOP register to 2, the two are one and the same.
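To make the four HOP settings concrete, here is a rough Python model of what happens to a single 16-bit word. This is not Atari code; the function name `apply_hop` and the sample values are made up for this sketch.

```python
def apply_hop(hop, source_word, halftone_word):
    """Model the HOP register for one 16-bit word (illustrative only)."""
    if hop == 0:
        return 0xFFFF                                # all bits are 1
    if hop == 1:
        return halftone_word & 0xFFFF                # halftone pattern only
    if hop == 2:
        return source_word & 0xFFFF                  # source only
    if hop == 3:
        return source_word & halftone_word & 0xFFFF  # source AND halftone
    raise ValueError("HOP is a 2-bit value (0-3)")


if __name__ == "__main__":
    source, pattern = 0x0FF0, 0xAAAA  # arbitrary sample words
    for hop in range(4):
        print(hop, f"${apply_hop(hop, source, pattern):04x}")
```

Whatever comes out of this step is what the rest of the document calls “source”.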
The blaster can perform a lot of different logical operations, not least the shifting (rotating) of data (which saves us from preshifting sprites in situations where we need speed). The blaster calls this skewing, and it is handled by the lowest four bits of $ffff8a3d. More about this later.
The rest of the logic operations are handled by:
$ffff8a3b - Operation (OP)
There are 16 different operations available, and I will only go over the ones I think are the most useful for the beginner (for more information, see the “References” section later in this document).
The first operation is very useful when clearing:
00 - all bits in dest set to 0
The next few are useful for masking sprites (a NOT on the source can be a life-saver if your graphics program outputs the mask as negative!):
01 - source AND dest
02 - source AND NOT dest
04 - NOT source AND dest
Then of course there’s an operation for just copying the data:
03 - source
And finally, XOR and OR, both of which can be useful for sprites (although XOR is more of a special effect):
06 - source XOR dest
07 - source OR dest
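The operations listed above can be tried out on the desktop with a small Python model. It only covers the OP values mentioned in this section, and the names `LOGIC_OPS` and `apply_op` are invented for this sketch.

```python
# OP value -> function of (source, dest), one 16-bit word at a time.
LOGIC_OPS = {
    0x0: lambda s, d: 0x0000,  # all bits in dest set to 0
    0x1: lambda s, d: s & d,   # source AND dest
    0x2: lambda s, d: s & ~d,  # source AND NOT dest
    0x3: lambda s, d: s,       # source
    0x6: lambda s, d: s ^ d,   # source XOR dest
    0x7: lambda s, d: s | d,   # source OR dest
}


def apply_op(op, source_word, dest_word):
    """Return the word that would end up in dest (illustrative model)."""
    return LOGIC_OPS[op](source_word, dest_word) & 0xFFFF


if __name__ == "__main__":
    for op in sorted(LOGIC_OPS):
        print(f"OP {op:02x}: ${apply_op(op, 0xFF00, 0xF0F0):04x}")
```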
Example #1: Copying
Time for some example code. This is a little subroutine that copies 32000 bytes from (the address pointed to by) a0 to (the address pointed to by) a1:
; This code is not ready-to-assemble, it's just the bare bones needed
; for the task at hand.

copy32000:
; In: a0 - source address
;     a1 - destination address
    move.l a0,$ffff8a24 ; source address
    move.l a1,$ffff8a32 ; dest address
    move.w #80,$ffff8a36 ; x count (number of words per line)
    move.w #200,$ffff8a38 ; y count (number of lines)
    move.w #2,$ffff8a20 ; source x increment
    move.w #2,$ffff8a22 ; source y increment
    move.w #2,$ffff8a2e ; dest x increment
    move.w #2,$ffff8a30 ; dest y increment
    clr.b $ffff8a3d ; no skewing
    move.w #-1,$ffff8a28 ; endmask 0 (first word of each line)
    move.w #-1,$ffff8a2a ; endmask 1 (words in between)
    move.w #-1,$ffff8a2c ; endmask 2 (last word of each line)
    move.b #2,$ffff8a3a ; HOP = source
    move.b #3,$ffff8a3b ; OP = source
    move.b #%11000000,$ffff8a3c ; bit 6=hog mode, bit 7=start blaster
    rts
Source and dest addresses are pretty self-explanatory, but let’s have a look at the rest.
Remember, the blaster was made for graphics, so it’s all organized around lines of words. The blaster thinks of a line as a sequence of words. Between each word, it adds the “x increment” value, and after the last word for the line, it adds the “y increment” value. Note that after the last word, it does NOT add an “x increment”.
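The stepping rule is easy to get wrong by one, so here is a minimal Python model of it. The generator name `blit_addresses` is made up, and the model ignores details such as the increments being signed 16-bit values.

```python
def blit_addresses(start, x_inc, y_inc, x_count, y_count):
    """Yield the address of every word a blit touches, in order.

    After each word the x increment is added, except after the last
    word of a line, where the y increment is added INSTEAD.
    """
    addr = start
    for _line in range(y_count):
        for word in range(x_count):
            yield addr
            is_last_word = word == x_count - 1
            addr += y_inc if is_last_word else x_inc


if __name__ == "__main__":
    # Parameters from the copy32000 example: 80 words x 200 lines.
    addresses = list(blit_addresses(0, 2, 2, 80, 200))
    print(len(addresses), addresses[:3], addresses[-1])
```

With both increments set to 2, the addresses come out as one unbroken run of 16000 words, i.e. 32000 bytes.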
The “x count” determines the number of words in a line, and since the ST’s standard 320x200 screen has 160 bytes per line, 80 words is what we want.
…and since the screen is 320x200, 200 lines it is.
Since this routine copies every single word, the increments in both source and destination are 2 bytes, in both x and y.
The $ffff8a3d register we just set to zero, mainly because we don’t want any skewing.
We don’t want any masking here, so all three endmasks are set to -1 ($ffff).
Next we set the HOP and OP registers to “source”, because we’re just plain copying from source to destination.
And finally we start the blaster by setting bit 7 in $ffff8a3c to 1. We also want to do this in hog mode, which is why we also set bit 6 to 1.
Example #2: Clearing
Now for something a little more interesting: this little snippet of code clears out a single bitplane, i.e. it zeroes out every 4th word in every line:
; This code is not ready-to-assemble, it's just the bare bones needed
; for the task at hand.

clear1bitplane:
; In: a0 - destination address
    move.l a0,$ffff8a24 ; source address
    move.l a0,$ffff8a32 ; dest address
    move.w #20,$ffff8a36 ; x count (number of words per line)
    move.w #200,$ffff8a38 ; y count (number of lines)
    move.w #0,$ffff8a20 ; source x increment
    move.w #0,$ffff8a22 ; source y increment
    move.w #8,$ffff8a2e ; dest x increment
    move.w #8,$ffff8a30 ; dest y increment
    clr.b $ffff8a3d ; no skewing
    move.w #-1,$ffff8a28 ; endmask 0 (first word of each line)
    move.w #-1,$ffff8a2a ; endmask 1 (words in between)
    move.w #-1,$ffff8a2c ; endmask 2 (last word of each line)
    move.b #2,$ffff8a3a ; HOP = source
    move.b #0,$ffff8a3b ; OP = all bits in dest set to 0
    move.b #%11000000,$ffff8a3c ; bit 6=hog mode, bit 7=start blaster
    rts
So, what’s going on here? The source and destination address are the same address? Yes, because the OP is “all bits in dest set to 0”, so it doesn’t matter what’s in the source address - the blaster will never use that actual value anyway (technically, the blaster is very likely to read the value, so it should probably be a valid address!).
For the same reasons, “source x increment” and “source y increment” are both set to 0.
The “x count” register is set to 20, because we are actually only modifying every 4th word (80/4=20), and for that reason the destination increment registers are set to 8 bytes (4 words) to reflect that.
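As a sanity check of those numbers, here is a small Python model of the clear. It treats the screen as a list of 16000 words; the function name and its defaults are assumptions made for this sketch, not anything from the blaster itself.

```python
SCREEN_WORDS = 16000  # 32000 bytes, four interleaved bitplanes


def clear_one_bitplane(screen, first_word=0, x_count=20, y_count=200,
                       inc_bytes=8):
    """Zero every 4th word, mimicking the clear1bitplane blit.

    The x and y increments are both 8 bytes here, so the blit is one
    constant stride of 4 words from start to finish.
    """
    index = first_word
    step_words = inc_bytes // 2
    for _ in range(x_count * y_count):
        screen[index] = 0
        index += step_words
    return screen


if __name__ == "__main__":
    screen = clear_one_bitplane([0xFFFF] * SCREEN_WORDS)
    print(screen[:8])
```

Passing `first_word=1`, `2` or `3` would clear one of the other three bitplanes instead.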
Sprites, skewing and endmasks
But of course, the reason you’re reading this document is to get to the real fun of blaster development: sprites!
The largest problem doing software sprites on the ST computers (because there are no hardware sprites on these machines) is speed. The standard solution is of course to trade RAM for speed, i.e. precalculating 16 different versions of each sprite, so we don’t have to rotate all the bitplanes in realtime.
And the blaster can solve this problem for us. Using the lowest four bits of register $ffff8a3d, the blaster can shift the data right up to 15 bits in a single operation.
From a technical standpoint, the blaster sees the source data as one large stream of bits, and it writes them to the target in chunks of 16. The skew value simply rotates the bitstream, pushing the bits that exit on the right back in on the left.
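Here is a simplified Python model of skewing one line of words. It is a sketch under one big assumption: whatever the blaster shifts in at the two edges is treated as zero bits here, while on real hardware those bits are leftovers you cannot rely on. The name `skew_line` is made up.

```python
def skew_line(words, skew):
    """Shift a line of 16-bit words right by `skew` bits (0-15).

    Bits pushed out of one word enter the next, so the result is one
    word longer than the input.
    """
    out = []
    previous = 0  # stand-in for whatever was in the blaster beforehand
    for current in list(words) + [0]:  # one extra word to flush the tail
        combined = (previous << 16) | current
        out.append((combined >> skew) & 0xFFFF)
        previous = current
    return out


if __name__ == "__main__":
    print([f"${w:04x}" for w in skew_line([0xFFFF], 4)])
```

Note how a one-word line turns into two words as soon as the skew is non-zero.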
Before we get to the actual code, let’s have a quick reminder of what we need to draw sprites on the ST:
- First, we have to mask off the pixels where we want to draw the sprite. For this purpose, we need a mask where the pixels that make up the sprite are represented by 0’s, and the rest are 1’s. We then AND the bitplanes in the screen with this mask, effectively erasing all bits where the sprite will be drawn.
- We then take the actual sprite data and OR that out to the screen.
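The two steps above can be sketched in Python for a single 16-pixel group of the screen. The function name and the sample values are invented for illustration.

```python
def draw_sprite_word(screen_planes, mask_word, sprite_planes):
    """Mask, then draw, one word in each bitplane.

    mask_word has 0 bits where the sprite is and 1 bits elsewhere, and
    the same mask is used for every bitplane.
    """
    result = []
    for screen_word, sprite_word in zip(screen_planes, sprite_planes):
        holed = screen_word & mask_word                 # step 1: AND
        result.append((holed | sprite_word) & 0xFFFF)   # step 2: OR
    return result


if __name__ == "__main__":
    background = [0xFFFF, 0x0000, 0xAAAA, 0x5555]
    sprite = [0x0120, 0x0FF0, 0x0000, 0x0810]
    mask = 0xF00F  # sprite covers the middle 8 pixels
    print([f"${w:04x}" for w in draw_sprite_word(background, mask, sprite)])
```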
How you organize this data is up to you, but for this document I will be using the following data structure:
Header
-------
BLASTER_SPRITE_HEADER_WORDWIDTH             word
BLASTER_SPRITE_HEADER_PIXELHEIGHT           word
BLASTER_SPRITE_HEADER_BITPLANES             word
BLASTER_SPRITE_HEADER_BITPLANE_SIZE_BYTES   word
BLASTER_SPRITE_HEADER_MASKDATA_POINTER      longword, points to [maskdata]
BLASTER_SPRITE_HEADER_SPRITEDATA_POINTER    longword, points to [spritedata, bpl 0]

Data
-----
[maskdata]
[spritedata, bpl 0]
[spritedata, bpl 1]
[spritedata, bpl 2]
[spritedata, bpl 3]
For this example, we’re assuming a fictional sprite, 38 pixels wide and 51 pixels high. Since the blaster works with words, the sprite width must be padded out to be divisible by 16 in width, so we end up with a 48x51 sprite.
This means a single bitplane of the sprite is 6x51 bytes (48 pixels = 3 words = 6 bytes) = 306 bytes. The mask is the same size as a single bitplane.
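The same arithmetic as a small Python helper, in case you want to check other sprite sizes. The function name and the dictionary keys are made up for this sketch; they just mirror the header fields above.

```python
def sprite_layout(pixel_width, pixel_height, bitplanes=4):
    """Work out the header values for a sprite padded to whole words."""
    word_width = (pixel_width + 15) // 16        # round up to 16 pixels
    bitplane_size_bytes = word_width * 2 * pixel_height
    return {
        "wordwidth": word_width,
        "padded_pixel_width": word_width * 16,
        "pixelheight": pixel_height,
        "bitplanes": bitplanes,
        "bitplane_size_bytes": bitplane_size_bytes,
        # one mask plus one block per bitplane
        "total_data_bytes": bitplane_size_bytes * (bitplanes + 1),
    }


if __name__ == "__main__":
    print(sprite_layout(38, 51))
```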
There’s only one catch in all this… When we skew the blaster operation, we’re going to get “junk bits” both to the left and the right of the sprite. And this is where the three “endmask” registers in the blaster come in!
But the explanation is going to have to wait until after the next code example:
Example #3: Sprite routine
; This code is not ready-to-assemble, it's just the bare bones needed
; for the task at hand.
; Based in part on code by Evil/DHS - Thank him, not me!

;----------------------------------------------------------
;-- rmac code for sprite header struct
    .abs
BLASTER_SPRITE_HEADER_WORDWIDTH: ds.w 1
BLASTER_SPRITE_HEADER_PIXELHEIGHT: ds.w 1
BLASTER_SPRITE_HEADER_BITPLANES: ds.w 1
BLASTER_SPRITE_HEADER_BITPLANE_SIZE_BYTES: ds.w 1
BLASTER_SPRITE_HEADER_MASKDATA_POINTER: ds.l 1
BLASTER_SPRITE_HEADER_SPRITEDATA_POINTER: ds.l 1
    .68000
BLASTER_SPRITE_HEADER_STRUCT_SIZE equ ^^abscount
;-- rmac code for sprite header struct
;----------------------------------------------------------
;-- devpac code for sprite header struct
    rsreset
BLASTER_SPRITE_HEADER_WORDWIDTH rs.w 1
BLASTER_SPRITE_HEADER_PIXELHEIGHT rs.w 1
BLASTER_SPRITE_HEADER_BITPLANES rs.w 1
BLASTER_SPRITE_HEADER_BITPLANE_SIZE_BYTES rs.w 1
BLASTER_SPRITE_HEADER_MASKDATA_POINTER rs.l 1
BLASTER_SPRITE_HEADER_SPRITEDATA_POINTER rs.l 1
BLASTER_SPRITE_HEADER_STRUCT_SIZE equ __RS
;-- devpac code for sprite header struct
;----------------------------------------------------------

BLASTER_HOP_ONES equ %00000000
BLASTER_HOP_HALFTONE equ %00000001
BLASTER_HOP_SOURCE equ %00000010
BLASTER_HOP_SOURCE_AND_HALFTONE equ %00000011

BLASTER_OP_SOURCE equ %00000011
BLASTER_OP_SOURCE_AND_TARGET equ %00000001
BLASTER_OP_SOURCE_AND_NOT_TARGET equ %00000010
BLASTER_OP_SOURCE_OR_TARGET equ %00000111
BLASTER_OP_SOURCE_XOR_TARGET equ %00000110
BLASTER_OP_SOURCE_NOT_TARGET equ %00000100
BLASTER_OP_ZEROES equ %00000000
BLASTER_OP_ONES equ %00001111

BLASTER_COMMAND_START_HOG_MODE equ %11000000
BLASTER_COMMAND_START_SHARED_MODE equ %10000000

SCREEN_LINE_WIDTH_BYTES equ 160
SCREEN_BITPLANES equ 4

;----------------------------------------------------------
    lea sprite_header,a0
    move.l screenaddress,a1
    move.w #12,d0 ; X position
    move.w #55,d1 ; Y position
    bsr sprite_draw

;----------------------------------------------------------
sprite_draw:
; In: a0.l - pointer to sprite header struct
;     a1.l - pointer to screen address
;     d0.w - X
;     d1.w - Y
    move.l a0,a6
    move.l BLASTER_SPRITE_HEADER_MASKDATA_POINTER(a6),a0
    move.l BLASTER_SPRITE_HEADER_SPRITEDATA_POINTER(a6),a2
    move.w BLASTER_SPRITE_HEADER_BITPLANE_SIZE_BYTES(a6),d5
    move.w BLASTER_SPRITE_HEADER_BITPLANES(a6),d2
    move.w BLASTER_SPRITE_HEADER_WORDWIDTH(a6),d3
    move.w BLASTER_SPRITE_HEADER_PIXELHEIGHT(a6),d4

    ; x
    move.w d0,d7
    and.w #$fff0,d7
    lsr.w #1,d7 ; d7 is now an offset to the correct word in the line
    add.w d7,a1
    and.w #$f,d0 ; d0 is now the skew value

    ; y
    ext.l d1
    add.l d1,d1
    add.l d1,d1
    lea ytable,a5
    add.l (a5,d1),a1 ; a1 now points to the correct screen position

    sub.w #1,d2
    ext.l d2 ; adjust d2 for dbra loop of bitplanes
    move.l d2,d6

    move.w #SCREEN_LINE_WIDTH_BYTES,d1
    move.w d3,d7
    lsl.w #3,d7
    sub.w d7,d1
    add.w #8,d1 ; dest y increment done

    tst.b d0 ; if skew value is 0, that's one codepath...
    bne .skewing ; if not, that's another.

    ; no skewing
    move.w #-1,d2 ; left endmask
    move.w d2,d7 ; right endmask
    move.w #2,a4 ; source y inc
    bra .call_blaster

.skewing:
    move.w d0,d7
    ext.l d7
    add.l d7,d7
    lea sprite_leftmasks,a5
    move.w (a5,d7),d2
    lea sprite_rightmasks,a5
    move.w (a5,d7),d7
    add.w #1,d3 ; one more xcount
    sub.w #8,d1 ; adjust dest y inc accordingly
    move.w #0,a4 ; source y inc

.call_blaster:
    ; do the blits
    move.b d0,$ffff8a3d ; set skew
    move.l d6,-(sp)

    ; mask using blaster
    move.w #SCREEN_BITPLANES-1,d6
.mask_blit_loop:
    move.w #2,$ffff8a20 ;source x inc
    move.w a4,$ffff8a22 ;source y inc
    move.l a0,$ffff8a24 ;source address
    move.w d2,$ffff8a28 ;endmask 0
    move.w #-1,$ffff8a2a ;endmask 1
    move.w d7,$ffff8a2c ;endmask 2
    move.w #8,$ffff8a2e ;dest x inc
    move.w d1,$ffff8a30 ;dest y inc
    move.l a1,$ffff8a32 ;destination address
    move.w d3,$ffff8a36 ;x count (n words per line to copy)
    move.w d4,$ffff8a38 ;y count (n lines to copy)
    move.b #BLASTER_HOP_SOURCE,$ffff8a3a ; halftone operation
    move.b #BLASTER_OP_SOURCE_AND_TARGET,$ffff8a3b ; operation
    move.b #BLASTER_COMMAND_START_HOG_MODE,$ffff8a3c ; start blaster
    add.w #2,a1
    dbra d6,.mask_blit_loop
    sub.w #8,a1

    ; draw sprite using blaster
    move.l (sp)+,d6
.sprite_blit_loop:
    move.w #2,$ffff8a20 ;source x inc
    move.w a4,$ffff8a22 ;source y inc
    move.l a2,$ffff8a24 ;source address
    move.w d2,$ffff8a28 ;endmask 0
    move.w #-1,$ffff8a2a ;endmask 1
    move.w d7,$ffff8a2c ;endmask 2
    move.w #8,$ffff8a2e ;dest x inc
    move.w d1,$ffff8a30 ;dest y inc
    move.l a1,$ffff8a32 ;destination address
    move.w d3,$ffff8a36 ;x count (n words per line to copy)
    move.w d4,$ffff8a38 ;y count (n lines to copy)
    move.b #BLASTER_HOP_SOURCE,$ffff8a3a ; halftone operation
    move.b #BLASTER_OP_SOURCE_OR_TARGET,$ffff8a3b ; operation
    move.b #BLASTER_COMMAND_START_HOG_MODE,$ffff8a3c ; start blaster
    add.w #2,a1
    add.w d5,a2
    dbra d6,.sprite_blit_loop
    rts

;----------------------------------------------------------
ytable:
val set 0
    rept 200
    dc.l val
val set val+SCREEN_LINE_WIDTH_BYTES
    endr

sprite_leftmasks:
    dc.w %1111111111111111
    dc.w %0111111111111111
    dc.w %0011111111111111
    dc.w %0001111111111111
    dc.w %0000111111111111
    dc.w %0000011111111111
    dc.w %0000001111111111
    dc.w %0000000111111111
    dc.w %0000000011111111
    dc.w %0000000001111111
    dc.w %0000000000111111
    dc.w %0000000000011111
    dc.w %0000000000001111
    dc.w %0000000000000111
    dc.w %0000000000000011
    dc.w %0000000000000001

sprite_rightmasks:
    dc.w %0000000000000000
    dc.w %1000000000000000
    dc.w %1100000000000000
    dc.w %1110000000000000
    dc.w %1111000000000000
    dc.w %1111100000000000
    dc.w %1111110000000000
    dc.w %1111111000000000
    dc.w %1111111100000000
    dc.w %1111111110000000
    dc.w %1111111111000000
    dc.w %1111111111100000
    dc.w %1111111111110000
    dc.w %1111111111111000
    dc.w %1111111111111100
    dc.w %1111111111111110
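The setup arithmetic at the top of sprite_draw can be double-checked with a few lines of Python. This is only a model of the register values the routine ends up with; the function name and the dictionary keys are invented for the sketch.

```python
SCREEN_LINE_WIDTH_BYTES = 160


def sprite_blit_setup(x, y, word_width):
    """Mirror the x/y/skew calculations in sprite_draw."""
    skew = x & 0xF                       # pixel offset within the word
    word_offset = (x & 0xFFF0) >> 1      # 8 bytes per 16 pixels
    line_offset = y * SCREEN_LINE_WIDTH_BYTES
    x_count = word_width + (1 if skew else 0)  # skewing spills one word
    # The y increment replaces the last x increment of each line.
    dest_y_inc = SCREEN_LINE_WIDTH_BYTES - x_count * 8 + 8
    source_y_inc = 0 if skew else 2
    return {
        "screen_offset": line_offset + word_offset,
        "skew": skew,
        "x_count": x_count,
        "dest_y_inc": dest_y_inc,
        "source_y_inc": source_y_inc,
    }


if __name__ == "__main__":
    # The call from the example: X=12, Y=55, sprite 3 words wide.
    print(sprite_blit_setup(12, 55, 3))
```

For the skewed case, three x increments of 8 bytes plus a y increment of 136 add up to exactly one 160-byte screen line.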
So, the endmasks. Which endmasks are used depends on the width of the blit:
Blit width (in words)   Endmasks
---------------------   ---------------
1                       0
2                       0, 2
3                       0, 1, 2
4                       0, 1, 1, 2
5                       0, 1, 1, 1, 2
...                     ...
Another way of putting it would be that endmask 0 masks off the first word of the line, endmask 2 masks off the last word of the line, and endmask 1 masks off all the words between them.
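The table can be expressed as a short Python function, together with a model of what an endmask does to a single word: bits set to 1 in the mask take the new value, bits set to 0 leave the destination alone. Both function names are made up for this sketch.

```python
def endmask_sequence(blit_width_words):
    """Return which endmask (0, 1 or 2) applies to each word of a line."""
    if blit_width_words < 1:
        raise ValueError("a line has at least one word")
    if blit_width_words == 1:
        return [0]
    return [0] + [1] * (blit_width_words - 2) + [2]


def apply_endmask(new_word, old_dest_word, endmask):
    """Mix the new word into the destination under control of a mask."""
    return ((new_word & endmask) | (old_dest_word & ~endmask)) & 0xFFFF


if __name__ == "__main__":
    for width in range(1, 6):
        print(width, endmask_sequence(width))
```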
Using the two tables “sprite_leftmasks” and “sprite_rightmasks”, the above code sets endmask 0 and endmask 2 to mask out the “junk bits” the blaster will be rotating in when we set a non-zero skew value.
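If you would rather generate the two tables than type them in, they follow a simple pattern. The Python below is just a sketch that reproduces the values listed in the tables above (the upper-case names are mine) and prints them as dc.w lines.

```python
# Left mask: `skew` zero bits on the left, ones for the rest of the word.
LEFTMASKS = [0xFFFF >> skew for skew in range(16)]

# Right mask: the exact complement, i.e. `skew` one bits on the left.
RIGHTMASKS = [~mask & 0xFFFF for mask in LEFTMASKS]


def as_dc_w(values):
    """Format a table as assembler source lines."""
    return "\n".join(f"    dc.w %{value:016b}" for value in values)


if __name__ == "__main__":
    print("sprite_leftmasks:")
    print(as_dc_w(LEFTMASKS))
    print("sprite_rightmasks:")
    print(as_dc_w(RIGHTMASKS))
```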
Blitting with interrupts
In hog mode, the blaster will take over all bus bandwidth. Shared mode, while allowing interrupts to trigger by passing control back and forth between the CPU and the blaster, is significantly slower.
However, it turns out it’s entirely possible to send the blaster back to work faster. By putting the CPU in a loop where it restarts the blaster, interrupts can still be served (albeit with a delay of up to 256 cycles).
Example #4: Clearing in shared mode
This is the code from example #2, but running in shared mode with manual restarting of the blaster.
; This code is not ready-to-assemble, it's just the bare bones needed
; for the task at hand.

clear1bitplane_sharedmode:
; In: a0 - destination address
    move.l a0,$ffff8a24 ; source address
    move.l a0,$ffff8a32 ; dest address
    move.w #20,$ffff8a36 ; x count (number of words per line)
    move.w #200,$ffff8a38 ; y count (number of lines)
    move.w #0,$ffff8a20 ; source x increment
    move.w #0,$ffff8a22 ; source y increment
    move.w #8,$ffff8a2e ; dest x increment
    move.w #8,$ffff8a30 ; dest y increment
    clr.b $ffff8a3d ; no skewing
    move.w #-1,$ffff8a28 ; endmask 0 (first word of each line)
    move.w #-1,$ffff8a2a ; endmask 1 (words in between)
    move.w #-1,$ffff8a2c ; endmask 2 (last word of each line)
    move.b #2,$ffff8a3a ; HOP = source
    move.b #0,$ffff8a3b ; OP = all bits in dest set to 0
    move.b #%10000000,$ffff8a3c ; bit 6 clear=shared mode, bit 7=start blaster

    move.l #$ffff8a3c,a1
    moveq #7,d7
.retrigger:
    bset.b d7,(a1) ; restart blaster, sets CCR flags
    nop ; nop to allow interrupt to trigger
    bne.s .retrigger ; if the blaster wasn't done, loop back
    rts