Basic Blaster Usage · Beyond Brown

Blast processing got nothing on this!

v1.01

v1.02, Oct 23rd 2018 - Minor edits

A tiny guide by Excellence in Art, written to remind himself how all this stuff fits together in the context of demo/game development.

Many thanks to GGN/KÜA and dml/TPT for helping out with this document. Any errors should be attributed to Excellence in Art. Also thanks to Mr Black for pushing me to make this code (relatively) easy to use.

Intro

Some people know the blaster chip in the STE (and a few other 16/32 bit computers from Atari) as a “blitter”, but we really should call it a blaster. The reasons are as follows:

Internal Atari, Inc. documents refer to it as such
It’s my doc, I can call it what I want!!11!oneone
It’s just much more fun this way

However, in order to piss people off, the noun used to describe a blaster operation will be “a blit”, and the verb for what the blaster does is “to blit”.

So, what does the blaster do? At the most basic level, it copies data from one place in memory to another. The big deal is not, as we may think, that it can do it a lot faster than the CPU - it can’t (both are limited by the system’s bandwidth). The big deal is that there are a number of operations it can do while copying, like shifting, masking etc. Basically a whole bunch of logic operations.

The blaster also has a neat little “halftone RAM” buffer of 32 bytes, which we can use for certain types of masking operations, but that’s outside the scope of this document.

Basic concepts

The blaster blits from source to destination. The source is an address in RAM, as is the destination.

The blaster was clearly designed as a graphics tool. The parameters we write to it have names like “source x increment”, “destination y increment”, “x count” etc, and the blaster handles input as words arranged into lines.

To perform a blit, we write the appropriate values into the blaster’s registers (located at $ffff8a00 - $ffff8a3d). The last value we write is the “start” command, after which the blaster starts blitting away.

IMPORTANT: To start the blaster, we only need to set bit 7 of register $ffff8a3c to 1, so it might be tempting to do that with a bset instruction. Don’t. For extremely technical reasons, starting the blaster with a Read-Modify-Write instruction such as bset can cause the blaster to do weird things. Start the blaster with a plain write instead, for example a move.b instruction. Note that this does not apply to a restart/resume of a previously started blit.
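As a minimal sketch of that advice (the mode bit is covered in the next section; hog mode is picked here only as an assumption for the example):

; This code is not ready-to-assemble, it's just the bare bones needed
; for the task at hand.

; Tempting for the initial start, but don't:
;	bset.b #7,$ffff8a3c
; Write the whole control byte in one go instead:
	move.b #%11000000,$ffff8a3c  ; bit 6=hog mode, bit 7=start blaster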

SHARED mode or HOG mode?

The blaster has two modes of operation. In the original Atari documentation they’re called “blit mode” and “hog mode”, but for sanity’s sake I will refer to them as “shared mode” and “hog mode” in this document.

In hog mode, the blaster takes over the bus. The CPU is still running, but since it can’t access RAM, not even interrupts will execute until the blaster is done. For this reason, hog mode is the faster of the two modes.

Shared mode solves many of the problems of hog mode by passing control between the CPU and the blaster in 256-cycle chunks. This has two major side effects:

The blit will take twice as long to finish.
Our code will have to be prepared for the fact that the CPU regains control of the system very soon after the start of the blit, and if we have tasks that depend on the blaster being done, we have to poll the blaster to find out if it’s done.
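A minimal sketch of such a poll, assuming the blit was started in shared mode and that bit 7 of $ffff8a3c reads back as 1 for as long as the blit is unfinished (the routine name is made up for this example):

; This code is not ready-to-assemble, it's just the bare bones needed
; for the task at hand.

wait_for_blaster:
.wait:
	btst.b #7,$ffff8a3c  ; test bit 7, the Z flag is set if it is 0
	bne.s .wait          ; bit still 1 = blit not finished, keep polling
	rts

A plain poll like this only waits for the blaster; it does nothing to speed the blit up.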

There is a decent cure for the speed decrease, however: if we start the blaster in shared mode, but then have the CPU manually restart the blaster over and over, we end up with something like 90% of the speed we get in hog mode. Things like raster interrupts are still possible, but with lower timing resolution, because if you’re unlucky, the blaster will delay an interrupt by up to 256 cycles.

For more on this subject, see “Example #4: Clearing in shared mode” later in this document.

Halftone operations

As mentioned earlier in this document, we won’t be going into details regarding the blaster’s halftone RAM. However, there is one setting we need to go over:

$ffff8a3a - Halftone operation (HOP)

This register controls how the halftone RAM and the source work together:

0 = all bits are 1
1 = all bits taken from the halftone RAM's pattern
2 = all bits taken from source
3 = source and halftone RAM are combined with an AND operation

For the remainder of this document, when I refer to “source”, I technically mean the result of these halftone operations. If you set the HOP register to 2, that result is simply the unmodified source data.

Operations

The blaster can perform a lot of different logical operations, not least the shifting (rotating) of data (which saves us from preshifting sprites in situations where we need speed). The blaster calls this skewing, and it is handled by the lowest four bits of $ffff8a3d. More about this later.

The rest of the logic operations are handled by:

$ffff8a3b - Operation (OP)

There are 16 different operations available, and I will only go over the ones I think are the most useful for the beginner (for more information, see the “References” section later in this document).

The first operation is very useful when clearing:

00 - all bits in dest set to 0

The next three are useful for masking sprites. Note that 02 inverts the destination, not the source; the NOT source variant (04) can be a life-saver if your graphics program outputs the mask as negative!

01 - source AND dest
02 - source AND NOT dest
04 - NOT source AND dest

Then of course there’s an operation for just copying the data:

03 - source

And finally, XOR and OR, both of which can be useful for sprites (although XOR is more of a special effect):

06 - source XOR dest
07 - source OR dest
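
To make the operations listed above concrete, here is what each one writes for a single made-up 4-bit slice of source and destination. The values are picked purely for illustration, and “source” means the result of the HOP stage.

source = %1100, dest (before) = %1010

OP   Operation             dest (after)
---  --------------------  ------------
00   all bits set to 0     %0000
01   source AND dest       %1000
02   source AND NOT dest   %0100
03   source                %1100
06   source XOR dest       %0110
07   source OR dest        %1110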

Example #1: Copying

Time for some example code. This is a little subroutine that copies 32000 bytes from (the address pointed to by) a0 to (the address pointed to by) a1:

; This code is not ready-to-assemble, it's just the bare bones needed
; for the task at hand.

copy32000:
; In: a0 - source address
;     a1 - destination address
	move.l a0,$ffff8a24    ; source address
	move.l a1,$ffff8a32    ; dest address
                         
	move.w #80,$ffff8a36   ; x count (number of words per line)
	move.w #200,$ffff8a38  ; y count (number of lines)
                         
	move.w #2,$ffff8a20    ; source x increment
	move.w #2,$ffff8a22    ; source y increment
	move.w #2,$ffff8a2e    ; dest x increment
	move.w #2,$ffff8a30    ; dest y increment
                         
  clr.b $ffff8a3d        ; no skewing
                         
	move.w #-1,$ffff8a28   ; endmask 1
	move.w #-1,$ffff8a2a   ; endmask 2
	move.w #-1,$ffff8a2c   ; endmask 3
                         
	move.b #2,$ffff8a3a    ; HOP = source
	move.b #3,$ffff8a3b    ; OP = source

	move.b #%11000000,$ffff8a3c  ; bit 6=hog mode, bit 7=start blaster
	rts

Source and dest addresses are pretty self-explanatory, but let’s have a look at the rest.

Remember, the blaster was made for graphics, so it’s all organized around lines of words. The blaster thinks of a line as a sequence of words. Between each word, it adds the “x increment” value, and after the last word for the line, it adds the “y increment” value. Note that after the last word, it does NOT add an “x increment”.

The “x count” determines the number of words in a line, and since the ST’s standard 320x200 screen has 160 bytes per line, 80 words is what we want.

…and since the screen is 320x200, 200 lines it is.

Since this routine copies every single word, the increments in both source and destination are 2 bytes, in both x and y.

The $ffff8a3d register we just set to zero, mainly because we don’t want any skewing.

We don’t want any masking here, so all three endmasks are set to -1 ($ffff).

Next we set the HOP and OP registers to “source”, because we’re just plain copying from source to destination.

And finally we start the blaster by setting bit 7 in $ffff8a3c to 1. We also want to do this in hog mode, which is why we also set bit 6 to 1.
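
The same registers can copy a block that is narrower than the screen; only the counts and the y increments change. The following is a sketch under a few assumptions: both buffers use the 160-bytes-per-line screen layout, the block is 64x32 pixels in all 4 bitplanes and starts on a 16-pixel boundary, and the routine name is made up for this example.

; This code is not ready-to-assemble, it's just the bare bones needed
; for the task at hand.

copy64x32:
; In: a0 - source address (top left word of the block)
;     a1 - destination address (top left word of the block)
	move.l a0,$ffff8a24    ; source address
	move.l a1,$ffff8a32    ; dest address

	move.w #16,$ffff8a36   ; x count: 64 pixels = 4 words x 4 bitplanes
	move.w #32,$ffff8a38   ; y count (number of lines)

	move.w #2,$ffff8a20    ; source x increment
	move.w #130,$ffff8a22  ; source y increment: 160-(16*2)+2
	move.w #2,$ffff8a2e    ; dest x increment
	move.w #130,$ffff8a30  ; dest y increment: 160-(16*2)+2

	clr.b $ffff8a3d        ; no skewing

	move.w #-1,$ffff8a28   ; endmask 1
	move.w #-1,$ffff8a2a   ; endmask 2
	move.w #-1,$ffff8a2c   ; endmask 3

	move.b #2,$ffff8a3a    ; HOP = source
	move.b #3,$ffff8a3b    ; OP = source

	move.b #%11000000,$ffff8a3c  ; bit 6=hog mode, bit 7=start blaster
	rts

The y increment is 130 because no x increment is added after the last word of a line: that word starts 30 bytes into the line, and 30+130=160 lands on the first word of the next line.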

Example #2: Clearing

Now for something a little more interesting: this little snippet of code clears out a single bitplane, i e it zeroes out every 4th word in every line:

; This code is not ready-to-assemble, it's just the bare bones needed
; for the task at hand.

clear1bitplane:
; In: a0 - destination address
	move.l a0,$ffff8a24    ; source address
	move.l a0,$ffff8a32    ; dest address
                         
	move.w #20,$ffff8a36   ; x count (number of words per line)
	move.w #200,$ffff8a38  ; y count (number of lines)
                         
	move.w #0,$ffff8a20    ; source x increment
	move.w #0,$ffff8a22    ; source y increment
	move.w #8,$ffff8a2e    ; dest x increment
	move.w #8,$ffff8a30    ; dest y increment
                         
  clr.b $ffff8a3d        ; no skewing
                         
	move.w #-1,$ffff8a28   ; endmask 1
	move.w #-1,$ffff8a2a   ; endmask 2
	move.w #-1,$ffff8a2c   ; endmask 3
                         
	move.b #2,$ffff8a3a    ; HOP = source
	move.b #0,$ffff8a3b    ; OP = all bits in dest set to 0

	move.b #%11000000,$ffff8a3c  ; bit 6=hog mode, bit 7=start blaster
	rts

So, what’s going on here? The source and destination address are the same address? Yes, because the OP is “all bits in dest set to 0”, so it doesn’t matter what’s in the source address - the blaster will never use that actual value anyway (technically, the blaster is very likely to read the value, so it should probably be a valid address!).

For the same reasons, “source x increment” and “source y increment” are both set to 0.

The “x count” register is set to 20, because we are actually only modifying every 4th word (80/4=20), and for that reason the destination increment registers are set to 8 bytes instead of 2.

Sprites, skewing and endmasks

But of course, the reason you’re reading this document is to get to the real fun of blaster development: sprites!

The largest problem doing software sprites on the ST computers (because there are no hardware sprites on these machines) is speed. The standard solution is of course to trade RAM for speed, i e precalcing 16 different versions of each sprite, so we don’t have to rotate all the bitplanes in realtime.

And the blaster can solve this problem for us. Using the lowest four bits of register $ffff8a3d, the blaster can shift the data right up to 15 bits in a single operation.

From a technical standpoint, the blaster sees the source data as one large stream of bits, and it writes them to the target in chunks of 16. The skew value simply rotates the bitstream, pushing the bits that exit on the right back in on the left.

Before we get to the actual code, let’s have a quick reminder of what we need to draw sprites on the ST:

First, we have to mask off the pixels where we want to draw the sprite. For this purpose, we need a mask where the pixels that make up the sprite are represented by 0’s, and the rest are 1’s. We then AND the bitplanes in the screen with this mask, effectively erasing all bits where the sprite will be drawn.
We then take the actual sprite data and OR that out to the screen.

How you organize this data is up to you, but for this document I will be using the following data structure:

Header
-------
BLASTER_SPRITE_HEADER_WORDWIDTH             word
BLASTER_SPRITE_HEADER_PIXELHEIGHT           word
BLASTER_SPRITE_HEADER_BITPLANES             word
BLASTER_SPRITE_HEADER_BITPLANE_SIZE_BYTES   word
BLASTER_SPRITE_HEADER_MASKDATA_POINTER      longword, points to [maskdata]
BLASTER_SPRITE_HEADER_SPRITEDATA_POINTER    longword, points to [spritedata, bpl 0]

Data
-----
[maskdata]
[spritedata, bpl 0]
[spritedata, bpl 1]
[spritedata, bpl 2]
[spritedata, bpl 3]

For this example, we’re assuming a fictional sprite, 38 pixels wide and 51 pixels high. Since the blaster works with words, the sprite width must be padded out to be divisible by 16 in width, so we end up with a 48x51 sprite.

This means a single bitplane of the sprite is 6x51 bytes (48 pixels = 3 words = 6 bytes) = 306 bytes. The mask is the same size as a single bitplane.
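
For the fictional 48x51 sprite, the structure above could be filled in as in the sketch below. The labels sprite_maskdata and sprite_data are made up for this example, 4 bitplanes is an assumption, and the ds.b lines are only placeholders for where real graphics data would go.

; This code is not ready-to-assemble, it's just the bare bones needed
; for the task at hand.

sprite_header:
	dc.w 3                ; word width (48 pixels = 3 words)
	dc.w 51               ; pixel height
	dc.w 4                ; bitplanes
	dc.w 306              ; bitplane size in bytes (6 x 51)
	dc.l sprite_maskdata  ; pointer to [maskdata]
	dc.l sprite_data      ; pointer to [spritedata, bpl 0]

sprite_maskdata:
	ds.b 306              ; placeholder: one bitplane's worth of mask
sprite_data:
	ds.b 306*4            ; placeholder: bpl 0, bpl 1, bpl 2, bpl 3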

There’s only one catch in all this… When we skew the blaster operation, we’re going to get “junk bits” both to the left and the right of the sprite. And this is where the three “endmask” registers in the blaster come in!

But the explanation is going to have to wait until after the next code example:

Example #3: Sprite routine

; This code is not ready-to-assemble, it's just the bare bones needed
; for the task at hand.
; Based in part on code by Evil/DHS - Thank him, not me!

;----------------------------------------------------------
;-- rmac code for sprite header struct

  .abs
BLASTER_SPRITE_HEADER_WORDWIDTH:           ds.w 1
BLASTER_SPRITE_HEADER_PIXELHEIGHT:         ds.w 1
BLASTER_SPRITE_HEADER_BITPLANES:           ds.w 1
BLASTER_SPRITE_HEADER_BITPLANE_SIZE_BYTES: ds.w 1
BLASTER_SPRITE_HEADER_MASKDATA_POINTER:    ds.l 1
BLASTER_SPRITE_HEADER_SPRITEDATA_POINTER:  ds.l 1
  .68000
BLASTER_SPRITE_HEADER_STRUCT_SIZE equ ^^abscount

;-- rmac code for sprite header struct
;----------------------------------------------------------
;-- devpac code for sprite header struct

  rsreset
BLASTER_SPRITE_HEADER_WORDWIDTH            rs.w 1
BLASTER_SPRITE_HEADER_PIXELHEIGHT          rs.w 1
BLASTER_SPRITE_HEADER_BITPLANES            rs.w 1
BLASTER_SPRITE_HEADER_BITPLANE_SIZE_BYTES  rs.w 1
BLASTER_SPRITE_HEADER_MASKDATA_POINTER     rs.l 1
BLASTER_SPRITE_HEADER_SPRITEDATA_POINTER   rs.l 1
BLASTER_SPRITE_HEADER_STRUCT_SIZE equ __RS

;-- devpac code for sprite header struct
;----------------------------------------------------------

BLASTER_HOP_ONES                    equ %00000000
BLASTER_HOP_HALFTONE                equ %00000001
BLASTER_HOP_SOURCE                  equ %00000010
BLASTER_HOP_SOURCE_AND_HALFTONE     equ %00000011

BLASTER_OP_SOURCE                   equ %00000011
BLASTER_OP_SOURCE_AND_TARGET        equ %00000001
BLASTER_OP_SOURCE_AND_NOT_TARGET    equ %00000010
BLASTER_OP_SOURCE_OR_TARGET         equ %00000111
BLASTER_OP_SOURCE_XOR_TARGET        equ %00000110
BLASTER_OP_SOURCE_NOT_TARGET        equ %00000100  ; NOT source AND target
BLASTER_OP_ZEROES                   equ %00000000
BLASTER_OP_ONES                     equ %00001111

BLASTER_COMMAND_START_HOG_MODE      equ %11000000
BLASTER_COMMAND_START_SHARED_MODE   equ %10000000

SCREEN_LINE_WIDTH_BYTES equ 160
SCREEN_BITPLANES equ 4

;----------------------------------------------------------    

lea sprite_header,a0
move.l screenaddress,a1
move.w #12,d0  ; X position
move.w #55,d1  ; Y position
bsr sprite_draw

;----------------------------------------------------------

sprite_draw:
; In: a0.l - pointer to sprite header struct
;     a1.l - pointer to screen address
;     d0.w - X
;     d1.w - Y
  move.l a0,a6
  move.l BLASTER_SPRITE_HEADER_MASKDATA_POINTER(a6),a0
  move.l BLASTER_SPRITE_HEADER_SPRITEDATA_POINTER(a6),a2
  move.w BLASTER_SPRITE_HEADER_BITPLANE_SIZE_BYTES(a6),d5
  move.w BLASTER_SPRITE_HEADER_BITPLANES(a6),d2
  move.w BLASTER_SPRITE_HEADER_WORDWIDTH(a6),d3
  move.w BLASTER_SPRITE_HEADER_PIXELHEIGHT(a6),d4
  ; x
  move.w d0,d7
  and.w #$fff0,d7
  lsr.w #1,d7  ; d7 is now an offset to the correct word in the line
  add.w d7,a1
  and.w #$f,d0  ; d0 is now the skew value
  ; y
  ext.l d1
  add.l d1,d1  
  add.l d1,d1
  lea ytable,a5
  add.l (a5,d1),a1  ; a1 now points to the correct screen position

  sub.w #1,d2
  ext.l d2  ; adjust d2 for dbra loop of bitplanes
  move.l d2,d6

  move.w #SCREEN_LINE_WIDTH_BYTES,d1
  move.w d3,d7
  lsl.w #3,d7
  sub.w d7,d1
  add.w #8,d1  ; dest y increment done

  tst.b d0      ; if skew value is 0, that's one codepath...
  bne .skewing  ; if not, that's another.

  ; no skewing
  move.w #-1,d2  ; left endmask
  move.w d2,d7   ; right endmask
  move.w #2,a4   ; source y inc
  bra .call_blaster

.skewing:
  move.w d0,d7
  ext.l d7
  add.l d7,d7
  lea sprite_leftmasks,a5   ; label must match the table defined below
  move.w (a5,d7),d2
  lea sprite_rightmasks,a5  ; label must match the table defined below
  move.w (a5,d7),d7

  add.w #1,d3   ; one more xcount
  sub.w #8,d1   ; adjust dest y inc accordingly
  move.w #0,a4  ; source y inc

.call_blaster:
; do the blits
  move.b d0,$ffff8a3d  ; set skew

  move.l d6,-(sp)
  ; mask using blaster
  move.w #SCREEN_BITPLANES-1,d6
.mask_blit_loop:
  move.w #2,$ffff8a20   ;source x inc
  move.w a4,$ffff8a22   ;source y inc
  move.l a0,$ffff8a24   ;source address
  move.w d2,$ffff8a28   ;endmask 0
  move.w #-1,$ffff8a2a  ;endmask 1
  move.w d7,$ffff8a2c   ;endmask 2
  move.w #8,$ffff8a2e   ;dest x inc
  move.w d1,$ffff8a30   ;dest y inc
  move.l a1,$ffff8a32   ;destination address
  move.w d3,$ffff8a36   ;x count (n words per line to copy)
  move.w d4,$ffff8a38   ;y count (n lines to copy)
  move.b #BLASTER_HOP_SOURCE,$ffff8a3a              ; halftone operation
  move.b #BLASTER_OP_SOURCE_AND_TARGET,$ffff8a3b    ; operation
  move.b #BLASTER_COMMAND_START_HOG_MODE,$ffff8a3c  ; start blaster
  add.w #2,a1
  dbra d6,.mask_blit_loop
  sub.w #8,a1

  ; draw sprite using blaster
  move.l (sp)+,d6
.sprite_blit_loop:
  move.w #2,$ffff8a20   ;source x inc
  move.w a4,$ffff8a22   ;source y inc
  move.l a2,$ffff8a24   ;source address
  move.w d2,$ffff8a28   ;endmask 0
  move.w #-1,$ffff8a2a  ;endmask 1
  move.w d7,$ffff8a2c   ;endmask 2
  move.w #8,$ffff8a2e   ;dest x inc
  move.w d1,$ffff8a30   ;dest y inc
  move.l a1,$ffff8a32   ;destination address
  move.w d3,$ffff8a36   ;x count (n words per line to copy)
  move.w d4,$ffff8a38   ;y count (n lines to copy)
  move.b #BLASTER_HOP_SOURCE,$ffff8a3a              ; halftone operation
  move.b #BLASTER_OP_SOURCE_OR_TARGET,$ffff8a3b     ; operation
  move.b #BLASTER_COMMAND_START_HOG_MODE,$ffff8a3c  ; start blaster
  add.w #2,a1
  add.w d5,a2
  dbra d6,.sprite_blit_loop
  rts

;----------------------------------------------------------

ytable:
val set 0
  rept 200
  dc.l val
val set val+SCREEN_LINE_WIDTH_BYTES
  endr

sprite_leftmasks:
  dc.w %1111111111111111
  dc.w %0111111111111111
  dc.w %0011111111111111
  dc.w %0001111111111111
  dc.w %0000111111111111
  dc.w %0000011111111111
  dc.w %0000001111111111
  dc.w %0000000111111111
  dc.w %0000000011111111
  dc.w %0000000001111111
  dc.w %0000000000111111
  dc.w %0000000000011111
  dc.w %0000000000001111
  dc.w %0000000000000111
  dc.w %0000000000000011
  dc.w %0000000000000001


sprite_rightmasks:
  dc.w %0000000000000000
  dc.w %1000000000000000
  dc.w %1100000000000000
  dc.w %1110000000000000
  dc.w %1111000000000000
  dc.w %1111100000000000
  dc.w %1111110000000000
  dc.w %1111111000000000
  dc.w %1111111100000000
  dc.w %1111111110000000
  dc.w %1111111111000000
  dc.w %1111111111100000
  dc.w %1111111111110000
  dc.w %1111111111111000
  dc.w %1111111111111100
  dc.w %1111111111111110

So, the endmasks. Which endmasks are used depends on the width of the blit:

Blit width
(in words)  Endmasks
----------  ---------------
        1    0
        2    0, 2
        3    0, 1, 2
        4    0, 1, 1, 2
        5    0, 1, 1, 1, 2
      ...    ...

Another way of putting it would be that endmask 0 masks off the first word of the line, endmask 2 masks off the last word of the line, and endmask 1 masks off all the words between them.

Using the two tables “sprite_leftmasks” and “sprite_rightmasks”, the above code sets endmask 0 and endmask 2 to mask out the “junk bits” the blaster will be rotating in when we set a non-zero skew value.
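
As a worked example under assumed values: take a sprite line that is one word wide, containing $ffff, drawn with a skew of 4. The routine above raises the x count to 2, and the blaster produces the two words below before the endmasks are applied. Here j marks junk bits left over from an earlier read, and n marks bits from the next source word the blaster has read.

         Skewed data         Endmask
         ----------------    ----------------------------------------
word 0   jjjj111111111111    endmask 0 = %0000111111111111 (leftmasks, entry 4)
word 1   1111nnnnnnnnnnnn    endmask 2 = %1111000000000000 (rightmasks, entry 4)

Only the bits where the endmask is 1 are written; where it is 0 the destination is left untouched. All 16 sprite bits survive, split 12+4 across the two words, and the j and n bits never reach the screen.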

Blitting with interrupts

In hog mode, the blaster will take over all bus bandwidth. Shared mode, while allowing interrupts to trigger by passing control back and forth between the CPU and the blaster, is significantly slower.

However, it turns out it’s entirely possible to send the blaster back to work faster. By putting the CPU in a loop where it restarts the blaster, interrupts can still be served (albeit with a delay of up to 256 cycles).

Example #4: Clearing in shared mode

This is the code from example #2, but running in shared mode with manual restarting of the blaster.

; This code is not ready-to-assemble, it's just the bare bones needed
; for the task at hand.

clear1bitplane_sharedmode:
; In: a0 - destination address
	move.l a0,$ffff8a24   ; source address
	move.l a0,$ffff8a32   ; dest address

	move.w #20,$ffff8a36  ; x count (number of words per line)
	move.w #200,$ffff8a38 ; y count (number of lines)

	move.w #0,$ffff8a20   ; source x increment
	move.w #0,$ffff8a22   ; source y increment
	move.w #8,$ffff8a2e   ; dest x increment
	move.w #8,$ffff8a30   ; dest y increment

  clr.b $ffff8a3d       ; no skewing

	move.w #-1,$ffff8a28  ; endmask 1
	move.w #-1,$ffff8a2a  ; endmask 2
	move.w #-1,$ffff8a2c  ; endmask 3

	move.b #2,$ffff8a3a   ; HOP = source
	move.b #0,$ffff8a3b   ; OP = all bits in dest set to 0

	move.b #%10000000,$ffff8a3c  ; bit 6 clear=shared mode, bit 7=start blaster

	move.l #$ffff8a3c,a1
	moveq #7,d7
	.retrigger:
	  bset.b d7,(a1)    ; restart blaster, sets CCR flags
	  nop               ; nop to allow interrupt to trigger
	  bne.s .retrigger  ; if the blaster wasn't done, loop back
	rts

References

http://www.atari-wiki.com/index.php/Blitter_manual

http://paradox.atari.org/files/BLIT_FAQ.TXT