Introducing mode Y

Overview

This guide attempts to cover the basics of VGA programming and more specifically mode Y, which is a tweaked mode 13 offering 4 pages of video memory in 320x200 and 256 colors. In other words: a dream come true... well, for the most part. The example code provides replacement for CLS, PSET, POINT, PCOPY and SCREEN (both to set screen mode and flip pages) to use with mode Y.

Mode 13 Memory Addressing

Mode 13 has been one of QBasic programmers' favorite graphic mode due to its 256 colors and the way pixels are organized in a linear fashion in memory. In order to speed up per-pixel drawing, they began to ditch the built-in PSET instruction in favor of DEF SEG and POKE. Soon, they would write code that looked like this:

SCREEN 13                          ' Initialize mode 13.

DEF SEG = &HA000                   ' Video memory segment.
FOR y% = 0 TO 199                  ' For each line.
  FOR x% = 0 TO 319                ' For each column.
    POKE (CLNG(y%) * 320) + x%, 4  ' Write 4 to pixel offset.
  NEXT x%
NEXT y%
DEF SEG

END

Unfortunately, 16-bit signed INTEGERs cannot reach each 64,000 pixel on the screen because the positive range of that specific data type only goes from 0 to 32,767, and LONG is marginally slower than INTEGER. Thus, some relied on two different segments to divide the screen height in two: &HA000 (which points to the very first pixel on the upper left corner of the screen), and &HA7D0 (which points to the pixel located at column 0, line 100); depending on the Y position of the pixel, they would use one segment or the other, and increase the offset for each subsequent pixel as so:

SCREEN 13

FOR y% = 0 TO 199
  IF (y% < 101) THEN
    DEF SEG = &HA000
    FOR x% = 0 TO 319
      POKE (y% * 320) + x%, 4
    NEXT x%
  ELSE
    DEF SEG = &HA7D0
    FOR x% = 0 TO 319
      POKE ((y% - 100) * 320) + x%, 4
    NEXT x%
  END IF
NEXT y%

END

How is that possible exactly? Let's have a quick recap about memory segments and memory offsets. In DOS, memory addresses are usually expressed in two pairs of hexadecimal values such as "0000:0000". The left part of the address points to a segment (that's the value used in DEF SEG and returned by VARSEG), while the right part points to an offset (the value used in PEEK or POKE, returned by VARPTR or SADD for variable-length strings). If increasing the offset by one will point one byte further into memory, increasing the segment by one will point sixteen bytes further into memory. In other words, sixteen "offsets" are worth one "segment" (A000:0010 and A001:0000 point to the same memory address.) Thus, a more convenient approach would be:

SCREEN 13

DIM clr AS INTEGER
clr = 4

FOR y% = 0 TO 199              ' One full line is 320 bytes since
  DEF SEG = &HA000 + (20 * y%) ' each pixel is stored on its own
  FOR x% = 0 TO 319            ' offset. Therefore, we have to move
    POKE x%, clr               ' 320 offsets for each line, or move
  NEXT x%                      ' 320 \ 16 offsets = 20 segments.
NEXT y%
DEF SEG

END

This works because in mode 13, each pixel is one byte long and the whole screen is stored as a big contiguous array of 64,000 bytes starting at address A000:0000... or so it seems. In theory, this mode only requires 64K of video memory, although most VGA cards required to support mode 13 have up to 256K available. So the question is: how do we access the "missing" 192K?

Initializing mode Y

Mode Y is a tweaked mode 13, so let's start by initializing mode 13 and reprogram some registers. VGA programming involves a lot of talking with ports. Each element of the VGA has an Index and a Data port: the Index port is used to send the function index we want to access and the Data port is used to send or receive information to and from the currently selected function. We're not going to use all of them, but here's the full list anyway:

' Sequence Controller (SC)
CONST scINDEX = &H3C4
CONST scDATA = &H3C5
CONST scReset = 0
CONST scClock = 1
CONST scMapMask = 2
CONST scCharmap = 3
CONST scSequencerMemoryMode = 4

' Graphic Controller (GC)
CONST gcINDEX = &H3CE
CONST gcDATA = &H3CF
CONST gcSetReset = 0
CONST gcEnableSetReset = 1
CONST gcColorCompare = 2
CONST gcDataRotate = 3
CONST gcReadMapSelect = 4
CONST gcGraphicsMode = 5
CONST gcMiscellaneous = 6
CONST gcColorDontCare = 7
CONST gcBitMask = 8

' Cathode Ray Tube Controller (CRTC)
CONST crtcINDEX2 = &H3D4
CONST crtcDATA2 = &H3D5
CONST crtcHorizontalTotal = 0
CONST crtcEndHorizontalDisplay = 1
CONST crtcStartHorizontalBlank = 2
CONST crtcEndHorizontalBlank = 3
CONST crtcStartHorizontalRetrace = 4
CONST crtcEndHorizontalRetrace = 5
CONST crtcVerticalTotal = 6
CONST crtcOverflow = 7
CONST crtcPresetRowScan = 8
CONST crtcMaximumScanLine = 9
CONST crtcCursorStart = 10
CONST crtcCursorEnd = 11
CONST crtcStartAddressHigh = 12
CONST crtcStartAddressLow = 13
CONST crtcCursorLocationHigh = 14
CONST crtcCursorLocationLow = 15
CONST crtcVerticalRetraceStart = 16
CONST crtcVerticalRetraceEnd = 17
CONST crtcVerticalDisplayEnd = 18
CONST crtcOffset = 19
CONST crtcUnderlineLocation = 20
CONST crtcStartVerticalBlank = 21
CONST crtcEndVerticalBlank = 22
CONST crtcModeControl = 23

And now, for initializing mode Y:

'
' Enter mode Y
'
SUB ySCREEN()
  DIM aByte AS INTEGER

  ' Set standard 320x200x256 (linear)
  SCREEN 13                                  ' or Interrupt &H10, AX=&H13

  ' Disable chain-4 mode
  OUT scINDEX, scSequencerMemoryMode         ' sequencer memory mode
  aByte = INP(scDATA)                        ' get current value
  aByte = (aByte XOR (aByte AND 8)) OR 4     ' bit 3 (Chain-4) off, bit 2 (O/E Dis.) on
  OUT scDATA, aByte                          ' aByte should be &H06

  ' Turn off odd/even and set write mode 0
  OUT gcINDEX, gcGraphicsMode                ' graphic mode
  aByte = INP(gcDATA)                        ' get current value
  aByte = (aByte XOR aByte AND (1 + 2 + 16)) ' turn bits 0-1 (write) and 4 (Host O/E) off
  OUT gcDATA, aByte                          ' aByte should be &H40

  ' Disable chain
  OUT gcINDEX, gcMiscellaneous               ' graphic miscellaneous
  aByte = INP(gcDATA)                        ' get current value
  aByte = aByte XOR (aByte AND 2)            ' bit 1 (Chain O/E) off
  OUT gcDATA, aByte                          ' aByte should be &H05

  ' Disable long mode
  OUT crtcINDEX2, crtcUnderlineLocation      ' underline location
  aByte = INP(crtcDATA2)                     ' get current value
  aByte = aByte XOR (aByte AND 64)           ' turn bit 6 (double-word addressing) off
  OUT crtcDATA2, aByte                       ' aByte should be &H00

  ' Enable byte mode
  OUT crtcINDEX2, crtcModeControl            ' mode control
  aByte = INP(crtcDATA2)                     ' get current value
  aByte = aByte OR 64                        ' turn bit 6 (word/byte mode select) on
  OUT crtcDATA2, aByte                       ' aByte should be &HE3
END SUB

Easy, isn't it? Yes it is. And now, on to the bad news: first, mode Y is undocumented and none of QBasic's graphic instructions will work (no CLS, PSET, PRINT, DRAW, LINE, CIRCLE, POINT, GET, PUT or PAINT.) Second, pixels are no longer organized in that easy-to-use mode 13 fashion and are instead distributed across 4 planes.

Planes Pain

All 64,000 pixels of each 320x200 video page of mode Y is divided in 4 different planes of 16K each. Pixels are distributed according to their X coordinate (the column they are located in). Plane 0 would access columns 0, 4, 8, 12, 16, etc. Plane 1 would access columns 1, 5, 9, 13, 17, etc. Plane 2 would access columns 2, 6, 10, 14, 18, etc. And plane 3 would access columns 3, 7, 11, 15, 19, etc. To identify the plane a specific pixel belongs to, all we have to do is:

plane = pixelX MOD 4 ' easy to understand modulo, or...
plane = pixelX AND 3 ' weird bit masking thingy

As a direct consequence of mode Y's planar organization, pixels share the exact same memory address four by four: pixels located at (0, 0), (1, 0), (2, 0) and (3, 0) are all located at A000:0000, pixels at (4, 0), (5, 0), (6, 0), (7, 0) are all located at A000:0001, etc. In order to modify a specific pixel, you have to provide the VGA chipset the proper plane to modify and then write to memory as usual. Let's write replacements for PSET, CLS and POINT.

For PSET, we rely on the SC Map Mask Register to POKE information to the right plane. The Map Mask Register allows us to select one, two, three, or all four planes at once by masking them via their bitflag values: 1 for plane 0, 2 for plane 1, 4 for plane 2, and 8 for plane 3 (to select all planes, just do 1 + 2 + 4 + 8 = 15). After selecting the plane we want to work with, we reach to the page memory Segment via DEF SEG and POKE the pixel value. The X coordinate is divided by 4 because four pixels are stored on the exact same address. The same way, while 320 pixels are displayed, only 80 addresses exist for that line. Thus, we move one line down by multiplying the Y coordinate by 80.

DIM SHARED yPageSeg AS INTEGER
yPageSeg = &HA000

'
' Set pixel on active page
'
SUB yPSET (x AS INTEGER, y AS INTEGER, c AS INTEGER)
  OUT scINDEX, scMapMask     ' select plane masking function
  OUT scDATA, 2 ^ (x AND 3)  ' mask only one plane
  DEF SEG = yPageSeg         ' select video page
  POKE (x \ 4 + y * 80), c   ' plot pixel
  DEF SEG
END SUB

Since planes are accessed via bit mask, if we select all 4 planes at once, we only have to write to 16,000 addresses to color all 64,000 pixels of the page! The new CLS routine is basically the same code as PSET, if only a lot simpler:

DIM SHARED yPageSeg AS INTEGER
yPageSeg = &HA000

'
' Clear active page
'
SUB yCLS (c AS INTEGER)
  OUT scINDEX, scMapMask     ' select plane masking function
  OUT scDATA, &HF            ' select all four planes
  DEF SEG = yPageSeg         ' select video page
  FOR i% = 0 TO 15999        ' write to 16,000 addresses
    POKE i%, c
  NEXT i%
  DEF SEG
END SUB

The POINT replacement routine is different from PSET because it relies on GC register Read Map Select rather than SC register Map Mask. Also, while it is possible to write to multiple planes at once using a bit mask, it is not possible to read from more than one plane at a time and thus, this function takes the plane index (which is still the X coordinate modulo 4:)

DIM SHARED yPageSeg AS INTEGER
yPageSeg = &HA000

'
' Read pixel on active page
'
FUNCTION yPOINT% (x AS INTEGER, y AS INTEGER)
  OUT gcINDEX, gcReadMapSelect ' select plane reading function
  OUT gcDATA, x AND 3          ' select plane by index
  DEF SEG = yPageSeg           ' select video page
  yPOINT% = PEEK(x \ 4 + y * 80)
  DEF SEG
END FUNCTION

Page Operations

In the previous routines, we used a variable called "yPageSeg"; this variable is intended to store the memory segment of the currently active page. This is where things get really interesting: in mode Y, each page is located 1,000 segments, or 16,000 offsets away (the size of one plane) from one another, starting at segment &HA000. At this point, you probably figured out that pages are "stacked" on top of one another, which means that increasing the memory offset by 80 will "push" the currently displayed page one line higher; essentially, you have a huge 320x800 pixels buffer, of which only 320x200 pixels are displayed.

The first routine we need will change the active page (the one we're currently working on.) It's a very simple routine that provides the memory address of the top left pixel of the video page by multiplying the page index by 1,000 and adding the video memory address (&HA000.) Remember: a whole video page has 64,000 pixels divided in 4 planes; that makes 16,000 unique addresses. A segment is 16 offsets, so each page is 1,000 "segments" apart:

DIM SHARED yPageSeg AS INTEGER
yPageSeg = &HA000

'
' Change active page
'
SUB yPAGEWORK (pageNum AS INTEGER)
  yPageSeg = &HA000 + pageNum * 1000
END SUB

The second routine, yPAGEFLIP, changes the display page (the one that the user sees.) This allows double-buffering, a technique where you display a finished page until the next frame is done being rendered, and then switch the two pages. To achieve this, we simply tell the VGA chip to change the display memory offset (where the upper left pixel is located in memory) by using CRTC registers Start Address High and Low:

DIM SHARED yPageSeg AS INTEGER
yPageSeg = &HA000

'
' Flip to another page
'
SUB yPAGEFLIP (pageNum AS INTEGER)
  DIM adr AS STRING

  adr = LEFT$(MKL$(CLNG(pageNum) * 16000), 2)

  DO: LOOP WHILE (INP(&H3DA) AND 1)

  OUT crtcINDEX2, crtcStartAddressHigh
  OUT crtcDATA2, ASC(RIGHT$(adr, 1))
  OUT crtcINDEX2, crtcStartAddressLow
  OUT crtcDATA2, ASC(LEFT$(adr, 1))

  DO: LOOP UNTIL (INP(&H3DA) AND 8)
END SUB

Latches

We've seen that there's one register to read pixels (GC Read Map Select) and another to write pixels (SC Map Mask.) This is great because it means we can read and write without swapping register functions. We've also seen that we can apply one value to four different pixels at once. If only there was a way to read more than one pixel and write different values at once, we could copy pixels from one page to the other, four pixels at a time! Well, we can, but this requires some trickery.

The VGA chipset doesn't just copy and paste information you feed it. Before writing stuff to video memory, the VGA will combine the information with built-in latches. These latches are filled by the content of all four planes located at the last memory address read. For instance, when you PEEK at A000:0001, the latches are automatically filled with pixels located at (4, 0), (5, 0), (6, 0) and (7, 0). What we need to do is program the VGA chipset so we can make a bogus write that will force latches to be copied (unchanged) to another video memory address:

'
' Copy one page to another
'
SUB yPAGECOPY (dstPage AS INTEGER, srcPage AS INTEGER)
  DIM ofs AS LONG

  OUT scINDEX, scMapMask     ' select plane masking function
  OUT scDATA, &HF            ' select all four planes

  OUT gcINDEX, gcBitMask     ' select CPU bit masking function
  OUT gcDATA, &H0            ' unmask all bits

  IF (dstPage < srcPage) THEN
    ofs = CLNG(srcPage - dstPage) * 16000
    DEF SEG = &HA000 + dstPage * 1000
    FOR i% = 0 TO 15999
      POKE i%, PEEK(ofs + i%)
    NEXT i%
  ELSE
    ofs = CLNG(dstPage - srcPage) * 16000
    DEF SEG = &HA000 + srcPage * 1000
    FOR i% = 0 TO 15999
      POKE (ofs + i%), PEEK(i%)
    NEXT i%
  END IF

  OUT gcDATA, &HFF           ' mask all bits

  DEF SEG
END SUB

This code selects all planes (like we did with yCLS), and then sets Graphic register Bit Mask to 0, which will "cancel" the information that we manually send to the VGA so it has to rely on data previously stored in latches. With this configuration, it doesn't matter what we PEEK or POKE, all that matters is to read and write at the proper memory addresses. When the operation is done, we put back the Bit Mask to "all" by setting its value to &HFF. This method could be used to copy portions of one page to the other using latches as long as the width of the section and both the source and target X coordinate is plane-aligned (multiple of 4.)

Print replacement

So far, we still have no replacement for PRINT, so let's get to it. The X and Y values assume a standard 8x8 character matrix (so 40 columns and 25 rows.) We'll be using CALL INTERRUPTX to obtain the memory address where BIOS characters are stored (in theory the address shouldn't change but better safe than sorry.) It's a good example of SC Map Mask function being used to draw multiple pixels at once. The code could easily be optimized (unrolling loops, precaching BIOS characters and reversing their bits once) for programs that need the extra speed:

'$INCLUDE: 'QB.BI'

DIM SHARED yPageSeg AS INTEGER
yPageSeg = &HA000

'
' Print with BIOS character map
'
SUB yPRINT (x AS INTEGER, y AS INTEGER, c AS INTEGER, msg AS STRING)
  DIM chrSeg AS INTEGER, chrOfs AS INTEGER
  DIM ofs AS INTEGER, ofs2 AS INTEGER, size AS INTEGER
  DIM regs AS RegTypeX
  DIM scn(7) AS INTEGER, iRev(0 TO 15) AS INTEGER

  ' Lookup table for 4-bit reversal (ie. 0100 -> 0010)
  iRev(0) = &H0: iRev(1) = &H8: iRev(2) = &H4: iRev(3) = &HC
  iRev(4) = &H2: iRev(5) = &HA: iRev(6) = &H6: iRev(7) = &HE
  iRev(8) = &H1: iRev(9) = &H9: iRev(10) = &H5: iRev(11) = &HD
  iRev(12) = &H3: iRev(13) = &HB: iRev(14) = &H7: iRev(15) = &HF

  ' Avoid string overflow
  size = LEN(msg)
  IF ((size + x) > 40) THEN size = 40 - x

  ' Get memory location of BIOS font (could be a SHARED variable)
  regs.ax = &H1130                   ' Get font information
  regs.bx = &H300                    ' ROM 8x8 double dot font pointer
  CALL INTERRUPTX(&H10, regs, regs)  ' Do interrupt
  chrSeg = regs.es                   ' Segment to character map
  chrOfs = regs.bp                   ' Offset to character map

  OUT scINDEX, scMAPMASK             ' Select map mask register
  ofs = x * 2 + y * 640              ' Upper-left corner

  ' Parse string
  FOR i% = 1 TO size
    ' Catch character in BIOS RAM
    DEF SEG = chrSeg
    ofs2 = chrOfs + 8 * ASC(MID$(msg, i%, 1))

    ' Read characters from memory
    FOR j% = 0 TO 7
      scn(j%) = PEEK(ofs2 + j%)
    NEXT j%

    ' Draw character loop
    DEF SEG = yPageSeg
    FOR j% = 0 TO 7
      OUT scDATA, iRev(scn(j%) \ 16)   ' 4 leftmost pixels
      POKE ofs + 80 * j%, c
      OUT scDATA, iRev(scn(j%) AND 15) ' 4 rightmost pixels
      POKE ofs + 80 * j% + 1, c
    NEXT j%

    ofs = ofs + 2 ' Move 8 pixels to the right
  NEXT i%

  DEF SEG ' Select default segment
END SUB

Line replacement

And now, for a totally unnecessary and obnoxiously (is that even a real word?) long code, here's a replacement routine for drawing line segments, rectangles and solid rectangles. This function is not fool-proof, so be careful with it. The line drawing part is done with Bresenham's line algorithm, which uses INTEGER and is faster than floating-point calculations.

CONST isFill = 0 ' Draw a solid rectangle
CONST isRect = 1 ' Draw a rectangle
CONST isLine = 2 ' Draw a line

DIM SHARED yPageSeg AS INTEGER
yPageSeg = &HA000

'
' Draw a line, a rectangle, or a solid rectangle
'
SUB ySHAPE (x1 AS INTEGER, y1 AS INTEGER, x2 AS INTEGER, y2 AS INTEGER, _
c AS INTEGER, mode AS INTEGER)
  DIM xA AS INTEGER, yA AS INTEGER, xB AS INTEGER, yB AS INTEGER
  DIM xDelta AS INTEGER, yDelta AS INTEGER
  DIM yInc1 AS INTEGER, yInc2 AS INTEGER
  DIM det AS INTEGER, dInc1 AS INTEGER, dInc2 AS INTEGER
  DIM numPixels AS INTEGER, ofs AS INTEGER, msk AS INTEGER
  DIM horiz AS INTEGER, ofsY AS INTEGER, ofs2 AS INTEGER

  DEF SEG = yPageSeg
  OUT scINDEX, scMAPMASK

  '' DRAW STRAIGHT VERTICAL LINE / DRAW POINT
  IF (x1 = x2) THEN
    IF (y1 > y2) THEN
      ofs = y2 * 80 + x1 \ 4
    ELSE
      ofs = y1 * 80 + x1 \ 4
    END IF
    OUT scDATA, 2 ^ (x1 AND 3)
    FOR i% = 0 TO ABS(y2 - y1)
      POKE ofs, c
      ofs = ofs + 80
    NEXT i%

  '' DRAW STRAIGHT HORIZONTAL LINE
  ELSEIF (y1 = y2) THEN
    IF (x1 > x2) THEN xA = x2 ELSE xA = x1
    ofs = y1 * 80 + xA \ 4
    numPixels = ABS(x2 - x1) + 1

    ' Align starting point on plane (dirty)
    IF (xA AND 3) THEN
      msk = (xA AND 3)
      IF (numPixels > 4 - (xA AND 3)) THEN
        msk = &HF XOR (2 ^ (xA AND 3) - 1)
      ELSE
        msk = (2 ^ numPixels - 1) * 2 ^ (xA AND 3)
      END IF
      OUT scDATA, msk
      POKE ofs, c
      numPixels = numPixels - (4 - (xA AND 3))
      ofs = ofs + 1
    END IF

    ' Draw remaining pixels (starting point is now aligned)
    IF (numPixels > 3) THEN          ' If there's at least 4 pixels,
      OUT scDATA, &HF                ' Select all four planes
      DO
        POKE ofs, c                  ' Draw four pixels
        ofs = ofs + 1                ' Move to next stride
        numPixels = numPixels - 4    ' Remove 4 pixels from count
      LOOP WHILE (numPixels > 3)     ' Loop until less than 4 remain
    END IF
    IF (numPixels) THEN              ' There's less than 4 pixels now.
      OUT scDATA, 2 ^ numPixels - 1  ' Select all that remains.
      POKE ofs, c                    ' Draw.
    END IF

  '' DRAW LINE
  ELSEIF (mode = isLine) THEN
    ' Always draw left to right (easier to handle plane selection)
    IF (x1 > x2) THEN
      xB = x1: xA = x2
      yB = y1: yA = y2
    ELSE
      xA = x1: xB = x2
      yA = y1: yB = y2
    END IF

    xDelta = ABS(xB - xA)
    yDelta = ABS(yB - yA)

    IF (xDelta > yDelta) THEN
      numPixels = xDelta
      dInc1 = yDelta * 2
      dInc2 = (yDelta - xDelta) * 2
      det = dInc1 - xDelta
      horiz = -1
    ELSE
      numPixels = yDelta
      dInc1 = xDelta * 2
      dInc2 = (xDelta - yDelta) * 2
      yInc1 = 80
      det = dInc1 - yDelta
    END IF

    IF (yA > yB) THEN
      yInc1 = -yInc1
      yInc2 = -80
    ELSE
      yInc2 = 80
    END IF

    ' initial mask, offset and Y address
    yA = yA * 80
    ofs = xA \ 4
    msk = 2 ^ (xA AND 3)

    ' mostly horizontal (always adjust plane)
    IF horiz THEN
      FOR i% = 0 TO numPixels
        OUT scDATA, msk
        POKE ofs + yA, c

        IF (det < 0) THEN
          det = det + dInc1
          yA = yA + yInc1
          xA = xA + 1
        ELSE
          det = det + dInc2
          xA = xA + 1
          yA = yA + yInc2
        END IF

        IF (msk AND 8) THEN
          msk = 1
          ofs = ofs + 1
        ELSE
          msk = msk + msk
        END IF
      NEXT i%

    ' mostly vertical (adjust plane only on "split" pixels)
    ELSE
      OUT scDATA, msk
      FOR i% = 0 TO numPixels
        POKE ofs + yA, c

        IF (det < 0) THEN
          det = det + dInc1
          yA = yA + yInc1
        ELSE
          det = det + dInc2
          xA = xA + 1
          yA = yA + yInc2

          IF (msk AND 8) THEN
            msk = 1
            ofs = ofs + 1
          ELSE
            msk = msk + msk
          END IF
          OUT scDATA, msk
        END IF
      NEXT i%
    END IF

  '' DRAW RECTANGLE / DRAW SOLID RECTANGLE
  ELSE
    ' Always make X1 left and X2 right
    SELECT CASE x1
      CASE IS > x2
        xA = x2: xB = x1
      CASE ELSE
        xA = x1: xB = x2
    END SELECT

    ' Always make Y1 top and Y2 bottom
    SELECT CASE y1
      CASE IS > y2
        yA = y2: yB = y1
      CASE ELSE
        yA = y1: yB = y2
    END SELECT

    xDelta = ABS(xB - xA) + 1
    yDelta = ABS(yB - yA)
    ofsY = yA * 80

    '' DRAW RECTANGLE
    IF (mode = isRect) THEN
      ' left and right lines
      ofs = ofsY + xA \ 4
      OUT scDATA, 2 ^ (xA AND 3)
      FOR j% = 0 TO 1
        FOR i% = 0 TO yDelta
          POKE ofs, c
          ofs = ofs + 80
        NEXT i%
        ofs = ofsY + xB \ 4
        OUT scDATA, 2 ^ (xB AND 3)
      NEXT j%

      ' top and bottom lines
      ofs = ofsY + xA \ 4
      ofs2 = yB * 80 + xA \ 4

      ' Align starting point on plane (dirty)
      IF (xA AND 3) THEN
        msk = &HF XOR (2 ^ (xA AND 3) - 1)' draw all except offset
        IF (((4 - (xA AND 3)) - xDelta) = 1) THEN msk = msk XOR 8
        OUT scDATA, msk
        POKE ofs, c                  ' top
        POKE ofs2, c                 ' bottom
        xDelta = xDelta - (4 - (xA AND 3))
        ofs = ofs + 1
        ofs2 = ofs2 + 1
      END IF

      ' Draw remaining pixels (starting point is now aligned)
      IF (xDelta > 3) THEN             ' If there's at least 4 pixels,
        OUT scDATA, &HF                ' Select all four planes
        DO
          POKE ofs, c            ' Draw four pixels (top)
          POKE ofs2, c           ' Draw four pixels (bottom)
          ofs = ofs + 1          ' Move to next stride
          ofs2 = ofs2 + 1        ' Move to next stride
          xDelta = xDelta - 4    ' Remove 4 pixels from count
        LOOP WHILE (xDelta > 3)  ' Loop until less than 4 remain
      END IF
      IF (xDelta) THEN                ' There's less than 4 pixels now
        OUT scDATA, (2 ^ xDelta) - 1  ' Select all that remains
        POKE ofs, c                   ' Draw (top)
        POKE ofs2, c                  ' Draw (bottom)
      END IF

    '' DRAW SOLID RECTANGLE
    ELSE
      ofs = ofsY + xA \ 4

      ' Left edge
      IF (xA AND 3) THEN
        msk = &HF XOR (2 ^ (xA AND 3) - 1)' draw all except offset
        IF (((4 - (xA AND 3)) - xDelta) = 1) THEN msk = msk XOR 8
        OUT scDATA, msk

        FOR i% = 0 TO yDelta
          POKE ofs + i% * 80, c
        NEXT i%

        xDelta = xDelta - (4 - (xA AND 3))
        ofs = ofs + 1
      END IF

      ' Center
      IF (xDelta > 3) THEN           ' If there's at least 4 pixels,
        OUT scDATA, &HF              ' Select all four planes
        FOR i% = 0 TO yDelta
          ofs2 = ofs + i% * 80
          FOR j% = 0 TO xDelta \ 4 - 1
            POKE ofs2 + j%, c
          NEXT j%
        NEXT i%
        xDelta = xDelta AND &H3      ' Keep remaining pixels
        ofs = ofs + xDelta \ 4 + 1
      END IF

      ' Right edge
      IF (xDelta) THEN               ' There are fewer than 4 pixels
        OUT scDATA, 2 ^ xDelta - 1   ' Select all that remains
        POKE ofs, c                  ' Draw (top)
      END IF
    END IF
  END IF

  DEF SEG
END SUB

Some links you might find useful