Color Image Quantization

Introduction

The color cube and you

In the eternal words of William Shatner's rendition of Pulp's hit song Common People: "I don't know why but I had to start it somewhere, so it started there."

Colors can be represented differently according to the model used. Painters use the Cyan, Magenta, Yellow and Black (CMYK) subtractive color model in which mixing Yellow and Cyan produces Green, and mixing all the colors together produces something close to Black. It's also the model used in print jobs (in which pictures are represented by tiny dots of various density to give the illusion of more colors, sort of like dithering.) Note that the CMYK representation has nothing to do with the Village People. Computers use the Red, Green and Blue additive color model (RGB for short) in which all colors mixed together produce White. That's also the model we'll be using.

Each channel ("component" or "primary color") is described by its intensity (its contribution to the color.) It ranges from muted (0) to bright (255 for 8-bit per channel, or 63 for 6-bit per channel like the VGA.) The total number of possible combinations depends on the number of bits per channel. For instance, 15-bit per pixel (16 bits with a padding bit) has 5 bits per channel, allowing up to 2^5 * 2^5 * 2^5 = 32,768 unique colors. The standard 24-bit per pixel uses 8 bits per channel. Therefore, it can produce up to 2^8 * 2^8 * 2^8 = 16,777,216 unique combinations. When a picture has 32-bit per pixel, it still uses 8 bits per channel but introduces an extra 8-bit value attributed to the alpha channel, which describes the level of opacity of the pixel (0 is see-through, 255 is opaque.)
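
As a quick aside, here is a minimal sketch (with arbitrary sample values) of how a 15-bit color can be packed into and unpacked from a single integer; the exact same arithmetic shows up again in the Median Cut listing later in this article:

DIM r AS INTEGER, g AS INTEGER, b AS INTEGER, rgb AS INTEGER

' pack three 5-bit channels (0-31) into one 15-bit value
r = 20: g = 9: b = 31
rgb = (b * 1024) + (g * 32) + r

' unpack them again
r = rgb AND &H1F
g = (rgb \ 32) AND &H1F
b = (rgb \ 1024) AND &H1F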

Rather than thinking of colors as the (figurative) sum of their channels, it's more visual to think of them as points in space. All we have to do is treat the RGB values as XYZ coordinates and place them in a 3D cube. The cube uses Red, Green and Blue as its axes, displays Black at its origin (where Red, Green and Blue all equal 0,) and White in the opposite corner (where Red, Green and Blue are all 255.)

By considering colors as 3D coordinates, we can easily determine how "similar" or "close" two colors are by calculating the Euclidean distance between them via the Pythagorean Theorem:

dist# = SQR((r1 - r2)^2 + (g1 - g2)^2 + (b1 - b2)^2)

Fun fact of the day: we generally don't need to know the actual distance between the two colors, so we don't need the square root (dropping it doesn't change which candidate ends up being the closest.) After testing multiple entries, we only want to keep the one with the smallest value, which means we can drop the SQR function entirely:

dist& = (r1 - r2)^2 + (g1 - g2)^2 + (b1 - b2)^2

It's a little bit faster because we traded a DOUBLE precision floating-point variable for a LONG. But there's a reasonably inaccurate method that is even faster: the taxicab approximation:

dist% = ABS(r1 - r2) + ABS(g1 - g2) + ABS(b1 - b2)
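
To make the comparison concrete, here's a minimal sketch (with a made-up target color and a made-up three-entry palette) that keeps whichever entry has the smallest squared distance; the taxicab sum would be used the exact same way, and this is essentially what the nearest color lookups later in this article do:

DIM i AS INTEGER, best AS INTEGER
DIM dist AS LONG, temp AS LONG
DIM dr AS INTEGER, dg AS INTEGER, db AS INTEGER

' hypothetical candidate palette (r, g, b per row)
DIM pal(2, 2) AS INTEGER
pal(0, 0) = 255: pal(0, 1) = 0: pal(0, 2) = 0       ' pure red
pal(1, 0) = 192: pal(1, 1) = 96: pal(1, 2) = 16     ' brownish orange
pal(2, 0) = 0: pal(2, 1) = 0: pal(2, 2) = 255       ' pure blue

dist = 200000                     ' larger than 3 * 255^2, the worst case
FOR i = 0 TO 2
  dr = 200 - pal(i, 0)            ' the target color is (200, 64, 32)
  dg = 64 - pal(i, 1)
  db = 32 - pal(i, 2)
  temp = CLNG(dr) * dr + CLNG(dg) * dg + CLNG(db) * db
  IF (temp < dist) THEN dist = temp: best = i
NEXT i
PRINT "closest entry:"; best      ' prints 1, the brownish orange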

This is just one example of what we can do with this cube. In fact, it will become very useful when we get to the Median Cut Algorithm. What is the Median Cut Algorithm? It's one of many techniques we can use to identify representative colors in a picture. Why would we need to do that? Well, even though our beloved graphic mode 13 can produce up to 262,144 unique colors, only 256 of them can be used at once. In that sense, mode 13 is an indexed-color scheme, a paint-by-numbers system in which each pixel contains the index of a color in a global palette (as opposed to direct-color modes in which each pixel can have its own unique color.)
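
To illustrate the "paint-by-numbers" part, here's a minimal sketch (the entry number 42 and the orange tint are arbitrary) that redefines one palette entry through the same VGA DAC ports the listings later in this article use, then plots a pixel holding that index:

' minimal sketch: indexed color in mode 13
SCREEN 13

OUT &H3C6, &HFF           ' pel mask
OUT &H3C8, 42             ' we want to (re)define palette entry 42
OUT &H3C9, 63             ' red   (0-63)
OUT &H3C9, 32             ' green (0-63)
OUT &H3C9, 0              ' blue  (0-63)

PSET (160, 100), 42       ' the pixel stores index 42, the DAC turns it into orange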

All of this means we can't display a 24-bit per pixel image as-is with the standard 256-color VGA card. That is, until we dive into color image quantization. Color image quantization is a process through which a limited number of unique colors is selected to represent an image that initially contained far more colors; the image is then redrawn using this limited palette. In other words, it's a way for an 8-bit indexed color display (like mode 13) to draw a reasonably faithful reproduction of a 24-bit per pixel bitmap...

You know what this means.

TGA image loader

We're going to use TGA files because they are very simple to process; since you'll have to provide your own uncompressed 24-bit per pixel TGA image, we're going to make a few checks to validate its content first. The routine will open the file, and then return a whole scanline after each call until the whole image has been parsed. It will then close the file automatically. More information about the TGA file format can be found here.

DECLARE FUNCTION getScanLine$ (filename AS STRING)

TYPE tgaHead
  idLength AS STRING * 1
  cMapPresent AS STRING * 1
  imgType AS STRING * 1
  cMapStart AS INTEGER
  cMapLength AS INTEGER
  cMapDepth AS STRING * 1
  imgOrgX AS INTEGER
  imgOrgY AS INTEGER
  imgWidth AS INTEGER
  imgHeight AS INTEGER
  imgDepth AS STRING * 1
  imgDescriptor AS STRING * 1
END TYPE

''
'' open [filename] and return each scanline one after the other
''
FUNCTION getScanLine$ (filename AS STRING) STATIC
  DIM ff AS INTEGER, scanLine AS STRING, h AS tgaHead

  ' file not yet open, do it now
  IF (ff = 0) THEN
    ff = FREEFILE
    OPEN filename FOR BINARY AS #ff

    ' validate content
    GET #ff, , h
    IF ((ASC(h.imgType) <> 2) OR (ASC(h.imgDepth) <> 24)) THEN
      PRINT filename; " is not uncompressed 24-bit per pixel"
      CLOSE #ff: ff = 0: EXIT FUNCTION
    END IF
    IF (h.imgWidth > 320) OR (h.imgHeight > 200) THEN
      PRINT filename; " doesn't fit in the 320x200 display"
      CLOSE #ff: ff = 0: EXIT FUNCTION
    END IF
    IF (ASC(h.cMapPresent)) THEN
      PRINT filename; " has a palette"
      CLOSE #ff: ff = 0: EXIT FUNCTION
    END IF

    ' move to 1st scanline
    SEEK #ff, 1 + 18 + ASC(h.idLength)

    ' reserve buffer for a whole scanline
    scanLine = SPACE$(h.imgWidth * 3)
  END IF

  ' read scanline
  IF (h.imgHeight) THEN
    GET #ff, , scanLine
    h.imgHeight = h.imgHeight - 1
    getScanLine$ = scanLine
    EXIT FUNCTION
  END IF

  ' close file
  getScanLine$ = ""
  CLOSE #ff
  ff = 0
END FUNCTION

...and I think we're all set. Let's get to it.

Selecting Colors

Uniform Palette

This is a one-size-fits-all solution because it doesn't consider the actual color composition of the image. It is fast and easy to implement, but equally dirty. It generates a 256-color palette using the same principle as the one we've seen above: each 24-bit color is converted into an 8-bit color in which the red and green channels are both allocated 3 bits each, while the blue channel is only granted 2. Blue is getting the shaft because our eyes are less sensitive to blue than to red or green, so we can get away with a lower resolution there.

To convert an 8-bit value to a 7-bit value, we make an integer division by 2. To crush an 8-bit value to a 6-bit value, we make an integer division by 4, etc. Those integer divisions have another effect: they place close intensities into the same group. For instance: intensity 0, 1, 2, and 3, when divided by 4, all become 0. Intensities 4, 5, 6, and 7 become 1. Intensities 8, 9, 10, and 11 become 2, etc. It also means that we crushed those 12 unique intensities (0 to 11,) to only 3 (0 to 2;) so when we go from (crushed) 0 to (crushed) 1, we actually skip 4 intensities of the original color space; in other words, colors are made more distant. We're going to use the same principle to crush over 16 million colors into 256. The final palette may look something like this:


Both images show the exact same palette (indices are ordered row-first, left to right from 0 to 255.) The image on the right has been re-organized on 8 columns to show how channels are mixed together: 8 shades of red are mixed with 8 shades of green, then blue is added every 64 entries, creating 4 shades... just like our color cube from earlier, black (0) and white (255) are diametrically opposite.

Following the colormap generation process, we can take any 24-bit value, crush it, and immediately obtain the matching index in the colormap. This means we won't have to search through the palette for a suitable replacement: we know where it is.
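
Before the full listing, here's the crush applied to a single hypothetical 24-bit color, just to make the arithmetic below easier to follow:

DIM r AS INTEGER, g AS INTEGER, b AS INTEGER, rgb AS INTEGER

' hypothetical source pixel: a warm orange
r = 224: g = 112: b = 64

' crush red and green to 3 bits (0-7), blue to 2 bits (0-3)
r = r \ 32                ' 224 \ 32 = 7
g = g \ 32                ' 112 \ 32 = 3
b = b \ 64                '  64 \ 64 = 1

' rebuild an 8-bit index: blue in bits 7-6, green in bits 5-3, red in bits 2-0
rgb = (b * 64) + (g * 8) + r          ' 64 + 24 + 7 = 95, our palette index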

DIM r AS INTEGER, g AS INTEGER, b AS INTEGER, rgb AS INTEGER
DIM x AS INTEGER, y AS INTEGER, i AS INTEGER, scan AS STRING

' enter mode 13 (320x200, 256 colors)
SCREEN 13

' set up uniform palette
OUT &H3C6, &HFF
OUT &H3C8, 0
FOR i = 0 TO 255
  ' mask bits 2 to 0, do not shift to the right (output: 0-7)
  r = (i AND (2 ^ 2 + 2 ^ 1 + 2 ^ 0)) \ 2 ^ 0

  ' mask bits 5 to 3, shift three bits to the right (output: 0-7)
  g = (i AND (2 ^ 5 + 2 ^ 4 + 2 ^ 3)) \ 2 ^ 3

  ' mask bits 7 and 6, shift six bits to the right (output: 0-3)
  b = (i AND (2 ^ 7 + 2 ^ 6)) \ 2 ^ 6

  ' write to DAC (rescale channels to 0-63)
  OUT &H3C9, (r / 7) * 63
  OUT &H3C9, (g / 7) * 63
  OUT &H3C9, (b / 3) * 63
NEXT i

' read image, first scanline
scan = getScanLine$("SAMPLE.TGA")
WHILE LEN(scan)
  ' process whole scanline, one pixel at a time
  FOR x = 1 TO LEN(scan) STEP 3
    ' take 8-bit channel, crush to 2 bits (output: 0-3)
    b = ASC(MID$(scan, x, 1)) \ 64

    ' take 8-bit channel, crush to 3 bits (output: 0-7)
    g = ASC(MID$(scan, x + 1, 1)) \ 32

    ' take 8-bit channel, crush to 3 bits (output: 0-7)
    r = ASC(MID$(scan, x + 2, 1)) \ 32

    ' get index in colormap (shift blue and green 6 and 3 bits to the left)
    rgb = (b * (2 ^ 6)) + (g * (2 ^ 3)) + r

    ' draw pixel
    PSET ((x - 1) \ 3, y), rgb
  NEXT x
  y = y + 1

  ' grab next scanline
  scan = getScanLine$("")
WEND

On the left is the original 24-bit per pixel image; next to it is the 8-bit per pixel indexed image using the uniform palette. While the image is still somewhat recognizable, the result is very poor. The degradation can be somewhat mitigated with dithering... The important thing to remember here is that the colormap is instrumental to the final output, and determining a proper colormap is the real tricky part.

Note that out of the 256 colors available in the colormap, only 61 of them were used to draw the image on the right. This is somewhat revealing because even a direct-color image (24 or 16 bits per pixel) will rarely cover the entirety of the color space: it would need dark areas, equally bright areas, saturated and muted colors alike... and even then, it'd be very likely that multiple pixels would use the exact same color. Furthermore, just because 32,768 combinations (16 bits -- remember that the most significant bit is usually unused) are possible doesn't mean the image can physically use them all: a 100x20 image only has 2000 pixels, thus only 2000 unique colors can be used at most. This little factoid will become relevant again later.

Popularity Algorithm

Known as the "Population Method" everywhere else in the world thanks to Wikipedia (and those who copy/paste Wikipedia articles,) the Popularity Algorithm was developed in 1978 by Tom Boyle, Andy Lippman, and Ephraim Cohen. Basically, the more often a color appears in the source image, the more likely it is to earn its place in the colormap... which is generally a winning gamble, like betting on red (or black depending on the circumstances. I don't have a crippling gambling addiction, I swear.)

First, we convert each 24-bit color to a 12-bit color by crushing each channel into 4 bits, essentially reducing the maximum number of colors in the image to 2^12, or 4,096. Then, we reserve an array of 4,096 entries where we can count the number of times each of these colors is used. When the whole image has been parsed, we sort the list by decreasing frequency and take the 256 most popular entries and write them to the colormap... we finish the job with the usual shtick: we go through the file again and draw the closest matching color in our colormap for every 24-bit pixel in the source image.

DECLARE FUNCTION nearest% (r AS INTEGER, g AS INTEGER, b AS INTEGER)

TYPE cMapType
  r AS INTEGER
  g AS INTEGER
  b AS INTEGER
END TYPE

TYPE cPxlType
  rgb AS INTEGER
  count AS INTEGER
END TYPE

DIM SHARED cMap(255) AS cMapType
DIM cList(4095) AS cPxlType
DIM r AS INTEGER, g AS INTEGER, b AS INTEGER, rgb AS INTEGER
DIM count AS INTEGER, offset AS INTEGER, max AS INTEGER, hiRef AS INTEGER
DIM x AS INTEGER, y AS INTEGER, i AS INTEGER, scan AS STRING

' read image, first scanline
scan = getScanLine$("SAMPLE.TGA")
WHILE LEN(scan)
  ' process whole scanline, one pixel at a time
  FOR x = 1 TO LEN(scan) STEP 3

    ' dissociate channels (8 bits each)
    b = ASC(MID$(scan, x, 1))
    g = ASC(MID$(scan, x + 1, 1))
    r = ASC(MID$(scan, x + 2, 1))

    ' convert to 4 bits per channel (4096 colors)
    rgb = ((b \ 16) * 256) + ((g \ 16) * 16) + (r \ 16)

    ' count
    cList(rgb).rgb = rgb
    cList(rgb).count = cList(rgb).count + 1
  NEXT x

  ' grab next scanline
  scan = getScanLine$("")
WEND

' sort colors by decreasing use
count = 4096
offset = 2048
DO WHILE offset
  max = count - offset - 1
  DO
    hiRef = 0
    FOR i = 0 TO max
      IF (cList(i).count < cList(i + offset).count) THEN
        SWAP cList(i), cList(i + offset)
        hiRef = i
      END IF
    NEXT i
    max = hiRef - offset
  LOOP WHILE hiRef
  offset = offset \ 2
LOOP

' write most often used colors to colormap array (8-bit per channel)
FOR i = 0 TO UBOUND(cMap)
  cMap(i).r = (cList(i).rgb AND &HF) * 16
  cMap(i).g = ((cList(i).rgb \ 16) AND &HF) * 16
  cMap(i).b = (cList(i).rgb \ 256) * 16
NEXT i

' enter mode 13 (320x200, 256 colors)
SCREEN 13

' set up palette (VGA card only allows 6 bits per channel by default)
OUT &H3C6, &HFF
OUT &H3C8, 0
FOR i = 0 TO UBOUND(cMap)
  OUT &H3C9, cMap(i).r \ 4
  OUT &H3C9, cMap(i).g \ 4
  OUT &H3C9, cMap(i).b \ 4
NEXT i

' read the image again, for display this time
scan = getScanLine$("SAMPLE.TGA")
WHILE LEN(scan)
  ' process whole scanline, one pixel at a time
  FOR x = 1 TO LEN(scan) STEP 3

    ' dissociate channels (8 bits each)
    b = ASC(MID$(scan, x, 1))
    g = ASC(MID$(scan, x + 1, 1))
    r = ASC(MID$(scan, x + 2, 1))

    ' draw pixel
    PSET ((x - 1) \ 3, y), nearest%(r, g, b)
  NEXT x
  y = y + 1

  ' grab next scanline
  scan = getScanLine$("")
WEND

''
'' return index of nearest color in cMap() using Taxicab Approximation
''
FUNCTION nearest% (r AS INTEGER, g AS INTEGER, b AS INTEGER) STATIC
  DIM dist AS INTEGER, temp AS INTEGER, i AS INTEGER

  dist = 20000
  FOR i = 0 TO UBOUND(cMap)
    temp = ABS(r - cMap(i).r) + ABS(g - cMap(i).g) + ABS(b - cMap(i).b)
    IF (temp < dist) THEN
      dist = temp
      nearest% = i
    END IF
  NEXT i
END FUNCTION

On the left is the original 24-bit per pixel image; next to it is the 8-bit per pixel indexed image using the colormap generated by the Popularity Algorithm (12 bits per color.) That's a definite improvement over the uniform palette, but there's also a not-so-obvious-at-first-glance issue with this method: popularity isn't a good substitute for representation.

While colors were noticeably degraded to fit on 12 bits, this degradation is unfortunately necessary to the process because it makes each of the possible 4,096 colors much more distant from one another (remember the color cube thing?) If we push the resolution a little and distribute our 24-bit colors to 32,768 slots rather than 4,096, then the colormap may very well reject less often used colors (that are otherwise very distant) in favor of similar-looking (closer) colors found in large areas. Below is a comparison of the Popularity Algorithm with 12 bits per color (left) and 15 bits per color (right.)

While the walls and carpet look great (which makes sense since half of the pixels in the image belong to either the walls or carpet,) the lampshade looks dull compared to the warm yellow glow on the left (which also happens to be closer to the original 24-bit image,) plants now look covered in charcoal, and the cyan decorations outside are washed out. The strawberry cushion in the bottom left was made puffier than it should have been in the left image, but the banding on the right image doesn't do it justice either.

Again: those areas, while "popping out" compared to the gray/beige walls, are also too small for the algorithm to retain. When we increased the resolution from 12 to 15 bits, we killed the chances of reds and greens appearing in the colormap: not only did we allow more shades of gray/beige to be taken in, we also prevented similar red tones from merging together and increasing the global "red" count.

Overall this algorithm can be useful to quickly generate previews, but its flaws become obvious when requesting fewer than 256 colors.

Median Cut Algorithm

Conceived by Paul Heckbert in late 1979, this algorithm funnels similar-looking colors into increasingly smaller boxes. It's so good and fast that over 40 years after its creation, we're still using it (or variations of it.)

So, the process begins by taking every color in the source image and placing them into a box (like that color cube from earlier.) Then, we find which channel has the widest range (it's the equivalent of the longest side of the box if you will,) by collecting the minimal and maximal values of each channel and subtracting one from the other:

rangeR = maxR - minR
rangeG = maxG - minG
rangeB = maxB - minB

IF (rangeR >= rangeG) AND (rangeR >= rangeB) THEN
  ' red has the widest range
ELSEIF (rangeG >= rangeR) AND (rangeG >= rangeB) THEN
  ' green has the widest range
ELSEIF (rangeB >= rangeR) AND (rangeB >= rangeG) THEN
  ' blue has the widest range
END IF

...then we sort the colors contained inside the box according to that channel. This doesn't physically move the colors inside the box, it merely arranges them in a specific order (after all, the "cube" is only a visual representation; what we're actually working with is a simple one-dimensional array that contains every color present in the picture... like a shopping list if you will.) Once the colors are sorted, we split the longest side of the box in two to obtain two boxes with an equal (or near equal) number of colors in each. That's the step-by-step explanation.

Conceptually the range tells us which two channels are the most stable ("closest" or "similar-looking") and which one stands out. When we divide the box in equal parts along this "dynamic" channel, we essentially reduce the amplitude: values that used to cover 0-160 may now cover 0-66 in one box and 67-160 in the other. This means that all colors inside either box are now more similar-looking than they were before the split. As the process is repeated over and over again, the distance between colors contained inside each box diminishes. Eventually, when we have enough boxes, we average the colors contained within and write them to the colormap.

Now here's the part where I admit I have no clue what I'm doing. It is often claimed that "the aim of the Median Cut Algorithm is to have each of the 256 output colors represent the same number of pixels in the input image," even though Heckbert's original paper states on page 38: "the split point is the median point - the plane which divides the box into two halves so that equal numbers of colors are on each side."

Let's look at it this way: if we have a scanline of 10 red pixels and 4 green pixels, that's 14 pixels but only 2 colors. If each box has to contain the same amount of pixels, one is filled with red (7 pixels,) the other with yellow (3 red pixels and 4 green pixels.) By comparison, if each box has to contain the same amount of colors, then we get a box of red and another of green. I mean, what would happen if a 128x128 image (16,384 pixels,) is divided into 256 boxes of 64 pixels each? What if 512 bright green pixels are present in the image? Wouldn't that produce 8 boxes filled to the brim with the same exact bright green color after averaging? That's why I believe the term pixel is either misleading or downright wrong.

The other question is where do we split? The paper talks about "two halves" and "equal number of colors" on each side, so I assume we have to count the number of unique colors in the box, and then split right in the middle, creating two boxes with the same amount of colors. However, I've seen code where the split is made to halve the widest range. For instance, if the widest range is 96 (between 32 and 128) on the blue channel, then the split should happen at 48 (or 80 for the center + minimal value,) creating a box whose blue range is 32-48 and another 49-128 (or 32-80 and 81-128 depending on the split location.) But it wouldn't necessarily result in boxes containing the same amount of colors (maybe the box contains twelve RR,GG,32 colors and only one RR,GG,128 color.) After some testing, it seems that the simple cut-in-half approach is sufficient (and faster) so I'll stick to that. Apologies all around if it's not the proper way of doing things.
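
For completeness, here's a quick sketch of the "halve the widest range" split mentioned above (it is not what the listing below does -- that one splits at the median index.) It assumes the bucket's colors, cList(first) to cList(last), are already sorted on the widest channel -- blue in this example -- and that minB and maxB hold the 5-bit bounds of that channel:

DIM i AS INTEGER, midValue AS INTEGER, splitAt AS INTEGER

' absolute midpoint of the blue range
midValue = minB + (maxB - minB) \ 2

' find the first entry whose blue component goes past the midpoint
splitAt = last
FOR i = first TO last
  IF (((cList(i).rgb \ 1024) AND &H1F) > midValue) THEN
    splitAt = i
    EXIT FOR
  END IF
NEXT i

' cList(first .. splitAt - 1) stays in the current box,
' cList(splitAt .. last) spills over into the new one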

Since we're absolutely completely insane and won't program in anything but QuickBASIC, we won't be able to store the whole picture into memory. Even if the memory was available, QuickBASIC was bitten by a large array when he was a kid and developed an aversion for them ever since that day. The workaround here is to identify each unique 15-bit color and count how many times it appears in the image (for averaging purposes.) The collected data is placed into an "item" array for later reference.

In a perfect world the item array would be large enough to contain all possible 15-bit colors (that is 32,768 entries.) But that's too much for QuickBASIC and it's unnecessary because an image rarely covers the whole color space (here's that factoid again.) So, for no particular reason at all, the item array will contain at most 8,000 entries (yes, I pulled this value out of my ass.) For extra safety, we could prevent overflowing by assigning new colors to already registered similar-looking colors when the array is maxed out. But I like to live dangerously.

DECLARE SUB sortList (first AS INTEGER, last AS INTEGER, channel AS INTEGER)
DECLARE SUB addColor (rgb AS INTEGER)
DECLARE FUNCTION widestRange% (first AS INTEGER, last AS INTEGER)
DECLARE FUNCTION nearest% (r AS INTEGER, g AS INTEGER, b AS INTEGER)

TYPE cMapType
  r AS INTEGER
  g AS INTEGER
  b AS INTEGER
END TYPE

TYPE cPxlType
  rgb AS INTEGER
  count AS INTEGER
END TYPE

TYPE bucketType
  first AS INTEGER
  items AS INTEGER
  depth AS INTEGER
  order AS INTEGER
END TYPE

DIM SHARED cMap(255) AS cMapType
DIM SHARED cList(7999) AS cPxlType, cCount AS INTEGER
DIM bucket(256) AS bucketType, numBuckets AS INTEGER
DIM r AS INTEGER, g AS INTEGER, b AS INTEGER
DIM floatR AS SINGLE, floatG AS SINGLE, floatB AS SINGLE
DIM scan AS STRING, first AS INTEGER, last AS INTEGER, rgb AS INTEGER
DIM i AS INTEGER, j AS INTEGER, x AS INTEGER, y AS INTEGER
DIM temp AS INTEGER, median AS INTEGER, depth AS INTEGER

' read scanline
scan = getScanLine$("SAMPLE.TGA")
WHILE LEN(scan)
  ' process whole scanline, one pixel at a time
  FOR x = 1 TO LEN(scan) STEP 3

    ' dissociate channels (8 bits each)
    b = ASC(MID$(scan, x, 1))
    g = ASC(MID$(scan, x + 1, 1))
    r = ASC(MID$(scan, x + 2, 1))

    ' convert to 15-bit RGB
    rgb = ((b \ 8) * 1024) + ((g \ 8) * 32) + (r \ 8)

    ' add color to cList() and update .count
    addColor rgb
  NEXT x

  ' grab next scanline
  scan = getScanLine$("")
WEND

' initial bucket
bucket(0).first = 0
bucket(0).items = cCount
bucket(0).depth = 0
bucket(0).order = -1
numBuckets = 1

' generate 256 buckets
WHILE (numBuckets < 256)

  ' split buckets
  FOR i = 0 TO numBuckets - 1
    IF (bucket(i).depth = depth) THEN
      first = bucket(i).first
      last = bucket(i).first + bucket(i).items - 1

      ' sort bucket colors according to widest range if necessary
      temp = widestRange%(first, last)
      IF (bucket(i).order <> temp) THEN
        sortList first, last, temp
        bucket(i).order = temp
      END IF

      ' median split (equal number of colors in either buckets)
      median = bucket(i).items \ 2

      ' spill colors over new bucket
      bucket(numBuckets).first = first + median + 1
      bucket(numBuckets).items = bucket(i).items - (median + 1)
      bucket(numBuckets).depth = depth + 1
      bucket(numBuckets).order = bucket(i).order

      ' update current bucket
      bucket(i).items = median + 1
      bucket(i).depth = depth + 1

      ' add one bucket to the list
      numBuckets = numBuckets + 1
    END IF
  NEXT i

  depth = depth + 1
WEND

' write colormap
FOR j = 0 TO numBuckets - 1
  IF (bucket(j).items) THEN

    ' select bucket
    first = bucket(j).first
    last = bucket(j).first + bucket(j).items - 1

    ' average bucket colors
    floatR = 0: floatG = 0: floatB = 0
    count = 0
    FOR i = first TO last
      floatR = floatR + ((cList(i).rgb) AND &H1F) * cList(i).count
      floatG = floatG + ((cList(i).rgb \ 32) AND &H1F) * cList(i).count
      floatB = floatB + ((cList(i).rgb \ 1024) AND &H1F) * cList(i).count
      count = count + cList(i).count
    NEXT i

    ' write to colormap array
    cMap(j).r = (floatR / count) * 8
    cMap(j).g = (floatG / count) * 8
    cMap(j).b = (floatB / count) * 8

  ELSE
    cMap(j).r = 255
    cMap(j).g = 0
    cMap(j).b = 255
  END IF
NEXT j

' enter mode 13 (320x200, 256 colors)
SCREEN 13

' set up palette (VGA card only allows 6 bits per channel by default)
OUT &H3C6, &HFF
OUT &H3C8, 0
FOR i = 0 TO UBOUND(cMap)
  OUT &H3C9, cMap(i).r \ 4
  OUT &H3C9, cMap(i).g \ 4
  OUT &H3C9, cMap(i).b \ 4
NEXT i

' read the image again, for display this time
scan = getScanLine$("SAMPLE.TGA")
WHILE LEN(scan)
  ' process whole scanline, one pixel at a time
  FOR x = 1 TO LEN(scan) STEP 3

    ' dissociate channels (8 bits each)
    b = ASC(MID$(scan, x, 1))
    g = ASC(MID$(scan, x + 1, 1))
    r = ASC(MID$(scan, x + 2, 1))

    ' draw pixel
    PSET ((x - 1) \ 3, y), nearest%(r, g, b)
  NEXT x
  y = y + 1

  ' grab next scanline
  scan = getScanLine$("")
WEND

''
'' insert RGB color into cList()
''
SUB addColor (rgb AS INTEGER) STATIC
  DIM loBound AS INTEGER, hiBound AS INTEGER
  DIM middle AS INTEGER, i AS INTEGER

  ' search for color in list
  loBound = 0
  hiBound = cCount - 1
  DO WHILE (loBound <= hiBound)
    middle = (loBound + hiBound) \ 2
    SELECT CASE cList(middle).rgb
    CASE IS < rgb
      loBound = middle + 1
    CASE IS > rgb
      hiBound = middle - 1
    CASE ELSE
      cList(middle).count = cList(middle).count + 1
      EXIT SUB ' found, exit
    END SELECT
  LOOP

  ' not found, insert
  IF (cList(middle).rgb < rgb) THEN middle = loBound
  FOR i = cCount TO (middle + 1) STEP -1
    cList(i) = cList(i - 1)
  NEXT i
  cList(middle).rgb = rgb
  cList(middle).count = 1
  cCount = cCount + 1
END SUB

''
'' sort cList() entries between [first] and [last] according to channel:
'' 1 = red, 32 = green, 1024 = blue
''
SUB sortList (first AS INTEGER, last AS INTEGER, channel AS INTEGER) STATIC
  DIM count AS INTEGER, offset AS INTEGER, max AS INTEGER
  DIM hiRef AS INTEGER, i AS INTEGER

  count = last - first + 1
  offset = count \ 2
  DO WHILE offset
    max = count - offset - 1
    DO
      hiRef = 0
      FOR i = first TO first + max
        IF (((cList(i).rgb \ channel) AND &H1F) > ((cList(i + offset).rgb \ channel) AND &H1F)) THEN
          SWAP cList(i), cList(i + offset)
          hiRef = i - first
        END IF
      NEXT i
      max = hiRef - offset
    LOOP WHILE hiRef
    offset = offset \ 2
  LOOP
END SUB

''
'' find the channel with the highest range between entries [first] and [last] in cList()
''
FUNCTION widestRange% (first AS INTEGER, last AS INTEGER) STATIC
  DIM r AS INTEGER, minR AS INTEGER, maxR AS INTEGER
  DIM g AS INTEGER, minG AS INTEGER, maxG AS INTEGER
  DIM b AS INTEGER, minB AS INTEGER, maxB AS INTEGER
  DIM i AS INTEGER, rgb AS INTEGER

  ' initialize each channel's lower and upper bounds
  minR = 255: maxR = 0
  minG = 255: maxG = 0
  minB = 255: maxB = 0

  ' update each channel's minimal and maximal values
  FOR i = first TO last
    rgb = cList(i).rgb

    r = (rgb AND &H1F)
    g = ((rgb \ 32) AND &H1F)
    b = (rgb \ 1024)

    IF (r > maxR) THEN maxR = r
    IF (r < minR) THEN minR = r
    IF (g > maxG) THEN maxG = g
    IF (g < minG) THEN minG = g
    IF (b > maxB) THEN maxB = b
    IF (b < minB) THEN minB = b
  NEXT i

  ' get range/amplitude of each channel
  r = maxR - minR
  g = maxG - minG
  b = maxB - minB

  ' return the channel with the widest range (1: red, 32: green, 1024: blue)
  IF (r >= g) AND (r >= b) THEN
    widestRange% = 1
  ELSEIF (g >= r) AND (g >= b) THEN
    widestRange% = 32
  ELSE
    widestRange% = 1024
  END IF
END FUNCTION

''
'' return index of nearest color in cMap() using Taxicab Approximation
''
FUNCTION nearest% (r AS INTEGER, g AS INTEGER, b AS INTEGER) STATIC
  DIM dist AS INTEGER, temp AS INTEGER, i AS INTEGER

  dist = 20000
  FOR i = 0 TO UBOUND(cMap)
    temp = ABS(r - cMap(i).r) + ABS(g - cMap(i).g) + ABS(b - cMap(i).b)
    IF (temp < dist) THEN
      dist = temp
      nearest% = i
    END IF
  NEXT i
END FUNCTION

And here we have the original 24-bit image on the left, against the 8-bit per pixel indexed image using the colormap generated by the Median Cut Algorithm (15 bits per color.) Some colors are slightly muted (especially the greens) due to the averaging. Compared to the Popularity Algorithm, gradients are smoother than they were in the 12-bit version (as seen on walls and flooring, but also on the strawberry cushion) and we still retain some details that were lost in the 15-bit version (such as the cyan areas, the lampshade glow and flowers.)

As an aside, the code above causes some issues for the compiler shipped with QuickBASIC 4.5. To fix the compilation error, replace the following line in sortList():

IF (((cList(i).rgb \ channel) AND &H1F) > ((cList(i + offset).rgb \ channel) AND &H1F)) THEN

By this:

a% = cList(i + offset).rgb
IF (((cList(i).rgb \ channel) AND &H1F) > ((a% \ channel) AND &H1F)) THEN

Yes, it's a bug in the compiler. That's QuickBASIC for ya'.

Code Optimization

The Median Cut Algorithm is really fast, but our code is holding it back. Three places in particular could use a little more work:

  1. The time required to collect color statistics during the initial image read can be decreased by half, by combining the binary search with a hash key. It's not great but much better. See addColor().

  2. The Median Cut Algorithm spends most of its time sorting colors, so we should replace the Shell Sort Algorithm by the Quick Sort Algorithm to also reduce the processing time by half. See sortList().

  3. The biggest gain is in the 24-bit RGB to colormap index conversion (the actual image quantization:) by reorganizing the colormap and making some wild assumptions, we can divide that time by ten! See nearest%().

    The wild assumption goes like this: if we can guess where a specific RGB color should be placed in the colormap, then we can assume that similar colors should be located nearby. In practice the distance between pitch-black and each color in the colormap is computed, then colors are ordered from darkest to brightest (therefore, by increasing distance.) It's more or less a dirty shortcut to sort colors by luminance without resorting to floating point operations or multiplications.

    When we're looking for a specific color, we compute its distance from black and look for entries in the colormap that are about that same distance, thus reducing the number of entries we have to check. The code can also be made faster by precaching the starting and ending color indices for each initial brightness level rather than using a binary search and testing for maximum brightness inside the evaluation loop. The method gets it right fairly often, but sometimes misses the spot and has to use a less representative color with a similar brightness. It's bound to happen since our technique ignores the actual RGB composition.

    There are faster and more accurate algorithms out there (such as the one described at the very end of Heckbert's paper,) but they are far more complex. This one does fine. Kind of.

DECLARE SUB writeCMap ()
DECLARE SUB sortList (first AS INTEGER, last AS INTEGER, channel AS INTEGER)
DECLARE SUB addColor (rgb AS INTEGER)
DECLARE FUNCTION widestRange% (first AS INTEGER, last AS INTEGER)
DECLARE FUNCTION nearest% (r AS INTEGER, g AS INTEGER, b AS INTEGER)

TYPE cMapType
  r AS INTEGER
  g AS INTEGER
  b AS INTEGER
  dist AS INTEGER
END TYPE

TYPE cPxlType
  rgb AS INTEGER
  count AS INTEGER
END TYPE

TYPE bucketType
  first AS INTEGER
  items AS INTEGER
  order AS INTEGER
END TYPE

DIM SHARED cMap(255) AS cMapType
DIM SHARED cList(7999) AS cPxlType, cCount AS INTEGER
DIM SHARED bucket(256) AS bucketType, numBuckets AS INTEGER
DIM r AS INTEGER, g AS INTEGER, b AS INTEGER, rgb AS INTEGER
DIM scan AS STRING, first AS INTEGER, last AS INTEGER
DIM i AS INTEGER, j AS INTEGER, x AS INTEGER, y AS INTEGER
DIM temp AS INTEGER, median AS INTEGER

'' collect color information

' read scanline
scan = getScanLine$("SAMPLE.TGA")
WHILE LEN(scan)
  ' process whole scanline, one pixel at a time
  FOR x = 1 TO LEN(scan) STEP 3

    ' dissociate channels (8 bits each)
    b = ASC(MID$(scan, x, 1))
    g = ASC(MID$(scan, x + 1, 1))
    r = ASC(MID$(scan, x + 2, 1))

    ' convert to 15-bit RGB
    rgb = ((b \ 8) * 1024) + ((g \ 8) * 32) + (r \ 8)

    ' add color to cList() and update .count
    addColor rgb
  NEXT x

  ' grab next scanline
  scan = getScanLine$("")
WEND

'' generate 256 buckets

' first bucket contains all
bucket(0).first = 0
bucket(0).items = cCount
bucket(0).order = -1
numBuckets = 1

' other buckets are derived from the 1st
WHILE (numBuckets < 256)

  ' split buckets
  FOR i = 0 TO numBuckets - 1
    first = bucket(i).first
    last = bucket(i).first + bucket(i).items - 1

    ' sort bucket colors according to widest range if necessary
    temp = widestRange%(first, last)
    IF (bucket(i).order <> temp) THEN
      sortList first, last, temp
      bucket(i).order = temp
    END IF

    ' median split (equal number of colors in either buckets)
    median = bucket(i).items \ 2

    ' spill colors over new bucket
    bucket(numBuckets).first = first + median + 1
    bucket(numBuckets).items = bucket(i).items - (median + 1)
    bucket(numBuckets).order = bucket(i).order

    ' update current bucket
    bucket(i).items = median + 1

    ' add one bucket to the list
    numBuckets = numBuckets + 1
  NEXT i
WEND

'' set up mode 13 (320x200, 256 colors)

' generate cMap()
writeCMap

' enter mode 13
SCREEN 13

' set up palette (VGA card only allows 6 bits per channel by default)
OUT &H3C6, &HFF
OUT &H3C8, 0
FOR i = 0 TO UBOUND(cMap)
  OUT &H3C9, cMap(i).r \ 4
  OUT &H3C9, cMap(i).g \ 4
  OUT &H3C9, cMap(i).b \ 4
NEXT i

'' quantize color image (convert 24-bit RGB to palette indices)

scan = getScanLine$("SAMPLE.TGA")
WHILE LEN(scan)
  ' process whole scanline, one pixel at a time
  FOR x = 1 TO LEN(scan) STEP 3

    ' dissociate channels (8 bits each)
    b = ASC(MID$(scan, x, 1))
    g = ASC(MID$(scan, x + 1, 1))
    r = ASC(MID$(scan, x + 2, 1))

    ' draw pixel
    PSET ((x - 1) \ 3, y), nearest%(r, g, b)
  NEXT x
  y = y + 1

  ' grab next scanline
  scan = getScanLine$("")
WEND

''
'' insert/count RGB colors in cList(). First add up R G and B, and use the
'' result (a number between 0 and 3 * 31 = 93) as a hash. The hash points to an
'' element in gIndex(). Each element in gIndex is a string containing offsets
'' to cList() items, ordered by increasing RGB value. If RGB is already stored
'' it must be in one of those offsets. If it's not, the RGB value is added to
'' cList() and the index at which it was stored is placed into the proper
'' gIndex().
''
SUB addColor (rgb AS INTEGER) STATIC
  DIM gIndex(93) AS STRING, gHash AS INTEGER
  DIM loBound AS INTEGER, hiBound AS INTEGER
  DIM middle AS INTEGER, insert AS INTEGER
  DIM ofst AS INTEGER

  ' get RGB hash
  gHash = (rgb \ 1024) + ((rgb \ 32) AND &H1F) + (rgb AND &H1F)

  ' search gIndex(gHash) for indices in cList()
  insert = 0
  loBound = 0
  hiBound = LEN(gIndex(gHash)) \ 2 - 1
  DO WHILE (loBound <= hiBound)
    middle = (loBound + hiBound) \ 2
    ofst = CVI(MID$(gIndex(gHash), 1 + middle * 2, 2))
    SELECT CASE cList(ofst).rgb
    CASE IS < rgb
      loBound = middle + 1
      insert = loBound
    CASE IS > rgb
      hiBound = middle - 1
      insert = middle
    CASE ELSE
      cList(ofst).count = cList(ofst).count + 1
      EXIT SUB
    END SELECT
  LOOP

  ' save cList() offset to dictionary
  gIndex(gHash) = LEFT$(gIndex(gHash), insert * 2) + MKI$(cCount) + RIGHT$(gIndex(gHash), LEN(gIndex(gHash)) - (insert * 2))

  ' save color to cList()
  cList(cCount).rgb = rgb
  cList(cCount).count = 1
  cCount = cCount + 1
END SUB

''
'' return index of nearest color in cMap(). This algorithm is based off
'' luminosity: cMap() has been sorted by increasing brightness and their
'' distance from black has been computed. This routine determines where the
'' provided RGB value would be placed in cMap() and starts searching there
'' rather than browse the whole array each time. It's fast but not very
'' accurate.
''
FUNCTION nearest% (r AS INTEGER, g AS INTEGER, b AS INTEGER)
  DIM loDist AS INTEGER, hiDist AS INTEGER, dist AS INTEGER, temp AS INTEGER
  DIM loBound AS INTEGER, hiBound AS INTEGER, middle AS INTEGER, i AS INTEGER
  DIM loPoint AS INTEGER

  ' set minimum brightness (toward entry 0)
  loDist = (r + g + b) - 8

  loBound = 0
  hiBound = 255
  DO WHILE (loBound <= hiBound)
    middle = (loBound + hiBound) \ 2
    SELECT CASE cMap(middle).dist
    CASE IS < loDist
      loBound = middle + 1
      loPoint = loBound
    CASE IS > loDist
      hiBound = middle - 1
      loPoint = middle
    CASE ELSE
      loPoint = middle
      EXIT DO
    END SELECT
  LOOP

  IF (loPoint > 255) THEN
    nearest% = 255
    EXIT FUNCTION
  END IF

  ' set maximum brightness (toward entry 255)
  hiDist = loDist + 16
  IF (hiDist > 765) THEN hiDist = 765

  dist = 20000
  FOR i = loPoint TO UBOUND(cMap)
    ' check actual distance between RGB value and cMap() value
    temp = ABS(r - cMap(i).r) + ABS(g - cMap(i).g) + ABS(b - cMap(i).b)
    IF (temp < dist) THEN
      dist = temp
      nearest% = i
    END IF

    ' test now so we at least have a color to return
    IF (cMap(i).dist > hiDist) THEN EXIT FOR
  NEXT i
END FUNCTION

''
'' the Median Cut algorithm spends most of its time sorting colors. Make this
'' routine as fast as possible. Here, we're using the QuickSort algorithm to
'' order entries between [first] and [last] in cList() according to the
'' desired channel (1: red, 32: green, 1024: blue.) QuickSort's only weakness
'' is already sorted arrays, so do not call this routine if cList() is already
'' in the proper order.
''
SUB sortList (first AS INTEGER, last AS INTEGER, channel AS INTEGER) STATIC
  DIM lstack(255) AS INTEGER, stkRef AS INTEGER, midRef AS INTEGER
  DIM lo AS INTEGER, hi AS INTEGER, lo2 AS INTEGER, hi2 AS INTEGER

  lstack(0) = first
  lstack(1) = last
  stkRef = 2
  DO
    stkRef = stkRef - 2
    lo = lstack(stkRef)
    hi = lstack(stkRef + 1)
    DO
      lo2 = lo: hi2 = hi
      midRef = (cList((lo + hi) \ 2).rgb \ channel) AND &H1F
      DO
        DO WHILE (((cList(lo2).rgb \ channel) AND &H1F) < midRef)
          lo2 = lo2 + 1
        LOOP
        DO WHILE (((cList(hi2).rgb \ channel) AND &H1F) > midRef)
          hi2 = hi2 - 1
        LOOP
        IF (lo2 <= hi2) THEN
          SWAP cList(lo2), cList(hi2)
          lo2 = lo2 + 1
          hi2 = hi2 - 1
        END IF
      LOOP WHILE (lo2 <= hi2)

      IF (hi2 - lo) < (hi - lo2) THEN
        IF (lo2 < hi) THEN
          lstack(stkRef) = lo2
          lstack(stkRef + 1) = hi
          stkRef = stkRef + 2
        END IF
        hi = hi2
      ELSE
        IF (lo < hi2) THEN
          lstack(stkRef) = lo
          lstack(stkRef + 1) = hi2
          stkRef = stkRef + 2
        END IF
        lo = lo2
      END IF
    LOOP WHILE (lo < hi)
  LOOP WHILE stkRef
END SUB

''
'' find the channel with the highest range between entries [first] and [last]
'' in cList()
''
FUNCTION widestRange% (first AS INTEGER, last AS INTEGER) STATIC
  DIM r AS INTEGER, minR AS INTEGER, maxR AS INTEGER
  DIM g AS INTEGER, minG AS INTEGER, maxG AS INTEGER
  DIM b AS INTEGER, minB AS INTEGER, maxB AS INTEGER
  DIM i AS INTEGER, rgb AS INTEGER

  ' initialize each channel's lower and upper bounds
  minR = 255: maxR = 0
  minG = 255: maxG = 0
  minB = 255: maxB = 0

  ' update each channel's minimal and maximal values
  FOR i = first TO last
    rgb = cList(i).rgb

    r = (rgb AND &H1F)
    g = ((rgb \ 32) AND &H1F)
    b = (rgb \ 1024)

    IF (r > maxR) THEN maxR = r
    IF (r < minR) THEN minR = r
    IF (g > maxG) THEN maxG = g
    IF (g < minG) THEN minG = g
    IF (b > maxB) THEN maxB = b
    IF (b < minB) THEN minB = b
  NEXT i

  ' get range/amplitude of each channel
  r = maxR - minR
  g = maxG - minG
  b = maxB - minB

  ' return the channel with the widest range (1: red, 32: green, 1024: blue)
  IF (r >= g) AND (r >= b) THEN
    widestRange% = 1
  ELSEIF (g >= r) AND (g >= b) THEN
    widestRange% = 32
  ELSE
    widestRange% = 1024
  END IF
END FUNCTION

''
'' write the cMap() array by averaging the content of every bucket. This
'' routine also sorts entries from darkest to brightest and stores the
'' taxicab distance from pitch black.
''
SUB writeCMap
  DIM count AS INTEGER, offset AS INTEGER
  DIM max AS INTEGER, hiRef AS INTEGER
  DIM i AS INTEGER, j AS INTEGER
  DIM r AS LONG, g AS LONG, b AS LONG

  ' write colormap
  FOR j = 0 TO numBuckets - 1
    IF (bucket(j).items) THEN
      ' average bucket colors
      r = 0: g = 0: b = 0
      count = 0
      FOR i = bucket(j).first TO bucket(j).first + bucket(j).items - 1
        r = r + CLNG((cList(i).rgb) AND &H1F) * cList(i).count
        g = g + CLNG((cList(i).rgb \ 32) AND &H1F) * cList(i).count
        b = b + CLNG((cList(i).rgb \ 1024) AND &H1F) * cList(i).count
        count = count + cList(i).count
      NEXT i
      r = CINT((r / count) * 8)
      g = CINT((g / count) * 8)
      b = CINT((b / count) * 8)
    ELSE
      r = 255
      g = 0
      b = 255
    END IF

    ' write to colormap array, compute taxicab distance
    cMap(j).r = r
    cMap(j).g = g
    cMap(j).b = b
    cMap(j).dist = cMap(j).r + cMap(j).g + cMap(j).b
  NEXT j

  ' sort colors by distance
  count = UBOUND(cMap) - LBOUND(cMap) + 1
  offset = count \ 2
  DO WHILE offset
    max = count - offset - 1
    DO
      hiRef = 0
      FOR i = 0 TO max
        IF (cMap(i).dist > cMap(i + offset).dist) THEN
          SWAP cMap(i), cMap(i + offset)
          hiRef = i
        END IF
      NEXT i
      max = hiRef - offset
    LOOP WHILE hiRef
    offset = offset \ 2
  LOOP
END SUB

On the left is the image that was generated previously using the Median Cut Algorithm and rendered with the slower nearest color function. The right image shows the new code in action. In DOSBox at 3000 cycles (roughly comparable to an 80386 at 30 MHz,) colors were collected in 7 seconds, the Median Cut Algorithm took 1 second, and the colormap generation a little less than a second. The rendering itself took about 7 seconds. Not amazing, but much better than before.

The new "nearest color" function is much faster but also less precise. The final output shows some distortions that often prioritize gray over the proper color: part of the cyan panels outside are washed out, some green/brown pixels appeared on the strawberry cushion and more gray is visible on the lampshade to the right (partially seen on the edge of the picture, next to the plant behind the column.)

Dithering (especially Floyd-Steinberg's filter)

When we replaced 24-bit colors by "look-alikes" present in the colormap, we accepted that the R G and B channels were not going to be exactly the same. Dithering is an error diffusion technique that will carry the difference and spread it to nearby pixels that we still have to draw. It's pretty simple and the most popular filters (Floyd-Steinberg, Burkes, Sierra Lite, etc.) require very little extra memory. Let's do that.

Before anything else, we reserve an array of RGB triplets to contain the error diffusion. That array is at least the length (in pixels) of a scanline with some extra pixels (two for Burkes, one for Floyd-Steinberg, and none for Sierra Lite.) The array must be processed like a circular buffer (the reading/writing offset wraps from last to first pixel as necessary -- this is achieved with the modulo operator.) Using this buffer, we can access the next pixel in queue by incrementing the current X value. We can also access the pixel directly below by adding the width of the scanline (in pixels) to the X value.
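
Concretely, the setup looks something like this (these are the same names and sizes the full listing further down uses; a Burkes buffer would need imgW + 2 cells instead):

TYPE ditherType
  r AS INTEGER
  g AS INTEGER
  b AS INTEGER
END TYPE

DIM ditherOfs AS INTEGER, ditherLen AS INTEGER, imgW AS INTEGER
REDIM dither(0) AS ditherType       ' dynamic, resized once the width is known

' once the first scanline has been read into [scan]:
imgW = LEN(scan) \ 3                ' scanline width in pixels (3 bytes per pixel)
ditherLen = imgW + 1                ' one extra cell for Floyd-Steinberg
REDIM dither(ditherLen - 1) AS ditherType
ditherOfs = 0                       ' circular read/write offset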

After finding the nearest color available for a given RGB color (let's call its index in the colormap id,) we subtract the substitute color from the source color, channel by channel, to obtain the error:

errR = source.R - cMap(id).R
errG = source.G - cMap(id).G
errB = source.B - cMap(id).B

The error is then divided in equal parts: 1/32 for Burkes, 1/16 for Floyd-Steinberg, 1/4 for Sierra Lite. Those parts are then added to nearby cells (contained within our circular buffer from earlier.) The illustration below shows how the error should be diffused according to Floyd-Steinberg:
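
In text form, Floyd-Steinberg spreads the error like this (weights are sixteenths; the asterisk marks the pixel that was just drawn, and the error only flows toward pixels that haven't been painted yet):

              *    7/16
    3/16    5/16   1/16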

There's one last thing to clarify: before looking for the nearest color available, we're supposed to look into the circular buffer and add the error of the current pixel to the source color (that's how the error affects nearby pixels.) It is also imperative to cap the RGB values before looking for the pixel, because we may obtain values below 0 or above 255. When this is done, we have to reset the cell's value in the buffer, then move to the next cell in line (representing the pixel to the right of the one we just painted.) The process looks something like this:

' add error for current pixel to the source RGB
' we obtained [src] from the image data
src.r = src.r + err(pxl).r
src.g = src.g + err(pxl).g
src.b = src.b + err(pxl).b

' since the buffer is circular, we'll eventually re-use
' this index. Make sure it'll be ready by clearing it now.
err(pxl).r = 0
err(pxl).g = 0
err(pxl).b = 0

' keep values in the 0-255 range
IF (src.r < 0) THEN src.r = 0 ELSE IF (src.r > 255) THEN src.r = 255
IF (src.g < 0) THEN src.g = 0 ELSE IF (src.g > 255) THEN src.g = 255
IF (src.b < 0) THEN src.b = 0 ELSE IF (src.b > 255) THEN src.b = 255

' search for nearest color in the colormap (includes error)
id = nearest%(src.r, src.g, src.b)

' draw pixel on screen
PSET(x, y), id

' compute error introduced by the current pixel and
' divide it in equal parts (16 for Floyd-Steinberg filter)
errR = (src.r - cMap(id).r) / 16
errG = (src.g - cMap(id).g) / 16
errB = (src.b - cMap(id).b) / 16

' diffuse error to nearby cells: add 7 parts to
' the pixel on the right (9 parts left)
temp = (pxl + 1) MOD bufferSize
err(temp).r = err(temp).r + (errR * 7)
err(temp).g = err(temp).g + (errG * 7)
err(temp).b = err(temp).b + (errB * 7)

' add 3 parts to the pixel in the bottom left (6 parts left)
temp = (pxl + imgWidth - 1) MOD bufferSize
err(temp).r = err(temp).r + (errR * 3)
err(temp).g = err(temp).g + (errG * 3)
err(temp).b = err(temp).b + (errB * 3)

' add 5 parts to the pixel below (1 part left)
temp = (pxl + imgWidth) MOD bufferSize
err(temp).r = err(temp).r + (errR * 5)
err(temp).g = err(temp).g + (errG * 5)
err(temp).b = err(temp).b + (errB * 5)

' add last part to the pixel in the bottom right
temp = (pxl + imgWidth + 1) MOD bufferSize
err(temp).r = err(temp).r + errR
err(temp).g = err(temp).g + errG
err(temp).b = err(temp).b + errB

' the entire error has been shared among nearby pixels,
' those values will be processed when we get to them...

' move circular buffer writing/reading offset
pxl = (pxl + 1) MOD bufferSize

Most dithering algorithms follow the exact same principle. The difference lies in how the error is spread. Below are the patterns used by various popular filters:
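
Here are two of them, in the same notation as the Floyd-Steinberg diagram above (the asterisk marks the current pixel; the divisor is 32 for Burkes and 4 for Sierra Lite):

Burkes (1/32):

                 *     8     4
     2     4     8     4     2

Sierra Lite (1/4):

           *     2
     1     1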

If you got the concept down, I'd rather you give it a go on your own. If not, here's the whole code again using the Floyd-Steinberg filter. The listing is pretty long and it's far from optimized. Remember to include the getScanLine$() function.

DECLARE SUB writeCMap ()
DECLARE SUB sortList (first AS INTEGER, last AS INTEGER, channel AS INTEGER)
DECLARE SUB addColor (rgb AS INTEGER)
DECLARE FUNCTION widestRange% (first AS INTEGER, last AS INTEGER)
DECLARE FUNCTION nearest% (r AS INTEGER, g AS INTEGER, b AS INTEGER)

TYPE cMapType
  r AS INTEGER
  g AS INTEGER
  b AS INTEGER
  dist AS INTEGER
END TYPE

TYPE ditherType
  r AS INTEGER
  g AS INTEGER
  b AS INTEGER
END TYPE

TYPE cPxlType
  rgb AS INTEGER
  count AS INTEGER
END TYPE

TYPE bucketType
  first AS INTEGER
  items AS INTEGER
  order AS INTEGER
END TYPE

DIM SHARED cMap(255) AS cMapType
DIM SHARED cList(7999) AS cPxlType, cCount AS INTEGER
DIM SHARED bucket(256) AS bucketType, numBuckets AS INTEGER
DIM r AS INTEGER, g AS INTEGER, b AS INTEGER
DIM eR AS INTEGER, eG AS INTEGER, eB AS INTEGER
DIM dR AS INTEGER, dG AS INTEGER, dB AS INTEGER
DIM scan AS STRING, first AS INTEGER, last AS INTEGER, rgb AS INTEGER
DIM i AS INTEGER, j AS INTEGER, x AS INTEGER, y AS INTEGER
DIM temp AS INTEGER, median AS INTEGER, near AS INTEGER

REDIM dither(0) AS ditherType
DIM ditherOfs AS INTEGER, ditherLen AS INTEGER, imgW AS INTEGER

' read scanline
scan = getScanLine$("SAMPLE.TGA")
imgW = LEN(scan) \ 3
ditherLen = imgW + 1
REDIM dither(ditherLen - 1) AS ditherType
WHILE LEN(scan)
  ' process whole scanline, one pixel at a time
  FOR x = 1 TO LEN(scan) STEP 3

    ' dissociate channels (8 bits each)
    b = ASC(MID$(scan, x, 1))
    g = ASC(MID$(scan, x + 1, 1))
    r = ASC(MID$(scan, x + 2, 1))

    ' convert to 15-bit RGB
    rgb = ((b \ 8) * 1024) + ((g \ 8) * 32) + (r \ 8)

    ' add color to cList() and update .count
    addColor rgb
  NEXT x

  ' grab next scanline
  scan = getScanLine$("")
WEND

' initial bucket
bucket(0).first = 0
bucket(0).items = cCount
bucket(0).order = -1
numBuckets = 1

' generate 256 buckets
WHILE (numBuckets < 256)

  ' split buckets
  FOR i = 0 TO numBuckets - 1
    first = bucket(i).first
    last = bucket(i).first + bucket(i).items - 1

    ' sort bucket colors according to widest range if necessary
    temp = widestRange%(first, last)
    IF (bucket(i).order <> temp) THEN
      sortList first, last, temp
      bucket(i).order = temp
    END IF

    ' median split (equal number of colors in either buckets)
    median = bucket(i).items \ 2

    ' spill colors over new bucket
    bucket(numBuckets).first = first + median + 1
    bucket(numBuckets).items = bucket(i).items - (median + 1)
    bucket(numBuckets).order = bucket(i).order

    ' update current bucket
    bucket(i).items = median + 1

    ' add one bucket to the list
    numBuckets = numBuckets + 1
  NEXT i
WEND

' write color map
writeCMap

' enter mode 13 (320x200, 256 colors)
SCREEN 13

' set up palette (VGA card only allows 6 bits per channel by default)
OUT &H3C6, &HFF
OUT &H3C8, 0
FOR i = 0 TO UBOUND(cMap)
  OUT &H3C9, cMap(i).r \ 4
  OUT &H3C9, cMap(i).g \ 4
  OUT &H3C9, cMap(i).b \ 4
NEXT i

' read the image again, for display this time
scan = getScanLine$("SAMPLE.TGA")
WHILE LEN(scan)
  ' process whole scanline, one pixel at a time
  FOR x = 1 TO LEN(scan) STEP 3

    ' dissociate channels (8 bits each) and add error
    b = ASC(MID$(scan, x, 1)) + dither(ditherOfs).b
    g = ASC(MID$(scan, x + 1, 1)) + dither(ditherOfs).g
    r = ASC(MID$(scan, x + 2, 1)) + dither(ditherOfs).r

    ' keep values in 0-255 range
    IF (r < 0) THEN r = 0 ELSE IF (r > 255) THEN r = 255
    IF (g < 0) THEN g = 0 ELSE IF (g > 255) THEN g = 255
    IF (b < 0) THEN b = 0 ELSE IF (b > 255) THEN b = 255

    ' get nearest color available
    near = nearest%(r, g, b)

    ' draw pixel
    PSET ((x - 1) \ 3, y), near

    ' we processed the error for the current pixel, clear it
    dither(ditherOfs).r = 0
    dither(ditherOfs).g = 0
    dither(ditherOfs).b = 0

    ' get the difference between the ideal (+error) and substitution color
    eR = r - cMap(near).r
    eG = g - cMap(near).g
    eB = b - cMap(near).b

    ' diffuse error to nearby pixels:
    ' 7 parts (each part is 1/16,) 9 parts remain
    dR = ((eR / 16) * 7)
    dG = ((eG / 16) * 7)
    dB = ((eB / 16) * 7)
    ' right of current pixel
    temp = (ditherOfs + 1) MOD ditherLen
    dither(temp).r = dither(temp).r + dR
    dither(temp).g = dither(temp).g + dG
    dither(temp).b = dither(temp).b + dB

    ' 3 parts, 6 parts remain
    dR = ((eR / 16) * 3)
    dG = ((eG / 16) * 3)
    dB = ((eB / 16) * 3)
    ' down and left
    temp = (ditherOfs + imgW - 1) MOD ditherLen
    dither(temp).r = dither(temp).r + dR
    dither(temp).g = dither(temp).g + dG
    dither(temp).b = dither(temp).b + dB

    ' 5 parts, 1 part remain
    dR = ((eR / 16) * 5)
    dG = ((eG / 16) * 5)
    dB = ((eB / 16) * 5)
    ' down
    temp = (ditherOfs + imgW) MOD ditherLen
    dither(temp).r = dither(temp).r + dR
    dither(temp).g = dither(temp).g + dG
    dither(temp).b = dither(temp).b + dB

    ' 1 part, the error is now fully distributed
    dR = ((eR / 16) * 1)
    dG = ((eG / 16) * 1)
    dB = ((eB / 16) * 1)
    ' down and right
    temp = (ditherOfs + imgW + 1) MOD ditherLen
    dither(temp).r = dither(temp).r + dR
    dither(temp).g = dither(temp).g + dG
    dither(temp).b = dither(temp).b + dB

    ' advance dithering buffer offset
    ditherOfs = (ditherOfs + 1) MOD ditherLen
  NEXT x
  y = y + 1

  ' grab next scanline
  scan = getScanLine$("")
WEND

''
'' Insert/count RGB colors in cList(). First add up R G and B, and use the
'' result (a number between 0 and 3 * 31 = 93) as a hash. The hash points to an
'' element in gIndex(). Each element in gIndex is a string containing offsets
'' to cList() items, ordered by increasing RGB value. If RGB is already stored
'' it must be in one of those offsets. If it's not, the RGB value is added to
'' cList() and the index at which it was stored is placed into the proper
'' gIndex().
''
SUB addColor (rgb AS INTEGER) STATIC
  DIM gIndex(93) AS STRING, gHash AS INTEGER
  DIM loBound AS INTEGER, hiBound AS INTEGER
  DIM middle AS INTEGER, insert AS INTEGER
  DIM ofst AS INTEGER

  ' get RGB hash
  gHash = (rgb \ 1024) + ((rgb \ 32) AND &H1F) + (rgb AND &H1F)

  ' search gIndex(gHash) for indices in cList()
  insert = 0
  loBound = 0
  hiBound = LEN(gIndex(gHash)) \ 2 - 1
  DO WHILE (loBound <= hiBound)
    middle = (loBound + hiBound) \ 2
    ofst = CVI(MID$(gIndex(gHash), 1 + middle * 2, 2))
    SELECT CASE cList(ofst).rgb
    CASE IS < rgb
      loBound = middle + 1
      insert = loBound
    CASE IS > rgb
      hiBound = middle - 1
      insert = middle
    CASE ELSE
      cList(ofst).count = cList(ofst).count + 1
      EXIT SUB
    END SELECT
  LOOP

  ' save cList() offset to dictionary
  gIndex(gHash) = LEFT$(gIndex(gHash), insert * 2) + MKI$(cCount) + RIGHT$(gIndex(gHash), LEN(gIndex(gHash)) - (insert * 2))

  ' save color to cList()
  cList(cCount).rgb = rgb
  cList(cCount).count = 1
  cCount = cCount + 1
END SUB

''
'' return index of nearest color in cMap(), luminosity-based
''
FUNCTION nearest% (r AS INTEGER, g AS INTEGER, b AS INTEGER) STATIC
  DIM loDist AS INTEGER, hiDist AS INTEGER, dist AS INTEGER, temp AS INTEGER
  DIM loBound AS INTEGER, hiBound AS INTEGER, middle AS INTEGER, i AS INTEGER
  DIM loPoint AS INTEGER

  loDist = (r + g + b) - 8

  loBound = 0
  hiBound = 255
  DO WHILE (loBound <= hiBound)
    middle = (loBound + hiBound) \ 2
    SELECT CASE cMap(middle).dist
    CASE IS < loDist
      loBound = middle + 1
      loPoint = loBound
    CASE IS > loDist
      hiBound = middle - 1
      loPoint = middle
    CASE ELSE
      loPoint = middle
      EXIT DO
    END SELECT
  LOOP

  IF (loPoint > 255) THEN
    nearest% = 255
    EXIT FUNCTION
  END IF

  hiDist = loDist + 16
  IF (hiDist > 765) THEN hiDist = 765

  dist = 20000
  FOR i = loPoint TO UBOUND(cMap)
    temp = ABS(r - cMap(i).r) + ABS(g - cMap(i).g) + ABS(b - cMap(i).b)
    IF (temp < dist) THEN
      dist = temp
      nearest% = i
    END IF
    ' test now so we at least have a color to return
    IF (cMap(i).dist > hiDist) THEN EXIT FOR
  NEXT i
END FUNCTION

''
'' The Median Cut algorithm spends most of its time sorting colors. Make this
'' routine as fast as possible. Here, we're using the QuickSort algorithm to
'' order entries between [first] and [last] in cList() according to the
'' desired channel (1: red, 32: green, 1024: blue.) QuickSort's only weakness
'' is already sorted arrays, so do not call this routine if cList() is already
'' in the proper order.
''
SUB sortList (first AS INTEGER, last AS INTEGER, channel AS INTEGER) STATIC
  DIM lstack(255) AS INTEGER, stkRef AS INTEGER, midRef AS INTEGER
  DIM lo AS INTEGER, hi AS INTEGER, lo2 AS INTEGER, hi2 AS INTEGER

  lstack(0) = first
  lstack(1) = last
  stkRef = 2
  DO
    stkRef = stkRef - 2
    lo = lstack(stkRef)
    hi = lstack(stkRef + 1)
    DO
      lo2 = lo: hi2 = hi
      midRef = (cList((lo + hi) \ 2).rgb \ channel) AND &H1F
      DO
        DO WHILE (((cList(lo2).rgb \ channel) AND &H1F) < midRef)
          lo2 = lo2 + 1
        LOOP
        DO WHILE (((cList(hi2).rgb \ channel) AND &H1F) > midRef)
          hi2 = hi2 - 1
        LOOP
        IF (lo2 <= hi2) THEN
          SWAP cList(lo2), cList(hi2)
          lo2 = lo2 + 1
          hi2 = hi2 - 1
        END IF
      LOOP WHILE (lo2 <= hi2)

      IF (hi2 - lo) < (hi - lo2) THEN
        IF (lo2 < hi) THEN
          lstack(stkRef) = lo2
          lstack(stkRef + 1) = hi
          stkRef = stkRef + 2
        END IF
        hi = hi2
      ELSE
        IF (lo < hi2) THEN
          lstack(stkRef) = lo
          lstack(stkRef + 1) = hi2
          stkRef = stkRef + 2
        END IF
        lo = lo2
      END IF
    LOOP WHILE (lo < hi)
  LOOP WHILE stkRef
END SUB

''
'' find the channel with the highest range between entries [first] and [last]
'' in cList()
''
FUNCTION widestRange% (first AS INTEGER, last AS INTEGER) STATIC
  DIM r AS INTEGER, minR AS INTEGER, maxR AS INTEGER
  DIM g AS INTEGER, minG AS INTEGER, maxG AS INTEGER
  DIM b AS INTEGER, minB AS INTEGER, maxB AS INTEGER
  DIM i AS INTEGER, rgb AS INTEGER

  ' initialize each channel's lower and upper bounds
  minR = 255: maxR = 0
  minG = 255: maxG = 0
  minB = 255: maxB = 0

  ' update each channel's minimal and maximal values
  FOR i = first TO last
    rgb = cList(i).rgb

    r = (rgb AND &H1F)
    g = ((rgb \ 32) AND &H1F)
    b = (rgb \ 1024)

    IF (r > maxR) THEN maxR = r
    IF (r < minR) THEN minR = r
    IF (g > maxG) THEN maxG = g
    IF (g < minG) THEN minG = g
    IF (b > maxB) THEN maxB = b
    IF (b < minB) THEN minB = b
  NEXT i

  ' get range/amplitude of each channel
  r = maxR - minR
  g = maxG - minG
  b = maxB - minB

  ' return the channel with the widest range (1: red, 32: green, 1024: blue)
  IF (r >= g) AND (r >= b) THEN
    widestRange% = 1
  ELSEIF (g >= r) AND (g >= b) THEN
    widestRange% = 32
  ELSE
    widestRange% = 1024
  END IF
END FUNCTION

SUB writeCMap
  DIM count AS INTEGER, offset AS INTEGER
  DIM max AS INTEGER, hiRef AS INTEGER
  DIM i AS INTEGER, j AS INTEGER
  DIM r AS LONG, g AS LONG, b AS LONG

  ' write colormap
  FOR j = 0 TO numBuckets - 1
    IF (bucket(j).items) THEN
      ' average bucket colors
      r = 0: g = 0: b = 0
      count = 0
      FOR i = bucket(j).first TO bucket(j).first + bucket(j).items - 1
        r = r + CLNG((cList(i).rgb) AND &H1F) * cList(i).count
        g = g + CLNG((cList(i).rgb \ 32) AND &H1F) * cList(i).count
        b = b + CLNG((cList(i).rgb \ 1024) AND &H1F) * cList(i).count
        count = count + cList(i).count
      NEXT i
      r = CINT((r / count) * 8)
      g = CINT((g / count) * 8)
      b = CINT((b / count) * 8)
    ELSE
      r = 255
      g = 0
      b = 255
    END IF

    ' write to colormap array, compute taxicab distance
    cMap(j).r = r
    cMap(j).g = g
    cMap(j).b = b
    cMap(j).dist = cMap(j).r + cMap(j).g + cMap(j).b
  NEXT j

  ' sort colors by distance
  count = UBOUND(cMap) - LBOUND(cMap) + 1
  offset = count \ 2
  DO WHILE offset
    max = count - offset - 1
    DO
      hiRef = 0
      FOR i = 0 TO max
        IF (cMap(i).dist > cMap(i + offset).dist) THEN
          SWAP cMap(i), cMap(i + offset)
          hiRef = i
        END IF
      NEXT i
      max = hiRef - offset
    LOOP WHILE hiRef
    offset = offset \ 2
  LOOP
END SUB

Here's the image obtained with the Median Cut Algorithm without (left) and with (right) dithering. While the filter introduces graininess in some places, it also produces smoother shading on walls and large uniform-ish surfaces. I have to admit part of the error is due to the discrepancy between the colors as they are shown (6-bit per channel, due to VGA limitations) and the way they are described by the cMap() array (8-bit per channel.) The image on the right was generated using the slow color to colormap index conversion.

This time, the picture on the right was rendered using the fast nearest color function. Because it prioritizes brightness over color composition, it produces even smoother results in some places (the walls, the dark areas near the armchair and the cyan panels outside) but creates weird mosaic patchworks in others (like the brown couch.)

Dithering can ease transitions between colors (stuff like gray and white clouds in the sky) and can manually be applied here and there to introduce textures. As an automated filter, dithering really shines with limited color palettes.

On the left is the image quantized with the uniform palette technique from earlier (61 colors.) On the right is the same image with a layer of Floyd-Steinberg dithering on top (it manages to use 20 more colors from the existing colormap.) Even if it doesn't look as good as the full 256-color palette we generated in the previous example, it nonetheless provides a much needed visual boost to otherwise cheaply converted images.

Some more links on the topic

In case you missed it, here's the link to Paul Heckbert's original paper explaining the Median Cut Algorithm and the Popularity Algorithm. It also outlines a fast RGB to colormap index conversion algorithm. The alternative implementation of the Median Cut Algorithm that focuses on average channel repartition can be found in Dr. Dobb's Journal (release 12 from 1994.)

For more info about dithering, check out Tanner Helland's excellent article "Image Dithering: Eleven Algorithms" which describes 1-bit per pixel dithering (it's very well explained,) and Preet Shihn's "Reducing Colors In An Image" for more color examples and in-depth explanations.