Arabic to Roman numerals (and back)

When in Rome...

Imagine you're stuck in an elevator with an evil giraffe that threatens to blow up Neptune in thirty seven minutes unless you write "174" in Roman numerals, and all you have at your disposal is an old computer with QuickBASIC installed. I know it sounds super far-fetched, but it CAN happen... listen, I'm doing my best to get you invested in the topic at hand, cut me some slack. Anyway, as we have all forgotten (I had to look it up,) here's what Roman numerals look like:

I 1
V 5
X 10
L 50
C 100
D 500
M 1,000

Roman representation is "most significant value first" in the sense that the first numeral (leftmost) has more weight to the value than the last numeral (rightmost.) It's identical to the arabic system we use: for instance, in 109 (a hundred and nine,) the leftmost "1" weights more than the rightmost "9."

The system uses two types of logic: additive and subtractive. When values (read right to left) are increasing (or repeated) numerals, they are added. For instance: XVII is 17, or 1 + 1 + 5 + 10. Note that repeated numerals may only be powers of ten (I, X, C, and M,) and only up to three times. "Ackchually, Wikipedia cites a moderator's friend's blog that claims someone once wrote XXXX and refused to admit it was a mistake, therefore..." frankly my dear I don't give a damn what Wikipedia's contrarian opinion about anything is. I don't care.

When values (again, right to left) are decreasing numerals, they are subtracted. For instance: IX is 9, or 10 - 1. Another subtlety here is that we may only subtract one numeral, it may only be a power of ten (so again, I, X, C, or M,) and it has to be within two ranks of the value it subtracts: X can be subtracted from L and C, but not from D or M. Similarly, I may be subtracted from V and X, but not L and up.

It sounds absolutely weird, but it's a fairly decent way to transcribe finger-counting.

Arabic to Roman

In practice, the smallest value we may represent in Roman numerals is 1 (I) and the largest is 3,999 (MMMCMXCIX.) There's technically a way to write values equal to or greater than 4,000 using parenthesis, however that's a very niche usage and I can't see anyone using Roman numerals for anything but years and chapter enumerations nowadays... Right now, what we really want is a way to easily convert digits from Arabic numerals to Roman numerals. Let's begin with values 1 to 9.

   I 1
  II 2
 III 3
  IV 4
   V 5
  VI 6
 VII 7
VIII 8
  IX 9

How does it work exactly? Well, 1, 2, and 3 are obtained by repeating "I" up to 3 times. 5 is easy as we just write "V." 6, 7, and 8 are obtained by starting at 5, and then repeating "I" up to 3 times. We're left with two numbers: 4 and 9. 4 is written "IV," meaning that we take 5 and subtract 1. 9 is written "IX," meaning that we take 10 and subtract 1. If we had to port that to QuickBASIC, we would write something along the lines of:

DIM outStr AS STRING
DIM digit AS INTEGER

SELECT CASE (digit)
CASE 1 TO 3
  outStr = outStr + STRING$("I", digit) ' Repeat I from 1 to 3 times
CASE 4
  outStr = outStr + "IV" ' Write as-is for now
CASE 5 TO 8
  outStr = outStr + "V" + STRING$("I", digit - 5) ' Repeat I from 0 to 3 times
CASE 9
  outStr = outStr + "IX"
END SELECT

The code above will work for any value between 1 and 9. But what if we want to go further? First, we're going to write the entirety of the Roman numerals into a string by increasing value:

CONST romNum = "IVXLCDM"

We notice that every other character is a power of ten: I is 1, X is 10, C is 100, M is 1,000. Similarly, V is 5, L is 50, etc. So every time we increment our reading offset in the Roman numeral list by two, we basically multiply our Arabic numeral by 10: 4 is written "IV" (4 * 10^0 = 4) if we start reading romNum at 1, it becomes "XL" (4 * 10^1 = 40) if we start reading romNum at 3, and becomes "CD" (4 * 10^2 = 400) if we start reading romNum at 5. So, if we can isolate each decimal in Arabic representation and determine their rank, then we could shift our reading offset accordingly!

DIM srcNum AS INTEGER
DIM rank AS INTEGER

srcNum = 109

' Keep going until we have no more digits to process.
WHILE (srcNum)
  PRINT "Rank:"; rank, (srcNum MOD 10) ' Print rightmost decimal and its rank.
  srcNum = srcNum \ 10                 ' Shift digits once to the right.
  rank = rank + 1                      ' Increment rank.
WEND

' The output is:
' Rank: 0   9
' Rank: 1   0
' Rank: 2   1

Neat! When we combine the reading offset trick and the decimal isolation, here's what we get:

''
'' CONVERT ARABIC NUMERALS TO ROMAN NUMERALS
''
FUNCTION ArabicToRoman$ (value AS INTEGER)
  CONST romNum = "IVXLCDM"

  DIM srcNum AS INTEGER, digit AS INTEGER
  DIM rank AS INTEGER, romOfs AS INTEGER
  DIM outStr AS STRING

  ' This implementation only works for values between 1 and 3,999.
  IF ((value < 1) OR (value > 3999)) THEN
    EXIT FUNCTION
  END IF

  ' Backup input.
  srcNum = value

  ' Keep going until we have no more digit to process.
  WHILE (srcNum)
    digit = srcNum MOD 10   ' Isolate rightmost decimal.
    romOfs = 1 + (rank * 2) ' Reading offset in romNum.

    SELECT CASE (digit)
    CASE 1 TO 3
      ' Repeat base up to three times.
      outStr = STRING$(digit, MID$(romNum, romOfs, 1)) + outStr
    CASE 4
      ' Next multiple of five, decrement with base.
      outStr = MID$(romNum, romOfs, 1) + MID$(romNum, romOfs + 1, 1) + outStr
    CASE 5 TO 8
      ' Next multiple of five, repeat base between 0 and 3 times.
      outStr = MID$(romNum, romOfs + 1, 1) + STRING$(digit - 5, MID$(romNum, romOfs, 1)) + outStr
    CASE 9
      ' Next power of ten, decrement with base.
      outStr = MID$(romNum, romOfs, 1) + MID$(romNum, romOfs + 2, 1) + outStr
    END SELECT

    srcNum = srcNum \ 10 ' Shift digits to the right, once.
    rank = rank + 1      ' Increment rank.
  WEND

  ' Return string.
  ArabicToRoman$ = outStr
END FUNCTION

Thus, "174" is "CLXXIV" (5 - 1 + 10 + 10 + 50 + 100!) Take that, you evil giraffe!

Roman to Arabic

Believe it or not, converting Roman to Arabic numerals is about as easy! In practice, all we have to do is read the string right to left, convert the Roman notation to a value and compare with the previous value we processed. If the current value is smaller than the previous value, we decrement the count. If the current value is greater or equal to the previous value, we increment the count.

''
'' CONVERT ROMAN NUMERALS TO ARABIC NUMERALS (version 1)
''
FUNCTION RomanToArabic% (source AS STRING)
  CONST romNum = "IVXLCDM"

  DIM value AS INTEGER
  DIM thisInt AS INTEGER, prevInt AS INTEGER
  DIM thisOfs AS INTEGER

  FOR i% = LEN(source) TO 1 STEP -1

    ' Find the Roman numeral in the list
    thisOfs = INSTR(romNum, MID$(source, i%, 1)) - 1
    IF (thisOfs = -1) THEN
      value = -1
      EXIT FOR
    END IF

    ' Even indices are powers of ten, odd indices are also multiplied by 5.
    IF (thisOfs AND &H1) THEN
      thisInt = 5 * 10 ^ (thisOfs \ 2)
    ELSE
      thisInt = 10 ^ (thisOfs \ 2)
    END IF

    ' Increment or decrement, according to previous value.
    IF (thisInt < prevInt) THEN
      value = value - thisInt
    ELSE
      value = value + thisInt
    END IF
    thisInt = thisInt
  NEXT i%

  ' Return value
  RomanToArabic% = value
END FUNCTION

And that's it... well, it is as long as we're not interested in string validation. The function above will decode incorrect inputs such as IIII (4,) VV (10,) or IXIX (18.) Some rules should be put in place to prevent that behavior. The following function should catch invalid strings (in which case it returns -1.)

''
'' CONVERT ROMAN NUMERALS TO ARABIC NUMERALS (version 2)
''
FUNCTION RomanToArabic% (source AS STRING)
  CONST romNum = "IVXLCDM"

  DIM value AS INTEGER
  DIM thisInt AS INTEGER, prevInt AS INTEGER
  DIM thisOfs AS INTEGER, prevOfs AS INTEGER
  DIM streak AS INTEGER, didSub AS INTEGER

  prevOfs = -1

  ' Assume the string is invalid.
  RomanToArabic% = -1

  FOR i% = LEN(source) TO 1 STEP -1

    ' Find the Roman numeral in the list
    thisOfs = INSTR(romNum, MID$(source, i%, 1)) - 1
    ' INVALID: not a Roman numeral.
    IF (thisOfs = -1) THEN EXIT FUNCTION
    IF (thisOfs = prevOfs) THEN
      streak = streak + 1
      ' INVALID: more than 3 times the same Roman numeral in a row.
      IF (streak > 3) THEN EXIT FUNCTION
      ' INVALID: V, L, D twice in a row.
      IF ((streak > 1) AND (thisOfs AND &H1)) THEN EXIT FUNCTION
    ELSE
      streak = 1
    END IF

    ' Even indices are powers of ten, odd indices are also multiplied by 5.
    IF (thisOfs AND &H1) THEN
      thisInt = 5 * 10 ^ (thisOfs \ 2)
    ELSE
      thisInt = 10 ^ (thisOfs \ 2)
    END IF

    ' Increment or decrement, according to previous value.
    IF (thisInt < prevInt) THEN
      ' INVALID: we already subtracted.
      IF (didSub) THEN EXIT FUNCTION
      ' INVALID: trying to subtract Roman numeral V, L, or D.
      IF (thisOfs AND &H1) THEN EXIT FUNCTION
      ' INVALID: subtractor must be within two ranks.
      IF ((prevOfs - thisOfs) > 2) THEN EXIT FUNCTION
      value = value - thisInt
      didSub = -1
    ELSE
      value = value + thisInt
      prevInt = thisInt
      didSub = 0
    END IF

    prevOfs = thisOfs
  NEXT i%

  ' Return value
  RomanToArabic% = value
END FUNCTION

That's it. It's sloppy, but it works.