Richard A. O'Keefe ok at cs.otago.ac.nz
Mon Jun 24 09:24:24 CEST 2013

```An important question here is whether you want to notice
when a Roman numeral is invalid, e.g., iix, or not.

From a parsing point of view context-free grammars are not
ideal.  We have the patterns
i{1,3} | iv | vi{1,3} | ix		units
x{1,3} | xl | lx{1,3} | xc		tens
c{1,3} | cd | dc{1,3} | cm		hundreds
m{1,3} | mↁ | ↁm{1,3} | mↂ		thousands
so we want to write a grammar rule like

roman(N) -->
group('m', 'ↁ', 'ↂ', M),
group('c', 'd', 'm', C),
group('x', 'l', 'c', X),
group('i', 'v', 'x', I),
{N is M*1000 + C*100 + X*10 + I}.

group(U, F, T, N) -->
( [U,T]     -> {N = 9}
; [U,F]     -> {N = 4}
; [U,U,U]   -> {N = 3}
; [U,U]     -> {N = 2}
; [U]       -> {N = 1}
; [F,U,U,U] -> {N = 8}
; [F,U,U]   -> {N = 7}
; [F,U]     -> {N = 6}
; [F]       -> {N = 5}
).

using DCG notation.  This is not context-free.  It is an
attribute grammar with attributes passed down the parse
tree (U, F, T) and passed up the parse tree (N).

Parser combinators are the easy way to do this in Haskell;
see 'parsec' or any number of parser combinator tutorials.

For that matter, it's pretty trivial to do this particular
one with plain code that pretty much mirrors the structure
shown above.  Just write something that takes (as many
arguments as you want) then a list of characters coming in
and returns a pair with a computed answer and the remaining
characters.  But wait!  That _is_ a parser combinator!
I guess parser combinators can't be that hard after all (:-).

Good luck finding the five thousand character (ↁ) or the
ten thousand character (ↂ) or any of the other Unicode
"I'm not a letter, no, honest, I'm really not, I'm a
Roman numeral" characters on your keyboard.

```