digit groups

Jon Fairbairn jon.fairbairn at cl.cam.ac.uk
Thu Oct 26 10:02:44 EDT 2006

On 2006-10-25 at 20:57-0000 Aaron Denney wrote:
> On 2006-10-25, Jon Fairbairn <jon.fairbairn at cl.cam.ac.uk> wrote:
> > No. A small alteration to the lexical syntax for the sake of
> > improved readability seems perfectly justifiable as long as
> > it doesn't make the lexical syntax /significantly/ more
> > complicated or harder to learn.
> Sure.  But some of us don't find it terribly readable. 

I'm not sure what you are saying here. Assessing readability
by introspection is terribly unreliable. Unfamiliarity with
the presentation of numbers with underlines is likely to
make them feel a bit awkward to begin with, but habituation
is likely to change that.  We do know from venerable
experiments that humans can easily identify small groups of
things without counting. Most people can recognise three
easily, few people can recognise eight. So it's no surprise
that the standard presentation of numbers groups digits in
threes. If you were to conduct an experiment on yourself
that presented you with numbers displayed in all three forms
(ungrouped, thin spaced and with underlines) and timed how
long it took you to read them out, I'd be surprised if the
underline grouped form (even while still unfamiliar) didn't
beat the ungrouped form. Quickly now, is 20000000000 tens of
millions, or tens, hundreds or thousands of millions? Now try
the same for 2_000_000_000 or 20_000_000_000.

> I think the ~~ operator hack gets 90% of the "benefit" for
> those who want it.

I thought my earlier message adequately demonstrated that it
does /not/. Another case: if you change “square
123479010987” to “square 123_479_010_987” to improve
readability it still means the same thing. If you change it
to “square 123~~479~~010~~987” it doesn't.
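
To make the difference concrete, here is a minimal sketch. The
definition of ~~ below is an assumption (multiply the left operand by
1000 and add the right one; the fixity is picked arbitrarily), and
square is just an illustrative function. Only the parse matters:
function application binds tighter than any infix operator.

```haskell
-- Assumed definition of the "~~" hack: shift the left operand by three
-- decimal places and add the right operand. Fixity chosen arbitrarily.
infixl 5 ~~
(~~) :: Integer -> Integer -> Integer
a ~~ b = a * 1000 + b

-- Illustrative function standing in for "square" in the text above.
square :: Integer -> Integer
square x = x * x

-- The ungrouped literal: square applied to the whole number.
plain :: Integer
plain = square 123479010987

-- Application binds tighter than ~~, so this parses as
-- (((square 123) ~~ 479) ~~ 010) ~~ 987: only 123 gets squared.
grouped :: Integer
grouped = square 123~~479~~010~~987

-- Parentheses are needed to recover the intended meaning.
intended :: Integer
intended = square (123~~479~~010~~987)
```

Under these assumptions intended equals plain, while grouped does not,
so adding the separators has silently changed the program.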

> > although my preference would be something a bit more
> > restrictive, requiring numbers to have groups of the same
> > number of digits after each “_” and beginning with a shorter
> > group (ie 12_000_000 and 1200_0000 would be valid but
> > 1247_000 would not). I'm not wedded to this requirement (and
> > it would take a more sophisticated grammar to formalise).
> The only reason to put it in the lexer/parser is to avoid
> misleading cases, which needs this additional restriction, or
> something similar, like always 3 for decimal, 4 for hex, 3 for
> oct, or whatever.

No. I certainly would prefer a requirement that the groups
be the same length, but the intention is that the value
would be got simply by stripping out the underlines. So
while 19_00 would be an idiosyncratic way of writing 1_900
(intended to be read nineteen hundred, one would presume),
it wouldn't be misleading in the way that 19~~00 (which
would evaluate to 19_000) would be.
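
As a rough sketch of both ideas, operating on strings rather than in a
real lexer: groupedValue gets the value by stripping the underlines,
and uniformGroups checks the stricter rule I said I would prefer (equal
groups after each underline, with a first group that is no longer than
the rest). All three names, including the splitOn helper, are
hypothetical and are not part of any proposal text.

```haskell
import Data.Char (isDigit)

-- Split a string at every occurrence of the separator character.
splitOn :: Char -> String -> [String]
splitOn c str = case break (== c) str of
  (g, [])       -> [g]
  (g, _ : rest) -> g : splitOn c rest

-- Value of a grouped decimal literal: strip the underlines and read
-- what is left. Nothing if any group is empty or contains a non-digit.
groupedValue :: String -> Maybe Integer
groupedValue s
  | all validGroup (splitOn '_' s) = Just (read (filter (/= '_') s))
  | otherwise                      = Nothing
  where
    validGroup g = not (null g) && all isDigit g

-- Stricter rule: every group after the first has the same length, and
-- the first group is no longer than the others. Checks lengths only;
-- combine with groupedValue to reject non-digits.
uniformGroups :: String -> Bool
uniformGroups s = case map length (splitOn '_' s) of
  []           -> False  -- unreachable: splitOn never returns []
  [n]          -> n > 0
  (n : m : ms) -> n > 0 && m > 0 && n <= m && all (== m) ms
```

With this sketch, groupedValue gives the same result for "19_00" and
"1_900", and uniformGroups accepts "12_000_000" and "1200_0000" but
rejects "1247_000".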

Jón Fairbairn                              Jon.Fairbairn at cl.cam.ac.uk

More information about the Haskell-prime mailing list