Wanted: warning option for usages of unary minus

Isaac Dupree isaacdupree at charter.net
Thu Apr 12 17:46:10 EDT 2007


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Simon Marlow wrote:
>> So does this suggest that under a negation-is-part-of-numeric-token
>> regime, 123-456 should be two tokens (a positive number then a negative
>> number, here), as is signum-456 ...
> 
> Yes, absolutely.

[see note 1 at the end responding irrelevantly to that]

Okay, here we go with the thorough descriptions...

Warn about any "-" that precedes without spaces a numeric literal, that
is not an application of "negate" to that literal.  This includes when
it's infix (n-1) and when it's out-precedenced (-2^6).  ===> A file that
does not trigger this warning is safe to have negative numeric literals
added to the syntax / lexer. [see Note 2 at the end about how commonly
this warning might occur in practice]
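
To make the first warning concrete, here is a small sketch of how
Haskell98 reads the two examples above, next to what a negative-literal
lexer would presumably make of -2^6.  The names are made up purely for
illustration.

```haskell
-- Haskell98 reading: unary minus has precedence 6 and (^) has precedence 8,
-- so this is negate (2^6).
outPrecedenced :: Integer
outPrecedenced = -2^6

-- What the same text would mean if "-2" were lexed as a single literal
-- token, written here with explicit parentheses.
asNegativeLiteral :: Integer
asNegativeLiteral = (-2)^6

-- Infix use without spaces: plain subtraction in Haskell98.
predecessor :: Integer -> Integer
predecessor n = n-1
```

Here outPrecedenced is -64 while asNegativeLiteral is 64, so this is a
case where changing the lexer silently changes program behaviour.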

Warn about any "-" that DOES NOT precede-without-spaces a numeric
literal, that nonetheless means negate.  ===> A file that triggers
neither this nor the previous warning is safe to have negative numeric
literals added AND interpretation of unqualified operator "-" as negate
removed.


"Reverse" warnings, for those who want to take advantage of negative
numeric literal syntax and then possibly convert to Haskell98 syntax easily:
If a "-" isn't followed immediately by a numeric literal, the only thing
to watch out for (and warn about) is the "forbidden section" (- 1),
which could mean an actual section (\x -> x - 1) in the "new" syntax.
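
A minimal sketch of the "forbidden section", as Haskell98 reads it today,
together with two spellings of "subtract one" that do not depend on how
"-" is lexed.  The names are hypothetical.

```haskell
-- In Haskell98 this is not a section: it is the number negate 1.
notASection :: Integer
notASection = (- 1)

-- Two ways to write the actual section that work under either syntax.
minusOne :: Integer -> Integer
minusOne = subtract 1

minusOneLambda :: Integer -> Integer
minusOneLambda = \x -> x - 1
```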

For actual negative literals: warn when the literal is the left-hand side
of an infix expression with relevant precedence ((> 6, which changes
program behaviour) or (= 6 and not left-associative, which causes a parse
error)).  (Being on the right-hand side, e.g. (x ^^ -1), is completely
unambiguous, and expressions like (-1 + 2) mean the same thing either
way.)  Also, warn if the literal is part of a function application:
either it would become infix in '98 syntax (e.g. (signum -2)) or it would
negate multiple things to the right (e.g. (-1 foo)).  (Some of these are
type errors assuming (->) isn't made an instance of Num, but that's a
later stage in the compilation process.)


Should we allow "positive numeric literals" +37 as well, for symmetry,
so we can also break (n+1) as well as (n-1)? (and also break (+1), which
is actually an asymmetric problem, since (+1) is a real section in
Haskell98 whereas (-1) isn't a section in the first place)



Implementation notes:

I haven't looked at the part of GHC's code where it deals with fixity
resolution yet, but I'm concerned that GHC might throw away information
about where parentheses were in the original code at the same time -
which is important information for determining whether some of the
warnings are valid, it seems.

For the purpose of warnings, I would explicitly keep track, for
unqualified operator "-", of whether it was followed by a digit (a digit
is the one certain sign that a numeric literal follows: octal and
hexadecimal literals start with 0c for some "c", and floating-point
literals always start with a decimal digit).  This would probably involve
adding an argument isomorphic to Bool to the constructor "ITminus".  Then
in compiler/parser/Lexer.x, just before the @varsym rule (since alex picks
the longest match first, then the earliest rule in the .x file among
equal-length matches), add rules
  "-" / [0-9]     {  minus followed by number  }
  "-"             {  minus not followed by number  }
( the [0-9] pattern could be refined perhaps... )
Then this notation has to be carried on through the Parser.y, which
shouldn't be too hard.
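
As a rough illustration of the bookkeeping (not GHC's lexer, and not alex
syntax), here is a standalone sketch that scans a line of source and
records, for each lone "-", whether a digit follows immediately.
MinusTok, isSymbolChar and classifyMinuses are hypothetical names; the
sketch ignores strings, comments and GHC-specific identifiers, and it
assumes that a "-" touching another symbol character belongs to a longer
or qualified operator.

```haskell
import Data.Char (isDigit)
import Data.Maybe (listToMaybe)

-- Hypothetical stand-in for ITminus extended with a Bool-like flag.
data MinusTok = MinusTok
  { followedByDigit :: Bool  -- True when a digit follows with no space
  , column          :: Int   -- 1-based column of the '-'
  } deriving (Eq, Show)

-- Characters that can form operator symbols (simplified).
isSymbolChar :: Char -> Bool
isSymbolChar c = c `elem` "!#$%&*+./<=>?@\\^|-~:"

-- Collect every '-' that is not adjacent to another symbol character.
classifyMinuses :: String -> [MinusTok]
classifyMinuses src = go Nothing (zip [1 ..] src)
  where
    go :: Maybe Char -> [(Int, Char)] -> [MinusTok]
    go _ [] = []
    go prev ((col, c) : rest)
      | c == '-' && standsAlone = MinusTok digitNext col : go (Just c) rest
      | otherwise               = go (Just c) rest
      where
        next        = snd <$> listToMaybe rest
        standsAlone = not (maybe False isSymbolChar prev)
                   && not (maybe False isSymbolChar next)
        digitNext   = maybe False isDigit next
```

For example, classifyMinuses "n-1 - x" reports the "-" in column 2 as
followed by a digit and the one in column 5 as not, while the "-" in
"a -> b" is skipped because it is part of a longer operator.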

For negative numeric literals, I think extra rules in the lexer would be
added, '-' followed by the various numeric literal types (this seems a
little repetitious, is there an easier way?).  The varieties of literals
that were standard in the first place (i.e. non-unboxed) will get " / {
extension is on }" qualifications to their patterns.  mkHsNegApp (in
RdrHsSyn.lhs) will be simplified or removed, since we are moving towards
a more sensible treatment of negative literals.  Another implementation
choice could be to recognize the "minus followed by number" in the
parser, but then it might be hard to distinguish between '98-syntax
negate, subtraction, and negative unboxed literals, without ambiguity in
the parser?

(Negative) numeric literals can occur in patterns, not just expressions;
that may or may not need tweaks specific to it.
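
For reference, a small example of a negated literal in a pattern, which
Haskell98 already accepts; describeNumber is a made-up name.

```haskell
-- The parentheses around the negative literal are required in Haskell98.
describeNumber :: Integer -> String
describeNumber (-1) = "minus one"
describeNumber 0    = "zero"
describeNumber _    = "something else"
```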

Test cases!!!! I suppose I should make a bunch of them, that deal with
every oddity I can think of, since I have already been thinking about
them... (1 Prelude.-1) is infix with either syntax, and shouldn't
(probably) be warned about, etc., etc. -- which explain better what the
intended behaviour is anyway.
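
A first couple of such cases might look like the sketch below, with the
values they have under current Haskell98 syntax.  The names are
hypothetical, and this is nowhere near a complete test suite.

```haskell
-- Qualified operator: the lexer takes "Prelude.-" as one token, so this is
-- subtraction under either syntax and probably deserves no warning.
qualifiedInfix :: Integer
qualifiedInfix = 1 Prelude.-1

-- Negation and (+) share precedence 6 and associate to the left, so this
-- means (negate 1) + 2, the same as a negative literal plus 2.
sameEitherWay :: Integer
sameEitherWay = -1 + 2
```

Under Haskell98 qualifiedInfix is 0 and sameEitherWay is 1.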



Note 1: I happen to think it's silly to allow two such tokens such that
one begins at the same character-location that the previous one ends,
but that's clearly a completely separate issue. I have been bitten by
-fglasgow-exts and x$y z (template haskell syntax $identifier, which is
rather similar to the proposed negative literal syntax) before; maybe I
don't even want infix operators adjacent to identifiers normally! (but
in practice everything tends to work out without difficulty)

Note 2: looking through the results for http://www.google.com/codesearch for
lang:haskell [0-9a-zA-Z_'#)]-[0-9]
suggests that expressions like (n-1) without spaces are mildly popular.
I wouldn't trust the "number of results" though, because (1) results in
comments are included, (2) who knows what code it's searching, and (3)
searching for
lang:haskell [-][0-9]
gave me fewer results than the more restrictive
lang:haskell [^0-9a-zA-Z_'#)]-[0-9]
.  The "#" was included in case there were glasgowIdentifiers#, and the
rest of the symbols could have been useful if *&$%- didn't make one
infix operator.



Feeling excessively thorough,
Isaac

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.3 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGHqihHgcxvIWYTTURAk54AJ9rsqBgu1kKJqudazzuBm6u5WujiACg2f1Y
sTrl1AZrHXxzMtnpez6OSEY=
=ktjn
-----END PGP SIGNATURE-----


More information about the Glasgow-haskell-users mailing list