simple Haskell help needed on #19746

Richard Eisenberg rae at richarde.dev
Tue Apr 27 19:04:16 UTC 2021


Hi devs,

tl;dr: Is there any (efficient) way to get the String consumed by a `reads`?

I'm stuck in thinking about a fix for #19746. Happily, the problem is simple enough that I could assign it in the first few weeks of a Haskell course... and yet I can't find a good solution! So I pose it here for inspiration.

The high-level problem: Assign correct source spans to options within an OPTIONS_GHC pragma.

Current approach: The payload of an OPTIONS_GHC pragma gets turned into a String and then processed by GHC.Utils.Misc.toArgs :: String -> Either String [String]. The result of toArgs is either an error string (the Left result) or a list of lexed options (the Right result).

A little-known fact is that Haskell strings can be put in an OPTIONS_GHC pragma. So I can write both {-# OPTIONS_GHC -funbox-strict-fields #-} and {-# OPTIONS_GHC "-funbox-strict-fields" #-}. Even stranger, I can write {-# OPTIONS_GHC ["-funbox-strict-fields"] #-}, where GHC will understand a list of strings. While I don't really understand the motivation for this last feature (I posted #19750 about this), the middle option, with the quotes, seems like it might be useful.

Desired approach: change toArgs to have this type: RealSrcLoc -> String -> Either String [Located String], where the input RealSrcLoc is the location of the first character of the input String. Then, as toArgs processes the input, it advances the RealSrcLoc (with advanceSrcLoc), allowing us to create correct SrcSpans for each String.

Annoying fact: Not all characters advance the source location by one character. Tabs and newlines don't. Perhaps some other characters don't, too.
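
To make this concrete, here is a minimal standalone sketch of location advancing. The Loc type and the names advanceLoc, nextTabStop and advanceOver are hypothetical stand-ins, not GHC's RealSrcLoc API, and the 8-column tab stop is an assumption about how a tab should be treated.

```haskell
import Data.List (foldl')

-- Hypothetical stand-in for a source location: 1-based line and column.
data Loc = Loc { locLine :: !Int, locCol :: !Int }
  deriving (Eq, Show)

-- Advance a location past one character.
-- A newline starts the next line at column 1; a tab jumps to the next
-- tab stop (assumed here to be every 8 columns); anything else moves
-- one column to the right.
advanceLoc :: Loc -> Char -> Loc
advanceLoc (Loc l _) '\n' = Loc (l + 1) 1
advanceLoc (Loc l c) '\t' = Loc l (nextTabStop c)
advanceLoc (Loc l c) _    = Loc l (c + 1)

-- Column of the first tab stop strictly after column c (stops at 1, 9, 17, ...).
nextTabStop :: Int -> Int
nextTabStop c = ((c - 1) `div` 8 + 1) * 8 + 1

-- Advance a location past every character of a consumed string.
advanceOver :: Loc -> String -> Loc
advanceOver = foldl' advanceLoc
```

With this, the location after a consumed chunk is just advanceOver applied to the starting location and the chunk, which is why knowing the exact consumed characters matters.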

Central stumbling block: toArgs uses `reads` to parse strings. This makes great sense, because `reads` already knows how to convert Haskell String syntax into a proper String. The problem is that we have no idea what characters were consumed by `reads`. And, short of looking at the length of the remainder string returned by `reads` and comparing it to the length of the input string, there seems to be no way to recreate this lost information. Note that comparing lengths is slow, because we're dealing with Strings here, and `length` takes time linear in the length of the list. Once we know what was consumed by `reads`, then we can just repeatedly call advanceSrcLoc, and away we go.
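
For illustration, the length-comparison workaround might look like the sketch below. The name readsWithConsumed is hypothetical; the point is only that `reads` hands back the decoded String and the remainder, so the consumed prefix has to be rebuilt by measuring both lists.

```haskell
-- Read one Haskell string literal from the front of the input.
-- Returns (characters consumed, decoded string, remaining input).
-- Both calls to length walk an entire list, so doing this once per
-- option costs time proportional to the remaining input each time.
readsWithConsumed :: String -> Maybe (String, String, String)
readsWithConsumed input =
  case reads input :: [(String, String)] of
    [(str, rest)] ->
      let n = length input - length rest
      in Just (take n input, str, rest)
    _ -> Nothing
```

Note that the consumed prefix and the decoded string differ whenever the literal contains escapes: the prefix keeps the quotes and backslashes exactly as written in the source, which is what the location bookkeeping needs.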

Ideas to get unblocked:
1. Just do the slow (quadratic in the number of options) thing, looking at the lengths of strings often.
2. Reimplement reading of strings to return both the result and the characters consumed.
3. Incorporate the parsing of OPTIONS_GHC right into the lexer.
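
One possible direction for idea 2, offered as a sketch rather than a tested fix: base's Text.ParserCombinators.ReadP has a `gather` combinator that returns the exact characters a parser read, and Text.Read.Lex exposes a Haskell lexer as a ReadP parser. The function name readStringWithConsumed is hypothetical, and whether this lexer agrees with `reads` on every input toArgs cares about is an assumption that would need checking.

```haskell
import Text.ParserCombinators.ReadP (gather, readP_to_S)
import qualified Text.Read.Lex as L

-- Lex one Haskell string literal from the front of the input.
-- Returns (characters consumed, decoded string, remaining input).
-- L.lex skips leading whitespace, so that whitespace is part of the
-- consumed characters. Lexemes that are not string literals are
-- rejected by the pattern match in the comprehension.
readStringWithConsumed :: String -> Maybe (String, String, String)
readStringWithConsumed input =
  case [ (consumed, str, rest)
       | ((consumed, L.String str), rest) <- readP_to_S (gather L.lex) input
       ] of
    (r : _) -> Just r
    []      -> Nothing
```

The documentation for `gather` warns that it fails at runtime if the wrapped parser was built with readS_to_P, so this only works on parsers written directly in ReadP.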

It boggles me that there isn't a better solution here. Do you see one?

Thanks,
Richard

