simple Haskell help needed on #19746

Sebastian Graf sgraf1337 at gmail.com
Tue Apr 27 19:32:57 UTC 2021


Hi Richard,

Maybe I lack a bit of context, but I don't see why you wouldn't choose (3).
Extending the lexer/parser will yield a declarative specification of what
exactly constitutes an OPTIONS_GHC pragma (albeit in a language that isn't
Haskell) and should be more efficient than `reads`, even if you fix it to
scale linearly. Plus, it seems that's what we do for other pragmas such as
RULES already.

That's my opinion anyway.
Cheers,
Sebastian

On Tue, Apr 27, 2021 at 21:06, Richard Eisenberg <
rae at richarde.dev> wrote:

> Hi devs,
>
> tl;dr: Is there any (efficient) way to get the String consumed by a
> `reads`?
>
> I'm stuck in thinking about a fix for #19746. Happily, the problem is
> simple enough that I could assign it in the first few weeks of a Haskell
> course... and yet I can't find a good solution! So I pose it here for
> inspiration.
>
> The high-level problem: Assign correct source spans to options within an
> OPTIONS_GHC pragma.
>
> Current approach: The payload of an OPTIONS_GHC pragma gets turned into a
> String and then processed by GHC.Utils.Misc.toArgs :: String -> Either
> String [String]. The result of toArgs is either an error string (the Left
> result) or a list of lexed options (the Right result).
>
> A little-known fact is that Haskell strings can be put in an OPTIONS_GHC
> pragma. So I can write both {-# OPTIONS_GHC -funbox-strict-fields #-} and
> {-# OPTIONS_GHC "-funbox-strict-fields" #-}. Even stranger, I can write {-#
> OPTIONS_GHC ["-funbox-strict-fields"] #-}, where GHC will understand a list
> of strings. While I don't really understand the motivation for this last
> feature (I posted #19750 about this), the middle option, with the quotes,
> seems like it might be useful.
>
> Desired approach: change toArgs to have this type: RealSrcLoc -> String ->
> Either String [Located String], where the input RealSrcLoc is the location
> of the first character of the input String. Then, as toArgs processes the
> input, it advances the RealSrcLoc (with advanceSrcLoc), allowing us to
> create correct SrcSpans for each String.
>
> Annoying fact: Not all characters advance the source location by one
> character. Tabs and newlines don't. Perhaps some other characters don't,
> too.
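
As a rough illustration of the location bookkeeping described above, here is
a minimal, self-contained sketch. Loc, advance and advanceOver are
hypothetical stand-ins (not GHC's RealSrcLoc or advanceSrcLoc), and the
8-column tab stop is an assumption made for the example.

```haskell
import Data.List (foldl')

-- Hypothetical stand-in for a source location: 1-based line and column.
data Loc = Loc { locLine :: !Int, locCol :: !Int }
  deriving (Eq, Show)

-- Advance a location over a single character.
-- Assumption: tab stops sit every 8 columns (columns 1, 9, 17, ...).
advance :: Loc -> Char -> Loc
advance (Loc l _) '\n' = Loc (l + 1) 1
advance (Loc l c) '\t' = Loc l (((c - 1) `div` 8 + 1) * 8 + 1)
advance (Loc l c) _    = Loc l (c + 1)

-- Advance a location over every character of a consumed chunk of input.
advanceOver :: Loc -> String -> Loc
advanceOver = foldl' advance

main :: IO ()
main = print (advanceOver (Loc 1 1) "-Wall\t-O2\n-fno-cse")
```

The point of the sketch is that the end location depends on the raw
characters consumed, not merely on how many there were.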
>
> Central stumbling block: toArgs uses `reads` to parse strings. This makes
> great sense, because `reads` already knows how to convert Haskell String
> syntax into a proper String. The problem is that we have no idea what
> characters were consumed by `reads`. And, short of looking at the length of
> the remainder string in `reads` and comparing it to the length of the input
> string, there seems to be no way to recreate this lost information. Note
> that comparing lengths is slow, because we're dealing with Strings here.
> Once we know what was consumed by `reads`, then we can just repeatedly call
> advanceSrcLoc, and away we go.
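
To make the length-comparison workaround concrete, here is a small sketch.
readsWithConsumed is a hypothetical helper written for this example; it is
not part of toArgs. It recovers the consumed prefix by measuring the input
and the remainder, which is exactly the repeated linear work mentioned above.

```haskell
-- Parse one Haskell string literal with 'reads' and recover the raw text
-- that was consumed by comparing the lengths of input and remainder.
-- Returns (decoded value, raw consumed text, remaining input).
readsWithConsumed :: String -> Maybe (String, String, String)
readsWithConsumed input =
  case reads input :: [(String, String)] of
    [(value, rest)] ->
      -- Both 'length' calls walk their whole list on every call.
      let n = length input - length rest
      in Just (value, take n input, rest)
    _ -> Nothing

main :: IO ()
main = print (readsWithConsumed "\"-O\\t2\" -Wall")
```

Note that the decoded value is shorter than the consumed text whenever the
literal contains escapes, so the value alone cannot be used to advance the
location.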
>
> Ideas to get unblocked:
> 1. Just do the slow (quadratic in the number of options) thing, looking at
> the lengths of strings often.
> 2. Reimplement reading of strings to return both the result and the
> characters consumed.
> 3. Incorporate the parsing of OPTIONS_GHC right into the lexer.
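
On idea (2), one possible shortcut (a sketch, not a tested patch for toArgs)
is the gather combinator from Text.ParserCombinators.ReadP in base, which
returns the characters a ReadP parser consumed alongside its result. The
names stringWithConsumed and readStringWithConsumed below are hypothetical.
This assumes the Read instance for String is built purely from ReadP
primitives, since gather fails at runtime on parsers built with readS_to_P.

```haskell
import Text.ParserCombinators.ReadP (ReadP, gather, readP_to_S)
import Text.ParserCombinators.ReadPrec (readPrec_to_P)
import Text.Read (readPrec)

-- A string-literal parser that also yields the raw characters it consumed.
-- 'gather' pairs the consumed input (first) with the parser's result (second).
stringWithConsumed :: ReadP (String, String)
stringWithConsumed = gather (readPrec_to_P readPrec 0)

-- Returns (decoded value, raw consumed text, remaining input).
readStringWithConsumed :: String -> Maybe (String, String, String)
readStringWithConsumed input =
  case readP_to_S stringWithConsumed input of
    [((consumed, value), rest)] -> Just (value, consumed, rest)
    _                           -> Nothing

main :: IO ()
main = print (readStringWithConsumed "\"-O\\t2\" -Wall")
```

The consumed text can then be fed character by character to a
location-advancing function, without measuring the remainder.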
>
> It boggles my mind that there isn't a better solution here. Do you see one?
>
> Thanks,
> Richard