simple Haskell help needed on #19746

Iavor Diatchki iavor.diatchki at gmail.com
Tue Apr 27 20:28:52 UTC 2021


Hi Richard,

perhaps something like this would work:

```Haskell
import Text.ParserCombinators.ReadP(readP_to_S, gather)
import qualified Text.Read.Lex as L

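-- Lex one token; succeed only on a string literal, returning the number of
-- input characters consumed (including skipped leading whitespace) and the
-- decoded string.
example :: ReadS (Int,String)
example input =
  do ((xs,L.String t), rest) <- readP_to_S (gather L.lex) input
     pure ((length xs, t), rest)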
```
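
If the consumed text itself is needed (for example, to feed each character
to advanceSrcLoc), a variant along these lines might work. This is only a
sketch: `Pos`, `advance`, and `lexStringAt` are hypothetical names, and
`advance` is a simplified stand-in for GHC's advanceSrcLoc with an assumed
tab rule, not the real definition.

```Haskell
import Data.List (foldl')
import Text.ParserCombinators.ReadP (gather, readP_to_S)
import qualified Text.Read.Lex as L

-- Hypothetical stand-in for RealSrcLoc: (line, column), both 1-based.
type Pos = (Int, Int)

-- Hypothetical stand-in for advanceSrcLoc. The tab rule (jump to the next
-- 8-column tab stop) is an assumption made for this sketch.
advance :: Pos -> Char -> Pos
advance (l, _) '\n' = (l + 1, 1)
advance (l, c) '\t' = (l, ((c - 1) `div` 8 + 1) * 8 + 1)
advance (l, c) _    = (l, c + 1)

-- Lex one string literal starting at the given position. Each result holds
-- the decoded string, the position just past the consumed text, and the
-- remaining input. Non-string tokens yield no results.
lexStringAt :: Pos -> String -> [(String, Pos, String)]
lexStringAt start input =
  [ (t, foldl' advance start consumed, rest)
  | ((consumed, L.String t), rest) <- readP_to_S (gather L.lex) input
  ]
```

Because `gather` returns the raw source characters, an escape such as `\t`
inside the literal advances the position by two columns even though it
decodes to one character. Note that `L.lex` skips leading whitespace, and
that whitespace is part of the gathered text, so the start of the literal
itself would need to be tracked separately.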

-Iavor

On Tue, Apr 27, 2021 at 12:05 PM Richard Eisenberg <rae at richarde.dev> wrote:

> Hi devs,
>
> tl;dr: Is there any (efficient) way to get the String consumed by a
> `reads`?
>
> I'm stuck in thinking about a fix for #19746. Happily, the problem is
> simple enough that I could assign it in the first few weeks of a Haskell
> course... and yet I can't find a good solution! So I pose it here for
> inspiration.
>
> The high-level problem: Assign correct source spans to options within an
> OPTIONS_GHC pragma.
>
> Current approach: The payload of an OPTIONS_GHC pragma gets turned into a
> String and then processed by GHC.Utils.Misc.toArgs :: String -> Either
> String [String]. The result of toArgs is either an error string (the Left
> result) or a list of lexed options (the Right result).
>
> A little-known fact is that Haskell strings can be put in an OPTIONS_GHC
> pragma. So I can write both {-# OPTIONS_GHC -funbox-strict-fields #-} and
> {-# OPTIONS_GHC "-funbox-strict-fields" #-}. Even stranger, I can write {-#
> OPTIONS_GHC ["-funbox-strict-fields"] #-}, where GHC will understand a list
> of strings. While I don't really understand the motivation for this last
> feature (I posted #19750 about this), the middle option, with the quotes,
> seems like it might be useful.
>
> Desired approach: change toArgs to have this type: RealSrcLoc -> String ->
> Either String [Located String], where the input RealSrcLoc is the location
> of the first character of the input String. Then, as toArgs processes the
> input, it advances the RealSrcLoc (with advanceSrcLoc), allowing us to
> create correct SrcSpans for each String.
>
> Annoying fact: Not all characters advance the source location by one
> character. Tabs and newlines don't. Perhaps some other characters don't,
> too.
>
> Central stumbling block: toArgs uses `reads` to parse strings. This makes
> great sense, because `reads` already knows how to convert Haskell String
> syntax into a proper String. The problem is that we have no idea what
> characters were consumed by `reads`. And, short of looking at the length of
> the remainder string in `reads` and comparing it to the length of the input
> string, there seems to be no way to recreate this lost information. Note
> that comparing lengths is slow, because we're dealing with Strings here.
> Once we know what was consumed by `reads`, then we can just repeatedly call
> advanceSrcLoc, and away we go.
>
> Ideas to get unblocked:
> 1. Just do the slow (quadratic in the number of options) thing, looking at
> the lengths of strings often.
> 2. Reimplement reading of strings to return both the result and the
> characters consumed
> 3. Incorporate the parsing of OPTIONS_GHC right into the lexer
>
> It boggles me that there isn't a better solution here. Do you see one?
>
> Thanks,
> Richard