[Haskell-cafe] Optimising ReadP
Andrew Kvapil
viluon at seznam.cz
Tue Sep 21 12:59:20 UTC 2021
Hi,
I'm trying to optimise a short Haskell program (inspired by Rust's
performance). From simple profiling, I discovered that 90+% of the time
and allocations are spent in a `ReadP`-based parser. The parser is
fairly small, so I assumed GHC would inline/optimise most of it. I
tried adding some `SPECIALISE` pragmas but found out they don't work:
they require `INLINE`/`INLINEABLE` pragmas or compiling with `-O`. I'm
confused by the latter requirement, as I thought dependencies of a
Stack project are compiled with optimisation enabled. To be able to
optimise this code locally, I copied the entirety of the `ReadP` module
into my project, along with what I used from `Text.Read.Lex`. I managed
to get some speedups by specialising the integer parsers. Cloning the
`ReadP` code also helped with more detailed profiling reports; I now
know that the majority of time and allocations are spent in `>>=`.
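For reference, the specialisation that helped looked roughly like this
(a minimal sketch, not my actual code; `decimal` is a hypothetical
stand-in for the copied `readDecP`):

    import Data.Char (digitToInt, isDigit)
    import Text.ParserCombinators.ReadP

    -- A polymorphic decimal parser, similar in shape to readDecP
    -- from Text.Read.Lex (hypothetical simplification):
    decimal :: Num a => ReadP a
    decimal = do
      ds <- munch1 isDigit
      return (foldl (\acc d -> 10 * acc + fromIntegral (digitToInt d)) 0 ds)
    {-# INLINEABLE decimal #-}

    -- With the definition copied locally, the pragma can sit at the
    -- declaration site (it still needs -O to fire):
    {-# SPECIALISE decimal :: ReadP Int #-}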
That's it for the long-winded intro; here are my questions:
- Why does GHC complain about missing `INLINE(ABLE)` pragmas and/or
`-O` in Stack dependencies?
- Are these things just never inlined unless library authors
specifically mark them as such?
- What profiling tools do people usually use in situations like this?
- I cannot specialise `>>=` even when I mark it `INLINEABLE` in the
`Monad ReadP` instance; apparently, the pragma should be placed at the
declaration site (at least that's what the warning says). So I run into
the same `INLINE(ABLE)`/`-O` issue as above. That seems pretty silly; I
feel like I should be able to inline/specialise this particular
instance without worrying about the global declaration of `>>=`. Is
there a workaround for this? (See the first sketch after this list.)
- I also used `BangPatterns` in my optimisations, but they've been
hit-and-miss. They (surprisingly) helped in small, local definitions
marked `INLINE` (I assumed strictness analysis would pick up on those),
but in other places, such as `ReadP`'s `>>=`, strictness annotations
slightly worsened performance (see the second sketch after this list).
How do people look for good places to annotate as strict?
- How do I find out to what extent laziness is slowing things down, or
look for places with unintended thunks?
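Regarding the `>>=` question: would something along these lines be the
right workaround? This is only a sketch of the idea, pulling the bind
out to a top-level definition so the pragma sits at a declaration site
(`P` is a placeholder for base's real parser type):

    {-# LANGUAGE RankNTypes #-}

    data P a  -- placeholder; base's real P carries the parse results

    -- The ReadP shape from base:
    newtype ReadP a = R (forall b. (a -> P b) -> P b)

    -- Bind pulled out of the instance, so INLINE/INLINEABLE and
    -- SPECIALISE pragmas attach at a declaration site:
    bindP :: ReadP a -> (a -> ReadP b) -> ReadP b
    bindP (R m) f = R (\k -> m (\a -> let R m' = f a in m' k))
    {-# INLINE bindP #-}

    instance Functor ReadP where
      fmap f (R m) = R (\k -> m (k . f))

    instance Applicative ReadP where
      pure x = R (\k -> k x)
      R f <*> R a = R (\k -> f (\g -> a (k . g)))

    instance Monad ReadP where
      (>>=) = bindP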
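And regarding `BangPatterns`, the kind of small definition where a bang
helped looks like this (a hypothetical example, not my actual code):

    {-# LANGUAGE BangPatterns #-}
    import Data.Char (digitToInt)

    -- Accumulator step of an integer parser; the bang keeps the
    -- accumulator evaluated instead of building a chain of thunks:
    step :: Int -> Char -> Int
    step !acc c = 10 * acc + digitToInt c
    {-# INLINE step #-}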
As these are general optimisation questions, I'm not looking for
suggestions of alternative parsing libraries. I'd like to write
high-level code and have it compose efficiently; this is a toy program
anyway, and I'm wondering how far I can get with such a simple parsing
library. I suspect the majority of allocations coming from the parser
are short-lived garbage that I should be able to avoid with
optimisations and maybe a little refactoring.
One thing that's specific to my implementation is that the parser first
produces a `[(Int, Int)]` and then converts it to an `IntMap (Set Int)`
with a `foldl'`. I would like to ensure the list is never built in the
first place; is there a way to do that?
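Concretely, the conversion step looks roughly like this (`toMap` is a
made-up name for the sketch):

    import Data.IntMap.Strict (IntMap)
    import qualified Data.IntMap.Strict as IntMap
    import Data.List (foldl')
    import Data.Set (Set)
    import qualified Data.Set as Set

    -- The post-parse conversion described above (sketch):
    toMap :: [(Int, Int)] -> IntMap (Set Int)
    toMap = foldl' step IntMap.empty
      where
        step m (k, v) = IntMap.insertWith Set.union k (Set.singleton v) m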
As for the code itself, it's only 90 lines, but I'm not sure what the
consensus on attachments is on this mailing list. I can provide the
whole Stack project, along with a test case and a reference solution in
Rust, upon request. (I'd post it on my GitHub, but there's a slight
chance that a participant in the competition this is a reference
solution for might find it there.)
Best regards,
Andrew