[Haskell-cafe] Binary Data Access via PIC…??
Ian Denhardt
ian at zenhack.net
Sun Jan 13 20:02:59 UTC 2019
Shameless plug for one of my own libraries, which seems at least
relevant to the problem space:
https://hackage.haskell.org/package/capnp
Though as a disclaimer I haven't done any benchmarking myself; my
personal interest is more in RPC than in super-fast serialization.
There will be a release with RPC support sometime later this month.
That said, I have heard from one user who is using it to communicate
with a part of their application written in C++; they switched over
from protobufs for performance, and because they needed to handle very
large (> 2 GiB) data.
-Ian
Quoting Nick Rudnick (2019-01-13 07:43:40)
> On NL FP day, it struck me again when I saw an almost 1 MB *.hs file
> whose apparent sole purpose was getting a quantity of raw data
> incorporated into the binary, applying some funny text-encoding
> constructs. I remembered that, to the best of my knowledge, this
> appears to be the best solution, with the major downside that it
> happens at compile time.
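> For concreteness, a minimal sketch of that compile-time approach,
> assuming the file-embed package (the module and file names are
> invented):
>
>     {-# LANGUAGE TemplateHaskell #-}
>     module Blob (blob) where
>
>     import Data.ByteString (ByteString)
>     import Data.FileEmbed (embedFile)
>
>     -- The raw file is spliced into the object code at compile time,
>     -- so any change to the data forces a recompile.
>     blob :: ByteString
>     blob = $(embedFile "data/table.bin")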
> Another approach I have noticed several times is the use of very
> fast parsing to read in binary data at run time.
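> For example, with the binary package one could decode a file of
> fixed-width records like this (the record format is invented):
>
>     import Control.Monad (replicateM)
>     import Data.Binary.Get (getWord64le, runGet)
>     import qualified Data.ByteString.Lazy as BL
>     import Data.Word (Word64)
>
>     -- Read a file of little-endian 64-bit words and parse them all.
>     readTable :: FilePath -> IO [Word64]
>     readTable path = do
>       bs <- BL.readFile path
>       let n = fromIntegral (BL.length bs `div` 8)
>       pure (runGet (replicateM n getWord64le) bs)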
> Did I miss something?
> Or, more specifically: I am speaking about that kind of binary data
> which is
> (1) huge! (the 1 MB mentioned above rather being at the lower
> limit),
> (2) completely independent of the version of the Haskell compiler,
> (3) guaranteed (externally!) to match the structural requirements of
> the application referred to,
> (4) well managed in some way, concerning ABI issues too (e.g.
> versioning, metadata headers, etc.),
> and the question is to what extent (as, I believe, other languages
> manage to) we can exploit PIC (position-independent code) to read in
> really large quantities of binary data at run time, or immediately
> before run time, without any need for parsing.
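> The closest run-time equivalent I know of is memory mapping, which
> also avoids parsing and copying; a minimal sketch, assuming the mmap
> package:
>
>     import Data.ByteString (ByteString)
>     import System.IO.MMap (mmapFileByteString)
>
>     -- Map the whole file; pages are faulted in on demand, so no
>     -- bytes are copied or parsed up front.
>     loadBlob :: FilePath -> IO ByteString
>     loadBlob path = mmapFileByteString path Nothing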
> E.g., a Haskell file containing a textual data representation
> already generates an object file, and linking should make only a
> limited number of assumptions about its inner structure. Imagine I
> have a huge but simple DB table, and a kind of converter which, by
> some simplification of a Haskell compiler, generates an object file
> that equally matches these (limited, as I believe) assumptions, and
> in the end builds a 'fake' the linker accepts in place of a dummy
> file skeleton. Couldn't that be a way leading towards directly
> getting in vast amounts of binary data in one piece?
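> Outside of Haskell this is often done with the assembler's .incbin
> directive; a minimal sketch of how the resulting symbols could be
> consumed via the FFI (all names invented):
>
>     -- table.s, assembled and linked alongside the Haskell code:
>     --         .section .rodata
>     --         .global table_start
>     -- table_start:
>     --         .incbin "table.bin"
>     --         .global table_end
>     -- table_end:
>
>     {-# LANGUAGE ForeignFunctionInterface #-}
>
>     import Data.Word (Word8)
>     import Foreign.Ptr (Ptr, minusPtr)
>
>     foreign import ccall "&table_start" tableStart :: Ptr Word8
>     foreign import ccall "&table_end"   tableEnd   :: Ptr Word8
>
>     -- Size of the embedded blob, computed from the two symbols.
>     tableSize :: Int
>     tableSize = tableEnd `minusPtr` tableStart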
> In case there are stronger integrity needs, extra metadata should
> be usable to verify that the data originates from a valid code
> generator.
> Of course, while not strictly necessary, true run-time loading
> would be even greater, while direct interfacing to foreign (albeit
> simple) memory spaces seems much more intricate to me.
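> On POSIX systems, such run-time loading might look like resolving a
> data symbol from a shared object via the unix package's dynamic
> linker bindings (library and symbol names invented):
>
>     import Data.Word (Word8)
>     import Foreign.Ptr (Ptr, castFunPtrToPtr)
>     import System.Posix.DynamicLinker (RTLDFlags (RTLD_NOW), dlopen, dlsym)
>
>     -- Open a shared object built around the raw data and resolve
>     -- the data symbol at run time.
>     loadTable :: IO (Ptr Word8)
>     loadTable = do
>       dl <- dlopen "./libtable.so" [RTLD_NOW]
>       sym <- dlsym dl "table_start"
>       pure (castFunPtrToPtr sym)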
> I have regularly stumbled over such cases, so I do believe this
> would be useful. I would be happy to learn more about it. Any
> thoughts?
> Cheers, and all the best, Nick