[Haskell-cafe] containers and maps

Edward Kmett ekmett at gmail.com
Thu Aug 13 22:15:42 EDT 2009


Yeah, my answer on the ByteString front is to make a bunch of Reducers for a
Builder data type that works like the one in Data.Binary.Builder. It
generates a much more suitable Reducer and can reduce Chars, Strings, Strict
and Lazy UTF8-encoded Bytestrings, etc. I'm using that inside of my toy
compiler right now when I need to generate output, like a Lazy ByteString of
source. (My variant Builder -- ok, my mostly cut and pasted Builder, has
been made slightly smarter and is now tuned to work with UTF8 encoded data,
which is what I've been feeding it lately.) I may migrate it into the
monoids library from my current project if there is interest, but I'm trying
to avoid ballooning that up too far in size again and re-entering dependency
hell.
The monoids library drew a pretty hard distinction between things that build
up nicely (Monoids/Reducers) and things that tear down easily (Generators)
but only works with a few things that do both efficiently (i.e.
Seq/FingerTree) For my purposes this has worked rather well, but raw strict
(and to a lesser degree, lazy) ByteStrings make abysmal Reducers. ;)

In the end, bolting a list-like interface on something that can be _made_ to
implement all of the operations but can't be made to implement them remotely
efficiently, seems like it is asking for trouble, bug reports about
performance, and pain.

On the other hand, FingerTrees of wrapped strict ByteStrings work nicely as
a monoidal lazy bytestring-like structure that provides an efficient snoc,
indexing operation, and very efficient slicing for extracting token values
with maximal sharing. I'll be giving a talk at the next Boston Haskell User
Group in a few days about my abuse of those in a terrible hodge-podge
solution of Iteratees, Parsec 3 and monoids, which yields an efficient
parallel/incremental lexer.

I'll see about posting the slides afterwards.

-Edward Kmett

On Thu, Aug 13, 2009 at 4:43 PM, John Lato <jwlato at gmail.com> wrote:

> On Thu, Aug 13, 2009 at 1:51 PM, Jake McArthur<jake.mcarthur at gmail.com>
> wrote:
> > John Lato wrote:
> >>
> >
> >> This might work with UVector (I intend to try it this evening); I
> >> don't know how well the fusion framework will hold up in class
> >> dictionaries.
> >
> > Do report back, as I am curious as well.
>
> I have just recently hacked together a small test.  The code is at
> http://inmachina.net/~jwlato/haskell/testUVector.hs
>
> The task is to generate a list of 1 million Ints (really 1e6-1), map a
> multiply by 2, and check to see if any values equal 0.  The uvector
> code is (essentially):
>
> > let res = anyU (== ( 0:: Int)) . mapU (* 2) . enumFromToU 1 $ 1000000
>
> and by itself runs in a small fraction of a second.  Technically the
> start and stop values are specified on the command line, but these are
> the values I predominantly used.
>
> Here are results for some other modes:
>
> ListLike.map: significantly slower, and runs out of memory (memory
> request failed)
> ListLike.rigidMap: appears to be the same as using mapU directly
> mapReduce:  significantly slower (the 1e6 element test completed after
> about an hour), but ran in a very small amount of memory.
>
> It looks like GHC was able to see through the rigidMap function at
> least, but I don't know if this would still work with the instance
> defined in another file.
>
> Cheers,
> John
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.haskell.org/pipermail/haskell-cafe/attachments/20090813/42aa336c/attachment.html


More information about the Haskell-Cafe mailing list