[Haskell-cafe] Lazy vs correct IO [Was: A round of golf]

Don Stewart dons at galois.com
Fri Sep 19 03:13:40 EDT 2008


oleg:
> Given the stance against top-level mutable variables, I have not
> expected to see this Lazy IO code. After all, what could be more against
> the spirit of Haskell than a `pure' function with observable side
> effects. With Lazy IO, one indeed has to choose between correctness
> and performance. The appearance of such code is especially strange
> after the evidence of deadlocks with Lazy IO, presented on this list
> less than a month ago. Let alone unpredictable resource usage and
> reliance on finalizers to close files (forgetting that GHC does not
> guarantee that finalizers will be run at all).
> 
> Is there an alternative?

Hi Oleg!

I'm glad you joined the thread at this point.

Some background: our best solutions for this problem using lazy IO, are
based on chunk-wise lazy data structures, typically lazy bytestrings.
Often we'll write programs like:

    import qualified Data.ByteString.Lazy.Char8 as B
    import System.Environment

    main = do
        [f] <- getArgs
        s   <- B.readFile f
        print (B.count '\n' s)

Which are nicely efficient

    $ ghc -O2 A.hs --make
    $ du -hs data
    100M data

    $ time ./A data 
    11078540
    ./A data  0.17s user 0.04s system 100% cpu 0.210 total

And we know from elsewhere the performance is highlycompetitive:

    http://shootout.alioth.debian.org/gp4/benchmark.php?test=sumcol&lang=all

Now, enumerators are very promising, and there's a lot of interest at
the moment, (e.g. just this week, Johan Tibell gave an inspiring talk at
Galois about this approach to IO, http://www.galois.com/blog/2008/09/12/left-fold-enumerators-a-safe-expressive-and-efficient-io-interface-for-haskell/
and we spent the day sketching out an enumerator bytestring design, 

But there are some open questions. Perhaps you have some answers?

    * Can we write a Data.ByteString.Enumerator that has matching or
      better performance than its "dual", the existing chunk-wise lazy
      stream type?

    * Is there a translation from 

        data ByteString = Empty | Chunk {-# UNPACK #-} !S.ByteString ByteString

      and functions on this type,

        foldlChunks :: (a -> S.ByteString -> a) -> a -> ByteString -> a
        foldlChunks f z = go z
          where
                go !a Empty        = a
                go !a (Chunk c cs) = go (f a c) cs

      to an enumerator implementation?

    * Can we compose enumerators as we can stream functions?

    * Can we do fusion on enumerators? Does that make composition easier?
        (Indeed, is there an encoding of enumerators analogous to stream
        fusion control?)

Any thoughts?

-- Don


More information about the Haskell-Cafe mailing list