[Haskell-cafe] A round of golf

Don Stewart dons at galois.com
Thu Sep 18 16:38:42 EDT 2008


dagit:
> On Thu, Sep 18, 2008 at 12:31 PM, Creighton Hogg <wchogg at gmail.com> wrote:
> > On Thu, Sep 18, 2008 at 1:55 PM, Don Stewart <dons at galois.com> wrote:
> >> wchogg:
> >>> On Thu, Sep 18, 2008 at 1:29 PM, Don Stewart <dons at galois.com> wrote:
> > <snip>
> >>> > This makes me cry.
> >>> >
> >>> >    import System.Environment
> >>> >    import qualified Data.ByteString.Lazy.Char8 as B
> >>> >
> >>> >    main = do
> >>> >        [f] <- getArgs
> >>> >        s   <- B.readFile f
> >>> >        print (B.count '\n' s)
> >>> >
> >>> > Compile it.
> >>> >
> >>> >    $ ghc -O2 --make A.hs
> >>> >
> >>> >    $ time ./A /usr/share/dict/words
> >>> >    52848
> >>> >    ./A /usr/share/dict/words 0.00s user 0.00s system 93% cpu 0.007 total
> >>> >
> >>> > Against standard tools:
> >>> >
> >>> >    $ time wc -l /usr/share/dict/words
> >>> >    52848 /usr/share/dict/words
> >>> >    wc -l /usr/share/dict/words 0.01s user 0.00s system 88% cpu 0.008 total
> >>>
> >>> So both you & Bryan do essentially the same thing and of course both
> >>> versions are far better than mine.  So the purpose of using the Lazy
> >>> version of ByteString was so that the file is only incrementally
> >>> loaded by readFile as count is processing?
> >>
> >> Yep, that's right.
> >>
> >> The streaming nature is implicit in the lazy bytestring. It's kind of
> >> the dual of explicit chunkwise control -- chunk processing reified into
> >> the data structure.
> >
> > To ask an overly general question, if lazy bytestring makes a nice
> > provider for incremental processing are there reasons to _not_ reach
> > for that as my default when processing large files?
> 
> Yes.  The main pitfall is when you "accidentally" force the whole file
> (or at least large parts of it) into memory at the same time.
> Profiling and careful programming are the usual workarounds, but in a
> large application the "careful programming" part can become
> prohibitively expensive.  This is due to the sometimes subtle way
> strictness composes with laziness, and more generally to the fact
> that lazy evaluation makes it non-obvious how your program is
> evaluated at run-time, which makes laziness a double-edged sword at
> times.  I'm not saying get rid of lazy eval, but occasionally it
> presents problems for efficiency and for diagnosing efficiency
> problems.
> 
> The rule seems to be:  Write correct code first, fix the problems
> (usually just inefficiencies) later.
> 
> Using lazy bytestrings makes it easier to write concise code that is
> more easily inspected for correctness.  Perhaps it is even easier to
> test such code, but I'm skeptical of that.  Thus, I think most people
> here would agree that reaching first for lazy bytestrings is
> preferable to other techniques.  Plus, one of the most common fixes
> for inefficient Haskell programs is to make them lazy in the right
> places and strict in key places, and using lazy bytestrings usually
> gets you part of the way to that refactoring.
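To make the tradeoff above concrete, here is a small illustrative sketch (the function names are mine, not from the thread) contrasting a traversal that streams with one that retains the whole input:

```haskell
import qualified Data.ByteString.Lazy.Char8 as B
import Data.Int (Int64)

-- Streams: a single left-to-right traversal.  Chunks that count
-- has already passed can be reclaimed by the garbage collector,
-- so memory use stays proportional to one chunk, not the file.
countNewlines :: B.ByteString -> Int64
countNewlines = B.count '\n'

-- Retains: both traversals share the same lazy bytestring, so the
-- reference held for the second count keeps every chunk of the
-- input alive while the first count runs -- the whole file ends
-- up in memory at once.
countBoth :: B.ByteString -> (Int64, Int64)
countBoth s = (B.count '\n' s, B.count ' ' s)
```

The usual fix for the second case is to compute both counts in a single pass over the chunks, so no second traversal needs to hold on to the head of the string.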

Work on the "dual" of lazy bytestrings -- chunked enumerators -- may
lead to more options in this area. 

The question of compositionality of left-fold enumerators remains
(afaik), but we'll see. 
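For readers who haven't met them, the left-fold enumerator style can be sketched in a few lines: the producer owns the read loop and pushes strict chunks through the caller's accumulator. This is only an illustrative toy (the name foldFile and the 64K chunk size are my choices); real designs, such as Oleg Kiselyov's iteratees, also let the consumer stop early:

```haskell
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString.Char8 as S
import System.IO

-- A minimal left-fold enumerator: the producer drives the loop,
-- reading strict 64K chunks and feeding each one to the caller's
-- accumulator function.  Strictness on the accumulator avoids
-- building a chain of thunks across chunks.
foldFile :: (a -> S.ByteString -> a) -> a -> FilePath -> IO a
foldFile f z path = withFile path ReadMode (go z)
  where
    go !acc h = do
        chunk <- S.hGet h 65536
        if S.null chunk
            then return acc
            else go (f acc chunk) h
```

Counting newlines then becomes `foldFile (\n c -> n + S.count '\n' c) 0`, with the chunking explicit in the driver rather than reified into a lazy data structure.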

-- Don
