[Haskell-cafe] A round of golf
Don Stewart
dons at galois.com
Thu Sep 18 16:38:42 EDT 2008
dagit:
> On Thu, Sep 18, 2008 at 12:31 PM, Creighton Hogg <wchogg at gmail.com> wrote:
> > On Thu, Sep 18, 2008 at 1:55 PM, Don Stewart <dons at galois.com> wrote:
> >> wchogg:
> >>> On Thu, Sep 18, 2008 at 1:29 PM, Don Stewart <dons at galois.com> wrote:
> > <snip>
> >>> > This makes me cry.
> >>> >
> >>> > import System.Environment
> >>> > import qualified Data.ByteString.Lazy.Char8 as B
> >>> >
> >>> > main = do
> >>> >     [f] <- getArgs
> >>> >     s <- B.readFile f
> >>> >     print (B.count '\n' s)
> >>> >
> >>> > Compile it.
> >>> >
> >>> > $ ghc -O2 --make A.hs
> >>> >
> >>> > $ time ./A /usr/share/dict/words
> >>> > 52848
> >>> > ./A /usr/share/dict/words 0.00s user 0.00s system 93% cpu 0.007 total
> >>> >
> >>> > Against standard tools:
> >>> >
> >>> > $ time wc -l /usr/share/dict/words
> >>> > 52848 /usr/share/dict/words
> >>> > wc -l /usr/share/dict/words 0.01s user 0.00s system 88% cpu 0.008 total
> >>>
> >>> So both you & Bryan do essentially the same thing and of course both
> >>> versions are far better than mine. So the purpose of using the Lazy
> >>> version of ByteString was so that the file is only incrementally
> >>> loaded by readFile as count is processing?
> >>
> >> Yep, that's right
> >>
> >> The streaming nature is implicit in the lazy bytestring. It's kind of
> >> the dual of explicit chunkwise control -- chunk processing reified into
> >> the data structure.
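(Concretely, the chunk structure is right there for a fold to walk. A small sketch -- `chunkSizes` is a made-up helper name, not a library function -- using `foldlChunks` from Data.ByteString.Lazy:)

```haskell
import qualified Data.ByteString as S
import qualified Data.ByteString.Char8 as C
import qualified Data.ByteString.Lazy as L

-- foldlChunks walks the lazy string's spine of strict chunks,
-- handing each chunk to the accumulator: the chunk loop, reified.
chunkSizes :: L.ByteString -> [Int]
chunkSizes = reverse . L.foldlChunks (\acc c -> S.length c : acc) []

main :: IO ()
main = print (chunkSizes (L.fromChunks [C.pack "abc", C.pack "de"]))
-- prints [3,2]
```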
> >
> > To ask an overly general question, if lazy bytestring makes a nice
> > provider for incremental processing are there reasons to _not_ reach
> > for that as my default when processing large files?
>
> Yes. The main pitfall is when you "accidentally" force the whole file
> (or at least large parts of it) into memory at once. Profiling and
> careful programming seem to be the workarounds, but in a large
> application the "careful programming" part can become prohibitively
> expensive. This is due to the sometimes subtle way that strictness
> composes with laziness. It is the result of a more general issue:
> lazy evaluation makes it non-obvious how your program is evaluated
> at run-time, which makes laziness a double-edged sword at times. I'm
> not saying get rid of lazy eval, but occasionally it presents problems
> for efficiency, and for diagnosing efficiency problems.
>
> The rule seems to be: Write correct code first, fix the problems
> (usually just inefficiencies) later.
>
> Using lazy bytestrings makes it easier to write concise code that is
> more easily inspected for correctness. Perhaps it is even easier to
> test such code, but I'm skeptical of that. Thus, I think most people
> here would agree that reaching first for lazy bytestrings is preferred
> over other techniques. Plus, one of the most common fixes to
> inefficient Haskell programs is to make them lazy in the right places
> and strict in key places, and using lazy bytestrings usually gets you
> part of the way to that refactoring.
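To make the accidental-forcing failure mode concrete, here's a small sketch (the function names are mine, not from the thread): `countLines` consumes each chunk and then lets it become garbage, so it runs in constant space, while `lastLineLen` goes through `reverse`, which must materialise every chunk of the string before it can produce its first byte.

```haskell
import qualified Data.ByteString.Lazy.Char8 as B

-- Streams: each chunk is counted and then dropped.
countLines :: B.ByteString -> Int
countLines = fromIntegral . B.count '\n'

-- Forces the whole string: reverse has to see every chunk
-- before it can yield anything at all.
lastLineLen :: B.ByteString -> Int
lastLineLen = fromIntegral . B.length . B.takeWhile (/= '\n') . B.reverse

main :: IO ()
main = do
    let s = B.pack "foo\nbar\nbazzz"
    print (countLines s)   -- 2
    print (lastLineLen s)  -- 5
```

Both give the right answer on a small string; only on a large file does the difference in residency show up in the heap profile.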
Work on the "dual" of lazy bytestrings -- chunked enumerators -- may
lead to more options in this area.
The question of compositionality of left-fold enumerators remains
(afaik), but we'll see.
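For the curious, a stripped-down sketch of the left-fold style (`foldFile` is a hypothetical name, not an existing library function): the producer owns the read loop and pushes strict chunks into the caller's step function, so chunkwise control is explicit rather than hidden inside the string's spine.

```haskell
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString.Char8 as S
import System.IO

-- Hypothetical minimal left-fold enumerator over a file: the loop
-- lives here; the consumer supplies only a strict step function.
foldFile :: (a -> S.ByteString -> a) -> a -> FilePath -> IO a
foldFile step z path = withFile path ReadMode (go z)
  where
    go !acc h = do
        chunk <- S.hGet h 65536
        if S.null chunk
            then return acc
            else go (step acc chunk) h

-- wc -l in this style:
countNewlines :: FilePath -> IO Int
countNewlines = foldFile (\n c -> n + S.count '\n' c) 0
```

The open question above is how well such folds compose once you want several consumers over one pass, which is what the enumerator work is chasing.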
-- Don