[Haskell-cafe] A round of golf

Thu Sep 18 18:09:18 EDT 2008

On Thu, Sep 18, 2008 at 1:55 PM, Don Stewart <dons at galois.com> wrote:
> wchogg:
>> On Thu, Sep 18, 2008 at 1:29 PM, Don Stewart <dons at galois.com> wrote:
>> > wchogg:
>> >> Hey Haskell,
>> >> So for a fairly inane reason, I ended up taking a couple of minutes
>> >> and writing a program that would spit out, to the console, the number
>> >> of lines in a file.  Off the top of my head, I came up with this which
>> >> worked fine with files that had 100k lines:
>> >>
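>> >> import Control.Monad (liftM)
>> >> import Control.Monad.Trans.Class (lift)
>> >> import Control.Monad.Trans.State (StateT, execStateT, modify)
>> >> import System.Environment (getArgs)
>> >> import System.IO
>> >>
>> >> main = do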
>> >>  path <- liftM head $ getArgs
>> >>  h <- openFile path ReadMode
>> >>  n <- execStateT (countLines h) 0
>> >>  print n
>> >>
>> >> untilM :: Monad m => (a -> m Bool) -> (a -> m ()) -> a -> m ()
>> >> untilM cond action val = do
>> >>  truthy <- cond val
>> >>  if truthy then return () else action val >> (untilM cond action val)
>> >>
>> >> countLines :: Handle -> StateT Int IO ()
>> >> countLines = untilM (\h -> lift $ hIsEOF h) (\h -> do
>> >>                                                 lift $ hGetLine h
>> >>                                                 modify (+1))
>> >>
>> >> If this makes anyone cringe or cry "you're doing it wrong", I'd
>> >> actually like to hear it.  I never really share my projects, so I
>> >> don't know how idiosyncratic my style is.
>> >
>> > This makes me cry.
>> >
>> >    import System.Environment
>> >    import qualified Data.ByteString.Lazy.Char8 as B
>> >
>> >    main = do
>> >        [f] <- getArgs
>> >        s   <- B.readFile f
>> >        print (B.count '\n' s)
>> >
>> > Compile it.
>> >
>> >    $ ghc -O2 --make A.hs
>> >
>> >    $ time ./A /usr/share/dict/words
>> >    52848
>> >    ./A /usr/share/dict/words 0.00s user 0.00s system 93% cpu 0.007 total
>> >
>> > Against standard tools:
>> >
>> >    $ time wc -l /usr/share/dict/words
>> >    52848 /usr/share/dict/words
>> >    wc -l /usr/share/dict/words 0.01s user 0.00s system 88% cpu 0.008 total
>>
>> So both you & Bryan do essentially the same thing and of course both
>> versions are far better than mine.  So the purpose of using the Lazy
>> version of ByteString was so that the file is only incrementally
>> loaded by readFile as count is processing?
>
> Yep, that's right
>
> The streaming nature is implicit in the lazy bytestring. It's kind of
> the dual of explicit chunkwise control -- chunk processing reified into
> the data structure.
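
To make the "dual" concrete, here is a rough sketch of the explicit chunkwise control Don is contrasting with. It is a hand-written loop that pulls strict chunks from a Handle with hGetSome and counts the newlines in each chunk until EOF. The name countLinesChunked and the 32 KiB chunk size are arbitrary choices for this illustration. They are not part of the bytestring library.

```haskell
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString.Char8 as S
import System.IO

-- Explicit chunkwise control (hypothetical helper, not a library function):
-- repeatedly read up to chunkSize bytes from the Handle and add up the
-- newlines in each strict chunk. hGetSome returns an empty chunk at EOF.
countLinesChunked :: Handle -> IO Int
countLinesChunked h = go 0
  where
    chunkSize = 32 * 1024 -- arbitrary buffer size chosen for this sketch
    go !acc = do
      chunk <- S.hGetSome h chunkSize
      if S.null chunk
        then return acc
        else go (acc + S.count '\n' chunk)
```

You would call it as withFile path ReadMode countLinesChunked >>= print. Compared with Don's three-line version, the loop, the buffer and the EOF test live inside B.readFile and the lazy chunk list, not in user code.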

Hi Don,
I have a bit more of a follow-up, actually.  You use the built-in
ByteString consumer count, which is itself built on the foldlChunks
function, exported only from Data.ByteString.Lazy.Internal.  If I want
to write my own efficient ByteString consumer, is that what I need to
use in order to preserve the inherent laziness of the data structure?
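
One possible answer is sketched below. It is an assumption on my part, not necessarily how Don would do it. A custom consumer can stay incremental using only the public API, by folding strictly over the strict chunks that toChunks (exported from Data.ByteString.Lazy.Char8) produces on demand. The helper name foldChunks' is made up for this example. It plays roughly the role of foldlChunks without importing the Internal module.

```haskell
import Data.List (foldl')
import qualified Data.ByteString.Char8 as S
import qualified Data.ByteString.Lazy.Char8 as L

-- Hypothetical helper: a strict left fold over the strict chunks of a lazy
-- ByteString, built only on the public toChunks. Chunks are produced on
-- demand, so each one can be garbage collected once the fold has passed it
-- (as long as nothing else keeps a reference to the whole lazy ByteString).
foldChunks' :: (a -> S.ByteString -> a) -> a -> L.ByteString -> a
foldChunks' f z = foldl' f z . L.toChunks

-- Example consumer: count newlines chunk by chunk with the strict S.count.
countNewlines :: L.ByteString -> Int
countNewlines = foldChunks' (\n c -> n + S.count '\n' c) 0
```

For per-character logic that does not reduce to a chunk-level primitive like S.count, Data.ByteString.Lazy.Char8 also exports a strict foldl', which likewise streams through the chunks.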

Also, I'm a little at a loss as to how to make a good ByteString
producer for efficiently _writing_ large swaths of data via writeFile.
Would it be possible to whip up a small example?
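
Here is one possible producer, sketched with Data.ByteString.Builder. That module ships with current versions of the bytestring package, so this assumes a newer library than the one discussed in this thread. The approach is to describe the output as a Builder, turn it into a lazy ByteString with toLazyByteString, and hand that to writeFile. Because the lazy ByteString is generated chunk by chunk as writeFile consumes it, the whole file never needs to be in memory at once. numberLines and writeNumbers are hypothetical names for this example.

```haskell
import qualified Data.ByteString.Builder as B
import qualified Data.ByteString.Lazy.Char8 as L

-- Hypothetical producer: the decimal numbers 1..n, one per line.
-- foldMap over the list yields a single Builder; toLazyByteString runs it
-- lazily, filling one buffer-sized chunk at a time as the result is demanded.
numberLines :: Int -> L.ByteString
numberLines n = B.toLazyByteString (foldMap line [1 .. n])
  where
    line i = B.intDec i <> B.char7 '\n'

-- Stream the generated data to disk; L.writeFile forces the lazy
-- ByteString chunk by chunk while writing it out.
writeNumbers :: FilePath -> Int -> IO ()
writeNumbers path n = L.writeFile path (numberLines n)
```

Without Builder, a lazy list of lines such as L.unlines (map (L.pack . show) [1 .. n]) streams the same way. The cost is that every line becomes its own tiny chunk.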

Oh, and lastly, I apologize to both you & Bryan for making you cry.  I
hope you can forgive my cruelty.

Thanks,
Creighton