[Haskell-cafe] A round of golf
wchogg at gmail.com
Thu Sep 18 18:09:18 EDT 2008
On Thu, Sep 18, 2008 at 1:55 PM, Don Stewart <dons at galois.com> wrote:
>> On Thu, Sep 18, 2008 at 1:29 PM, Don Stewart <dons at galois.com> wrote:
>> > wchogg:
>> >> Hey Haskell,
>> >> So for a fairly inane reason, I ended up taking a couple of minutes
>> >> and writing a program that would spit out, to the console, the number
>> >> of lines in a file. Off the top of my head, I came up with this which
>> >> worked fine with files that had 100k lines:
>> >> main = do
>> >> path <- liftM head $ getArgs
>> >> h <- openFile path ReadMode
>> >> n <- execStateT (countLines h) 0
>> >> print n
>> >> untilM :: Monad m => (a -> m Bool) -> (a -> m ()) -> a -> m ()
>> >> untilM cond action val = do
>> >> truthy <- cond val
>> >> if truthy then return () else action val >> (untilM cond action val)
>> >> countLines :: Handle -> StateT Int IO ()
>> >> countLines = untilM (\h -> lift $ hIsEOF h) (\h -> do
>> >> lift $ hGetLine h
>> >> modify (+1))
>> >> If this makes anyone cringe or cry "you're doing it wrong", I'd
>> >> actually like to hear it. I never really share my projects, so I
>> >> don't know how idiosyncratic my style is.
>> > This makes me cry.
>> > import System.Environment
>> > import qualified Data.ByteString.Lazy.Char8 as B
>> > main = do
>> > [f] <- getArgs
>> > s <- B.readFile f
>> > print (B.count '\n' s)
>> > Compile it.
>> > $ ghc -O2 --make A.hs
>> > $ time ./A /usr/share/dict/words
>> > 52848
>> > ./A /usr/share/dict/words 0.00s user 0.00s system 93% cpu 0.007 total
>> > Against standard tools:
>> > $ time wc -l /usr/share/dict/words
>> > 52848 /usr/share/dict/words
>> > wc -l /usr/share/dict/words 0.01s user 0.00s system 88% cpu 0.008 total
>> So both you & Bryan do essentially the same thing and of course both
>> versions are far better than mine. So the purpose of using the Lazy
>> version of ByteString was so that the file is only incrementally
>> loaded by readFile as count is processing?
> Yep, that's right
> The streaming nature is implicit in the lazy bytestring. It's kind of
> the dual of explicit chunkwise control -- chunk processing reified into
> the data structure.
I have a bit more of a followup, actually. You make use of the built
in bytestring consumer count, which itself is built upon the
foldlChunks function which is only exported in the
ByteString.Lazy.Internal. If I want to make my own efficient
bytestring consumer, is that what I need to use in order to preserve
the inherent laziness of the datastructure?
Also, I feel a little at a loss for how to make a good bytestring
producer for efficiently _writing_ large swaths of data via writeFile.
Would it be possible to whip up a small example?
Oh, and lastly, I apologize to both you & Bryan for making you cry. I
hope you can forgive my cruelty.
More information about the Haskell-Cafe