[Haskell-beginners] ICFP 2007
Daniel Fischer
daniel.is.fischer at web.de
Wed Aug 27 22:05:22 EDT 2008
Am Donnerstag, 28. August 2008 01:46 schrieb Rafael Gustavo da Cunha Pereira
Pinto:
> First of all, sorry asking too many questions!
That's what this list is for :)
>
> I am using the ICFP 2007 (http://save-endo.cs.uu.nl/) problem as a case for
> learning Haskell.
>
> I started with the prototype, listed below
>
>
> =================Main.hs=======================
> module Main(main) where
>
> import qualified Data.ByteString.Lazy.Char8 as L
> import qualified Data.ByteString.Char8 as S
>
> import Data.ByteString.Search.BoyerMoore as BM
> import Debug.Trace
>
> data DNA=DNA {prefix :: L.ByteString, suffix :: L.ByteString}
>
>
> main::IO ()
> main=do
> d<-L.readFile "endo.dna"
> let dna=DNA (L.pack "IIIIIIIIIIICCICCICCICFFIIIIIIIIIIIIIIIICFP") d
> let pat=S.pack "ICFP"
> print $ take 1 $ matchSL pat $ L.append (prefix dna) (suffix dna)
> =================EOF Main.hs===================
>
>
> My question is:
>
> Knowing that the searched pattern is in DNA prefix, will "suffix dna" ever
> be called here?
> I mean, will the "append" suspension ever be fully evaluated?
No. That's one nice thing about lazy ByteStrings. You only use as many chunks
as you really need.
L.append is lazy enough to survive even L.append (prefix dna) undefined.
Unfortunately our BM searching algorithm isn't, an optimisation for the case
of a lazy ByteString consisting of a single chunk requires that the second
argument of L.append is reduced to WHNF (KMP doesn't), so it needs to read
one chunk from the file. Hence in this case (suffix dna) is called, but only
one chunk of it is used.
You can verify this by running your programme with the option "+RTS -sstderr",
which will print out some time and allocation stats. I did:
dafis at linux:~/Documents/haskell/move> ghc -O2 --make endo
[1 of 1] Compiling Main ( endo.hs, endo.o )
Linking endo ...
dafis at linux:~/Documents/haskell/move> ./endo +RTS -sstderr
./endo +RTS -sstderr
[38]
89,388 bytes allocated in the heap
520 bytes copied during GC (scavenged)
0 bytes copied during GC (not scavenged)
57,344 bytes maximum residency (1 sample(s))
<snip 0.00 times>
Searching in L.append (prefix dna) (L.append (L.pack "u") (suffix dna)) gives
the figures
56,604 bytes allocated in the heap
528 bytes copied during GC (scavenged)
8 bytes copied during GC (not scavenged)
57,344 bytes maximum residency (1 sample(s))
, so then no chunk (~32K on my system) needs to be read from the file.
>
>
> I am thinking on using a data type like
>
> data DNA=DNA {prefix::L.ByteString, worked::L.ByteString,
> suffix::ByteString}
>
> The invariant is that worked should be L.empty whenever prefix is L.empty.
>
>
> Thanks in advance
Cheers,
Daniel
More information about the Beginners
mailing list