[Haskell-cafe] Data.Binary stack overflow with Data.Sequence String

Thu Mar 5 14:35:43 EST 2009

Avoid unpack!
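
A minimal sketch of one way to read that, assuming the articles are
ASCII or Latin-1 text: pack each String once before serialising, then
decode straight to strict ByteStrings and keep them packed instead of
converting back. The names encodeArticles and decodeArticles are
hypothetical and do not come from Yi.

```haskell
import Data.Binary (decode, encode)
import qualified Data.ByteString.Char8 as B
import qualified Data.ByteString.Lazy as BL
import qualified Data.Sequence as Seq

-- Hypothetical helper: pack each String before serialising.
-- B.pack from Data.ByteString.Char8 keeps only the low 8 bits of each
-- Char, so this assumes the text is ASCII or Latin-1.
encodeArticles :: Seq.Seq String -> BL.ByteString
encodeArticles = encode . fmap B.pack

-- Hypothetical helper: decode to strict ByteStrings and leave them
-- packed, so no per-element unpack happens at read time.
decodeArticles :: BL.ByteString -> Seq.Seq B.ByteString
decodeArticles = decode
```

Callers that really need a String can still unpack a single element on
demand, which avoids paying for the whole sequence up front.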

ndmitchell:
> Hi Gwern,
> 
> I get String/Data.Binary issues too. My suggestion would be to change
> your strings to ByteStrings, serialise, and then do the reverse
> conversion when reading. Interestingly, a String and a ByteString have
> identical Data.Binary reps, but in my experiments converting,
> including the cost of BS.unpack, makes the reading substantially
> cheaper.
> 
> Thanks
> 
> Neil
> 
> On Thu, Mar 5, 2009 at 2:33 AM, Gwern Branwen <gwern0 at gmail.com> wrote:
> > On Tue, Mar 3, 2009 at 11:50 PM, Spencer Janssen
> > <spencerjanssen at gmail.com> wrote:
> >> On Tue, Mar 3, 2009 at 10:30 PM, Gwern Branwen <gwern0 at gmail.com> wrote:
> >>> So recently I've been having issues with Data.Binary & Data.Sequence;
> >>> I serialize a 'Seq String'
> >>>
> >>> You can see the file here: http://code.haskell.org/yi/Yi/IReader.hs
> >>>
> >>> The relevant function seems to be:
> >>>
> >>> -- | Read in database from 'dbLocation' and then parse it into an 'ArticleDB'.
> >>> readDB :: YiM ArticleDB
> >>> readDB = io $ (dbLocation >>= r) `catch` (\_ -> return empty)
> >>>          where r x = fmap (decode . BL.fromChunks . return) $ B.readFile x
> >>>                -- We read in with strict bytestrings to guarantee the file is closed,
> >>>                -- and then we convert it to the lazy bytestring data.binary expects.
> >>>                -- This is inefficient, but alas...
> >>>
> >>> My current serialized file is about 9.4M. I originally thought that
> >>> the issue might be the recent upgrade in Yi to binary 0.5, but I
> >>> unpulled patches back to past that, and the problem still manifested.
> >>>
> >>> Whenever yi tries to read the articles.db file, it stack overflows. It
> >>> actually stack-overflowed on even smaller files, but I managed to bump
> >>> the size upwards, it seems, by the strict-Bytestring trick.
> >>> Unfortunately, my personal file has since passed whatever that limit
> >>> was.
> >>>
> >>> I've read carefully the previous threads on Data.Binary and Data.Map
> >>> stack-overflows, but none of them seem to help; hacking some $!s or
> >>> seqs into readDB seems to make no difference, and Seq is supposed to
> >>> be a strict datastructure already! Doing things in GHCi has been
> >>> tedious, and hasn't enlightened me much: sometimes things overflow and
> >>> sometimes they don't. It's all very frustrating and I'm seriously
> >>> considering going back to using the original read/show code unless
> >>> anyone knows how to fix this - that approach may be many times slower,
> >>> but I know it will work.
> >>>
> >>> --
> >>> gwern
> >>
> >> Have you tried the darcs version of binary?  It has a new instance
> >> which looks more efficient than the old.
> >>
> >>
> >> Cheers,
> >> Spencer Janssen
> >
> > I have. It still stack-overflows on my 9.8 meg file. (The magic number
> > seems to be somewhere between 9 and 10 megabytes.)
> >
> > --
> > gwern
> > _______________________________________________
> > Haskell-Cafe mailing list
> > Haskell-Cafe at haskell.org
> > http://www.haskell.org/mailman/listinfo/haskell-cafe
> >