[Haskell-cafe] Data.Binary stack overflow with Data.Sequence String

Thu Mar 5 14:55:46 EST 2009

Avoid massive reductions in runtime while maintaining the same API?

I did move to using ByteStrings internally for those bits later on,
but reading Strings from Data.Binary via ByteString+unpack went
much more quickly than reading Strings directly.
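
A minimal sketch of that ByteString+unpack round trip, assuming the
data is a 'Seq String' as in Gwern's case. The names encodeArticles and
decodeArticles are hypothetical (they are not part of Yi or binary), and
B.pack from Data.ByteString.Char8 truncates characters above '\255',
so this is only safe for 8-bit text:

```haskell
import Data.Binary (decode, encode)
import qualified Data.ByteString.Char8 as B
import qualified Data.ByteString.Lazy as BL
import Data.Sequence (Seq)
import qualified Data.Sequence as Seq

-- Hypothetical helper: pack each String into a strict ByteString
-- before serialising, so binary writes ByteStrings rather than [Char].
-- Assumption: all characters fit in 8 bits (B.pack truncates otherwise).
encodeArticles :: Seq String -> BL.ByteString
encodeArticles = encode . fmap B.pack

-- Hypothetical helper: decode a Seq of strict ByteStrings, then unpack
-- each one back into a String, keeping a String-based API for callers.
decodeArticles :: BL.ByteString -> Seq String
decodeArticles = fmap B.unpack . decode
```

Callers still see only 'Seq String'; whether this also avoids the stack
overflow would need testing against the real articles.db.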

On Thu, Mar 5, 2009 at 7:35 PM, Don Stewart <dons at galois.com> wrote:
> Avoid unpack!
>
> ndmitchell:
>> Hi Gwern,
>>
>> I get String/Data.Binary issues too. My suggestion would be to change
>> your strings to ByteStrings, serialise, and then do the reverse
>> conversion when reading. Interestingly, a String and a ByteString have
>> identical Data.Binary reps, but in my experiments converting,
>> including the cost of BS.unpack, makes the reading substantially
>> cheaper.
>>
>> Thanks
>>
>> Neil
>>
>> On Thu, Mar 5, 2009 at 2:33 AM, Gwern Branwen <gwern0 at gmail.com> wrote:
>> > On Tue, Mar 3, 2009 at 11:50 PM, Spencer Janssen
>> > <spencerjanssen at gmail.com> wrote:
>> >> On Tue, Mar 3, 2009 at 10:30 PM, Gwern Branwen <gwern0 at gmail.com> wrote:
>> >>> So recently I've been having issues with Data.Binary & Data.Sequence;
>> >>> I serialize a 'Seq String'
>> >>>
>> >>> You can see the file here: http://code.haskell.org/yi/Yi/IReader.hs
>> >>>
>> >>> The relevant function seems to be:
>> >>>
>> >>> -- | Read in database from 'dbLocation' and then parse it into an 'ArticleDB'.
>> >>> readDB :: YiM ArticleDB
>> >>> readDB = io $ (dbLocation >>= r) `catch` (\_ -> return empty)
>> >>>          where r x = fmap (decode . BL.fromChunks . return) $ B.readFile x
>> >>>                -- We read in with strict bytestrings to guarantee the
>> >>>                -- file is closed, and then we convert it to the lazy
>> >>>                -- bytestring data.binary expects.
>> >>>                -- This is inefficient, but alas...
>> >>>
>> >>> My current serialized file is about 9.4M. I originally thought that
>> >>> the issue might be the recent upgrade in Yi to binary 0.5, but I
>> >>> unpulled patches back to past that, and the problem still manifested.
>> >>>
>> >>> Whenever yi tries to read the articles.db file, it stack overflows. It
>> >>> actually stack-overflowed on even smaller files, but I managed to bump
>> >>> the size upwards, it seems, by the strict-Bytestring trick.
>> >>> Unfortunately, my personal file has since passed whatever that limit
>> >>> was.
>> >>>
>> >>> I've read carefully the previous threads on Data.Binary and Data.Map
>> >>> stack-overflows, but none of them seem to help; hacking some $!s or
>> >>> seqs into readDB seems to make no difference, and Seq is supposed to
>> >>> be a strict datastructure already! Doing things in GHCi has been
>> >>> tedious, and hasn't enlightened me much: sometimes things overflow and
>> >>> sometimes they don't. It's all very frustrating and I'm seriously
>> >>> considering going back to using the original read/show code unless
>> >>> anyone knows how to fix this - that approach may be many times slower,
>> >>> but I know it will work.
>> >>>
>> >>> --
>> >>> gwern
>> >>
>> >> Have you tried the darcs version of binary?  It has a new instance
>> >> which looks more efficient than the old.
>> >>
>> >>
>> >> Cheers,
>> >> Spencer Janssen
>> >
>> > I have. It still stack-overflows on my 9.8 meg file. (The magic number
>> > seems to be somewhere between 9 and 10 megabytes.)
>> >
>> > --
>> > gwern
>> > _______________________________________________
>> > Haskell-Cafe mailing list
>> > Haskell-Cafe at haskell.org
>> > http://www.haskell.org/mailman/listinfo/haskell-cafe
>> >
>>
>