[Haskell-cafe] Data.Binary stack overflow with Data.Sequence String
Neil Mitchell
ndmitchell at gmail.com
Thu Mar 5 06:51:35 EST 2009
Hi Gwern,
I get String/Data.Binary issues too. My suggestion would be to change
your strings to ByteStrings, serialise those, and then do the reverse
conversion when reading. Interestingly, a String and a ByteString have
identical Data.Binary representations, but in my experiments the
conversion, even including the cost of BS.unpack, makes the reading
substantially cheaper.
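
Something like the following is what I mean; it's only a sketch (the
saveArticles/loadArticles names are made up, not anything in Yi, and
Char8.pack only round-trips 8-bit text):

import qualified Data.ByteString.Char8 as B
import qualified Data.ByteString.Lazy as BL
import Data.Binary (encode, decode)
import Data.Sequence (Seq)

-- Write: pack each String to a strict ByteString before encoding.
-- (B.pack assumes 8-bit characters, so this only round-trips
-- ASCII/Latin-1 text.)
saveArticles :: FilePath -> Seq String -> IO ()
saveArticles fp = BL.writeFile fp . encode . fmap B.pack

-- Read: decode a 'Seq ByteString', then unpack back to Strings.
loadArticles :: FilePath -> IO (Seq String)
loadArticles fp = do
    bs <- B.readFile fp                  -- strict read, so the handle is closed
    let sq = decode (BL.fromChunks [bs]) -- lazy ByteString for Data.Binary
    return (fmap B.unpack sq)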
Thanks
Neil
On Thu, Mar 5, 2009 at 2:33 AM, Gwern Branwen <gwern0 at gmail.com> wrote:
> On Tue, Mar 3, 2009 at 11:50 PM, Spencer Janssen
> <spencerjanssen at gmail.com> wrote:
>> On Tue, Mar 3, 2009 at 10:30 PM, Gwern Branwen <gwern0 at gmail.com> wrote:
>>> So recently I've been having issues with Data.Binary & Data.Sequence;
>>> I serialize a 'Seq String'
>>>
>>> You can see the file here: http://code.haskell.org/yi/Yi/IReader.hs
>>>
>>> The relevant function seems to be:
>>>
>>> -- | Read in database from 'dbLocation' and then parse it into an 'ArticleDB'.
>>> readDB :: YiM ArticleDB
>>> readDB = io $ (dbLocation >>= r) `catch` (\_ -> return empty)
>>> where r x = fmap (decode . BL.fromChunks . return) $ B.readFile x
>>> -- We read in with strict bytestrings to guarantee the file is closed,
>>> -- and then we convert it to the lazy bytestring data.binary expects.
>>> -- This is inefficient, but alas...
>>>
>>> My current serialized file is about 9.4M. I originally thought that
>>> the issue might be the recent upgrade in Yi to binary 0.5, but I
>>> unpulled patches back to past that, and the problem still manifested.
>>>
>>> Whenever yi tries to read the articles.db file, it stack overflows. It
>>> actually stack-overflowed on even smaller files, but I managed to bump
>>> the size upwards, it seems, by the strict-ByteString trick.
>>> Unfortunately, my personal file has since passed whatever that limit
>>> was.
>>>
>>> I've carefully read the previous threads on Data.Binary and Data.Map
>>> stack overflows, but none of them seem to help; hacking some $!s or
>>> seqs into readDB seems to make no difference, and Seq is supposed to
>>> be a strict data structure already! Doing things in GHCi has been
>>> tedious, and hasn't enlightened me much: sometimes things overflow and
>>> sometimes they don't. It's all very frustrating and I'm seriously
>>> considering going back to using the original read/show code unless
>>> anyone knows how to fix this - that approach may be many times slower,
>>> but I know it will work.
>>>
>>> --
>>> gwern
>>
>> Have you tried the darcs version of binary? It has a new instance
>> which looks more efficient than the old.
>>
>>
>> Cheers,
>> Spencer Janssen
>
> I have. It still stack-overflows on my 9.8 meg file. (The magic number
> seems to be somewhere between 9 and 10 megabytes.)
>
> --
> gwern
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
>