[Haskell-cafe] Help with Stack Space Overflow / Memory Issues

Travis B. Hartwell nafai at travishartwell.net
Sun Feb 15 17:04:35 EST 2009


Hello,

I'm writing a small program to process Delicious [1] RSS feeds.  I like
to look at the recent feeds to see what others have bookmarked recently.
But, there are a lot of duplicates in the recent feeds as an entry is
shown for each person who bookmarks an individual URL.  I decided to
write a small program that would trim out those that I've seen before.

I wrote a small program that read a feed (initially just an on-disk copy
of an RSS feed) and removed the duplicate items just within that feed.
It worked great.  Then, I wanted to add persistence, so this would
maintain state from one run to the next.  I decided to use Data.Binary
to serialize the Data.Map I was using and re-load it each time.
Unfortunately, making this change caused a "Stack Space Overflow" error
and I couldn't track down what was wrong.  This was with GHC 6.8.2.  I
recently upgraded to GHC 6.10.1, and now memory use just grows without
bound until it actually locks up my machine.

This happens even when I comment out the code for the serialization /
de-serialization of the map, so essentially the only difference from my
prior version is that the function where the map is initialized returns
IO [Item] instead of [Item].

The latest version of my code is up on github [2], and the sample RSS
feed I was processing is included in the repo.  I'd appreciate some help
in how to attack this problem.  I've even tried profiling this (back
when I was using 6.8.2) and there was nothing enlightening there, at
least with my limited Haskell experience.  I am unsure of how to get
this to work, or if the problem is even my code.

Additionally, I am unsure if my serialization code would work anyway.
Because Haskell is not pass-by-reference, would the changes to the
seenMap propagate back to my deDupWithSerializedMap function where it
is serialized?  If not, how would I go about doing this?
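
To make the question concrete, here is a minimal sketch of what I think
the shape would have to be: since the map can't be modified in place, the
pure function returns the updated map alongside the new items, and only
the IO wrapper loads and saves it.  This is not the code from the repo.
The Item fields, the SeenMap type, the deDup helper, the signature I give
deDupWithSerializedMap, and the "seen.bin" file name are all assumptions
made up for illustration.

```haskell
import qualified Data.Map as Map
import Data.Binary (decodeFile, encodeFile)
import Data.List (foldl')
import System.Directory (doesFileExist)

-- Hypothetical stand-in for the real Item type: just a URL and a title.
data Item = Item { itemUrl :: String, itemTitle :: String }
  deriving (Show, Eq)

-- Assumed representation of the seen map: URL -> title.
type SeenMap = Map.Map String String

-- Pure part: takes the old map and a feed, and returns the items not seen
-- before (in feed order) together with the updated map.
deDup :: SeenMap -> [Item] -> ([Item], SeenMap)
deDup seen0 items = (reverse fresh, seen1)
  where
    (fresh, seen1) = foldl' step ([], seen0) items
    step (acc, seen) item
      | Map.member (itemUrl item) seen = (acc, seen)
      | otherwise =
          let seen' = Map.insert (itemUrl item) (itemTitle item) seen
          in seen' `seq` (item : acc, seen')  -- evaluate the map at each step

-- Impure part: load the map if a saved copy exists, run the pure function,
-- then write back the map that deDup returned.
deDupWithSerializedMap :: FilePath -> [Item] -> IO [Item]
deDupWithSerializedMap path items = do
  exists <- doesFileExist path
  seen <- if exists then decodeFile path else return Map.empty
  let (fresh, seen') = deDup seen items
  -- Evaluate the updated map before writing it out.
  Map.size seen' `seq` encodeFile path seen'
  return fresh

main :: IO ()
main = do
  let feed = [ Item "http://a.example/" "A"
             , Item "http://b.example/" "B"
             , Item "http://a.example/" "A again"
             ]
  fresh <- deDupWithSerializedMap "seen.bin" feed
  mapM_ (putStrLn . itemUrl) fresh
```

If this is the right idea, then the answer to my question would be that
changes never "propagate back" on their own; the caller only sees the new
map because deDup hands it back as part of its result.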

I think part of my problem might be the difference between pure and
impure code and how to separate it.

Thanks for the help!

[1] http://www.delicious.com/
[2] http://github.com/Nafai77/recent-feeds/tree/master

---------------
Travis B. Hartwell
Software Toolsmith

Blog:
http://www.travishartwell.net/blog

Where to find me:
http://www.travishartwell.net/blog/static/where_to_find_me


