[Haskell-cafe] Trying to reduce memory costs of String duplicates - Oh my god it works

Günther Schmidt gue.schmidt at web.de
Sun Sep 6 07:06:24 EDT 2009


Hi Luke, Brian,

oh my god it works!

I'm just saying it because that's a first!

Late last night I had already suspected that laziness might be one of the  
reasons why the memoization technique showed no effect on memory  
consumption and after Brian's email that's exactly where I tried again,  
using Bang Patterns.


IT WORKED!

The memory consumption is now down by 20 fold!

Thank you guys very very much!

Günther



Am 06.09.2009, 03:24 Uhr, schrieb Brian Sniffen <bts at evenmere.org>:

> Remember that each string will still be read, so you may still
> allocate a great deal of memory and generate a great deal of garbage.
> You will just release it before long, rather than hold it for the life
> of your program.  Remember also that the memoization itself is lazy:
> it does no good if you read all the file in, then force the
> memoizations along with the computation of interest.
>
> -Brian
>
> 2009/9/5 Günther Schmidt <gue.schmidt at web.de>:
>> Hi Luke,
>>
>> thanks, this is some very good advice as I find many duplicates in the  
>> data
>> I have to iterate over.
>>
>> However so far I'm unable to tell wether this actually works or not, I  
>> tried
>> it a couple of times under different settings but it showed to  
>> difference in
>> memory consumption. The same mem peeks as before.
>>
>> Do you have some code that where you could see a before and after?
>>
>> Günther
>>
>>
>> Am 05.09.2009, 19:38 Uhr, schrieb Luke Palmer <lrpalmer at gmail.com>:
>>
>>> 2009/9/5 Günther Schmidt <gue.schmidt at web.de>:
>>>>
>>>> Hi all,
>>>>
>>>> I'm reading in a data of 216k records into a map of Key, Values Pairs,
>>>> the
>>>> values being strings.
>>>>
>>>> As it happens out of 216k String values there really are only about  
>>>> 6.6k
>>>> distinct string values, so I could save a lot of RAM if I was able to
>>>> "insert" only actually *new* string values into the map and use
>>>> references
>>>> to (string) values that already are in memory instead.
>>>>
>>>> Is there a container that would, if I wanted to insert an element,  
>>>> return
>>>> a
>>>> pair of either the previously inserted, equal value and the container
>>>> unchanged, or the new, previously unknown value and the new container
>>>> amended by that element?
>>>
>>> I believe a memoization of the identity function will do what you want:
>>>
>>> import qualified Data.MemoCombinators as Memo
>>>
>>> share = Memo.list Memo.char id
>>>
>>> Then pass any string through share to make/get a cached version.
>>>
>>> You might want to limit the scope of share -- eg. put it in a where
>>> clause for the function where you're using it -- so that it doesn't
>>> eat memory for the lifetime of your program, only for when you need
>>> it.
>>>
>>> Luke
>>
>> _______________________________________________
>> Haskell-Cafe mailing list
>> Haskell-Cafe at haskell.org
>> http://www.haskell.org/mailman/listinfo/haskell-cafe
>>
>
>
>



More information about the Haskell-Cafe mailing list