[Haskell-beginners] Fwd: CAF taking all the memory

Mahdi mdibaiee at aol.com
Mon May 9 14:11:03 UTC 2016



> Begin forwarded message:
> 
> From: Mahdi <mdibaiee at aol.com>
> Subject: Re: [Haskell-beginners] CAF taking all the memory
> Date: May 8, 2016 at 18:06:29 GMT+4:30
> To: Ben Gamari <ben at smart-cactus.org>
> 
> Wow, this is a great explanation! Thank you!
> I see your point on using lazy lists (although I was not aware of the specific problems you mentioned).
> So, I switched over to HMatrix [0], which uses unboxed arrays for matrices
> (the libraries you mentioned added a lot of complexity to the code).
> 
> After refactoring my code to use HMatrix, I ran the program and the same thing happens;
> the speed hasn’t changed much either. I then tried to take another profile to see if anything had changed,
> but I couldn’t, as described here [1].
> 
> But my main question is: why would Haskell even store all this data? There are no side effects, so
> there should be no memory leak in my code.
> From what I read online, Haskell tries to avoid re-calculating functions by caching their
> inputs/outputs (they’re pure, so this can be done). I hoped there would be a way to disable this,
> but unfortunately there doesn’t seem to be one (or is there?).
> If that’s the case, then using libraries won’t solve the problem, and that’s frustrating.
> 
> * I also tried compiling with -O0, but no luck.
> 
> [0] http://dis.um.es/~alberto/hmatrix/hmatrixtut.html
> [1] https://github.com/albertoruiz/hmatrix/issues/190
> 
> Thank you Ben!
> 
> - Mahdi
> 
>> On May 3, 2016, at 14:36, Ben Gamari <ben at smart-cactus.org> wrote:
>> 
>> Mahdi <mdibaiee at aol.com> writes:
>> 
>>> Hello there,
>>> 
>> Hi!
>> 
>>> I’m pretty new to Haskell and I’m trying to write a Neural Network in
>>> Haskell for educational purposes.
>>> 
>>> So, I have my neural network working; it can learn XOR in 500
>>> iterations. But the thing is, if I increase the iterations to
>>> something like 5 million, the process just drains my RAM until
>>> it’s either killed or the OS grinds to a halt. Here is the code: [0]
>>> 
>>> I searched online and tried to get information on why this is
>>> happening. I profiled the memory usage and found that the memory is
>>> taken by CAF.
>>> 
>> Great! The next question you should then ask is "which CAF"? A CAF is
>> simply a top-level "constant" in your program. Indeed it sounds like you
>> have not defined any cost-centers, which means that GHC will attribute
>> the entire cost of your program to the ambiguous cost-center "CAF"
>> (which in this case really just means "the whole program").
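>> 
>> For concreteness, a CAF is nothing more than a top-level binding that
>> takes no arguments; once it has been evaluated, its result is retained
>> for as long as code referring to it is alive (typically the whole run
>> of the program). A made-up illustration, not taken from your code:
>> 
>>   -- A CAF: top-level and argument-free. After its first use the
>>   -- evaluated list is retained (typically for the rest of the run),
>>   -- which is how an innocent top-level definition can hold a lot of memory.
>>   lookupTable :: [Double]
>>   lookupTable = map sqrt [1 .. 1000000]
>> 
>>   -- Not a CAF: it takes an argument, so nothing is retained between calls.
>>   scaled :: Double -> [Double]
>>   scaled k = map (* k) lookupTable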
>> 
>> As discussed in the users guide [1], one way to define cost-centers
>> within your program is to manually annotate expressions with SCC
>> pragmas. However, in this case we simply want to let GHC do this for us,
>> for which we can use the `-fprof-auto -fprof-cafs` flags (which
>> automatically annotate top-level definitions and CAFs with
>> cost-centers),
>> 
>>   $ ghc -O examples/xor.hs -fprof-auto -fprof-cafs
>> 
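>> For reference, a manual annotation would look something like this (the
>> expression itself is a made-up example, not a line from xor.hs):
>> 
>>   -- Costs of this expression are attributed to the "feedforward"
>>   -- cost-centre in the .prof output.
>>   output = {-# SCC "feedforward" #-} sigmoid (weights `matmul` inputs)
>> 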
>> Now your program should give a much more useful profile (run having set
>> the iteration count to 50000),
>> 
>>   $ time examples/xor +RTS -p 
>>   [[0.99548584],[4.5138146e-3],[0.9954874],[4.513808e-3]]
>> 
>>   real	0m4.019s
>>   user	0m0.004s
>>   sys	0m4.013s
>>   $ cat xor.prof
>>     Tue May  3 11:15 2016 Time and Allocation Profiling Report  (Final)
>> 
>>       xor +RTS -p -RTS
>> 
>>     total time  =        1.11 secs   (1107 ticks @ 1000 us, 1 processor)
>>     total alloc = 1,984,202,600 bytes  (excludes profiling overheads)
>> 
>>   COST CENTRE  MODULE      %time %alloc
>> 
>>   matrixMap    Utils.Math   21.4   26.8
>>   matadd       Utils.Math   20.8   22.7
>>   matmul.\     Utils.Math   10.4   16.0
>>   dot          Utils.Math    7.1   13.4
>>   column       Utils.Math    7.0    2.9
>>   dot.\        Utils.Math    6.9    1.9
>>   rowPairs     Utils.Math    5.8    6.5
>>   sigmoid'     NN            4.7    0.8
>>   train.helper Main          4.0    1.3
>>   sigmoid      NN            3.3    0.8
>>   matmul       Utils.Math    2.2    2.0
>>   hadamard     Utils.Math    1.8    2.1
>>   columns      Utils.Math    1.2    1.3
>> 
>>                                                                       individual      inherited
>>   COST CENTRE                MODULE                no.     entries  %time %alloc   %time %alloc
>> 
>>   MAIN                       MAIN                   79          0    0.0    0.0   100.0  100.0
>>   [snip]
>>    CAF:main3                 Main                  143          0    0.0    0.0   100.0  100.0
>>     (...)                    Main                  162          1    0.0    0.0   100.0  100.0
>>      train                   Main                  163          1    0.0    0.0   100.0  100.0
>>       train.helper           Main                  164      50001    4.0    1.3   100.0  100.0
>>        train.helper.hweights Main                  258      50001    0.5    0.0     0.5    0.0
>>        train.helper.oweights Main                  235      50001    0.4    0.0     0.4    0.0
>>        train.helper.oback    Main                  207      50000    0.3    0.1    19.0   20.9
>>         backward'            NN                    208      50000    0.3    0.6    18.7   20.8
>>   [snip]
>> 
>> So, here we see that costs are actually spread throughout the program.
>> 
>> Without diving any deeper into this particular program it's hard to give
>> more guidance; however, I will say that your lazy-list Matrix
>> representation is very rarely the right choice, even for toy linear
>> algebra problems.
>> 
>> First, consider the fact that even just a list cons constructor requires
>> three words (an info table pointer, a pointer to payload, and a pointer
>> to tail) plus the size of the payload (which in the case of an evaluated
>> `Float` is 2 words: one info table pointer and the floating point value
>> itself). So, a list of n *evaluated* `Float`s (which would require only
>> 4*n bytes if packed densely) will require 40*n bytes if represented as a
>> lazy list.
>> 
>> Then, consider the fact that indexing into a lazy list is an O(n)
>> operation: this means that your `Math.column` operation on an n x m
>> matrix may be O(n*m). Even worse, `Math.columns`, as used by
>> `Math.matmul`, is O(n * m!).
>> 
>> Finally, consider the fact that whenever you "construct" a lazy list you
>> aren't performing any computation: you are merely building a single
>> thunk which represents the entire result; however, if you then go to
>> index into the middle of that list you will end up constructing n cons
>> cells and a thunk for the payload of each. In the case of primitive
>> linear algebra operations the cost of constructing this payload thunk
>> can be greater than simply computing the result.
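>> 
>> This last point is usually what bites in an iterative loop like your
>> `train.helper`: each iteration builds its result as a thunk on top of
>> the previous one, and nothing gets forced until the final answer is
>> demanded. A minimal sketch of the pattern (the type and `step` function
>> here are hypothetical stand-ins, not your actual code):
>> 
>>   import Control.DeepSeq (force)
>> 
>>   -- Hypothetical stand-ins for the real weight matrices and update step.
>>   type Weights = [[Double]]
>> 
>>   step :: Weights -> Weights
>>   step = map (map (* 0.99))
>> 
>>   -- Leaky: the new weights are never demanded inside the loop, so after
>>   -- 5 million iterations you hold a 5-million-deep chain of thunks.
>>   trainLeaky :: Int -> Weights -> Weights
>>   trainLeaky 0 w = w
>>   trainLeaky n w = trainLeaky (n - 1) (step w)
>> 
>>   -- Better: force each intermediate result to normal form before
>>   -- recursing, so the previous weights can be garbage collected.
>>   trainStrict :: Int -> Weights -> Weights
>>   trainStrict 0 w = w
>>   trainStrict n w = let w' = force (step w)
>>                     in  w' `seq` trainStrict (n - 1) w'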
>> 
>> For these reasons I wouldn't recommend using lazy lists in this
>> way. If you have a dense matrix, use an array (probably even unboxed;
>> see, for instance, the `array`, `vector`, and `repa` libraries); if
>> you have a sparse matrix, then use an appropriate sparse representation
>> (although sadly good sparse linear algebra tools are hard to come by in
>> Haskell). Not only will the result be significantly more efficient in
>> space and time, but the runtime behavior of the program will be
>> significantly easier to follow, since you can more easily ensure that
>> evaluation occurs when you expect it to.
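>> 
>> To give a feel for what the dense representation buys you, here is a
>> minimal row-major matrix built on an unboxed vector (an illustrative
>> sketch using the `vector` package; the type, field names, and `matmul`
>> below are made up for the example; hmatrix gives you the same idea plus
>> fast BLAS-backed operations):
>> 
>>   import qualified Data.Vector.Unboxed as VU
>> 
>>   -- Row-major dense matrix: every element lives in one packed, unboxed
>>   -- vector: 8 bytes per Double, no per-element pointers or thunks.
>>   data Matrix = Matrix
>>     { mRows  :: !Int
>>     , mCols  :: !Int
>>     , mElems :: !(VU.Vector Double)
>>     }
>> 
>>   -- O(1) indexing instead of an O(i*cols + j) list traversal.
>>   index :: Matrix -> Int -> Int -> Double
>>   index (Matrix _ c v) i j = v VU.! (i * c + j)
>> 
>>   -- Naive multiplication; each element of the result is fully evaluated
>>   -- as the unboxed result vector is generated, so no thunks pile up.
>>   matmul :: Matrix -> Matrix -> Matrix
>>   matmul a b = Matrix (mRows a) (mCols b) $
>>     VU.generate (mRows a * mCols b) $ \k ->
>>       let (i, j) = k `divMod` mCols b
>>       in  sum [ index a i l * index b l j | l <- [0 .. mCols a - 1] ]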
>> 
>> Hopefully this helps. Good luck and let us know if there are further
>> issues.
>> 
>> Cheers,
>> 
>> - Ben
>> 
>> 
>> [1] http://downloads.haskell.org/~ghc/master/users-guide//profiling.html#cost-centres-and-cost-centre-stacks
> 
