<html><head><meta http-equiv="Content-Type" content="text/html charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><br class=""><div><br class=""><blockquote type="cite" class=""><div class="">Begin forwarded message:</div><br class="Apple-interchange-newline"><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px;" class=""><span style="font-family: -webkit-system-font, Helvetica Neue, Helvetica, sans-serif; color:rgba(0, 0, 0, 1.0);" class=""><b class="">From: </b></span><span style="font-family: -webkit-system-font, Helvetica Neue, Helvetica, sans-serif;" class="">Mahdi <<a href="mailto:mdibaiee@aol.com" class="">mdibaiee@aol.com</a>><br class=""></span></div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px;" class=""><span style="font-family: -webkit-system-font, Helvetica Neue, Helvetica, sans-serif; color:rgba(0, 0, 0, 1.0);" class=""><b class="">Subject: </b></span><span style="font-family: -webkit-system-font, Helvetica Neue, Helvetica, sans-serif;" class=""><b class="">Re: [Haskell-beginners] CAF taking all the memory</b><br class=""></span></div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px;" class=""><span style="font-family: -webkit-system-font, Helvetica Neue, Helvetica, sans-serif; color:rgba(0, 0, 0, 1.0);" class=""><b class="">Date: </b></span><span style="font-family: -webkit-system-font, Helvetica Neue, Helvetica, sans-serif;" class="">May 8, 2016 at 18:06:29 GMT+4:30<br class=""></span></div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px;" class=""><span style="font-family: -webkit-system-font, Helvetica Neue, Helvetica, sans-serif; color:rgba(0, 0, 0, 1.0);" class=""><b class="">To: </b></span><span style="font-family: -webkit-system-font, Helvetica Neue, Helvetica, sans-serif;" class="">Ben Gamari <<a href="mailto:ben@smart-cactus.org" class="">ben@smart-cactus.org</a>><br class=""></span></div><br class=""><div class=""><div class="">Wow, This is a great explanation! Thank you!<br class="">I see your point on using lazy lists (although I was not aware of the specific problems you mentioned).<br class="">So, I switched over to HMatrix [0], (the libraries you mentioned added a lot of complexity to the code)<br class="">which uses unboxed arrays for matrices.<br class=""><br class="">After refactoring my code to use HMatrix, I ran the program, and the same thing happens,<br class="">the speed hasn’t changed much either, then I tried to take another profile to see if anything has changed,<br class="">but I couldn’t as described here [1].<br class=""><br class="">But my main question is, why would Haskell even store all this data? There are no side-effects, so<br class="">there should be no memory leak in my code.<br class="">From what I read online, Haskell is trying to avoid re-calculating the functions by caching their<br class="">input/outputs (they’re pure, so this can be done), but I hoped there would be a way to disable this,<br class="">unfortunately there doesn’t seem to be a way (or there is?).<br class="">If that’s the case, then using libraries won’t solve the problem, and that’s frustrating.<br class=""><br class="">* I also tried compiling with -O0, but no luck.<br class=""><br class="">[0] <a href="http://dis.um.es/~alberto/hmatrix/hmatrixtut.html" class="">http://dis.um.es/~alberto/hmatrix/hmatrixtut.html</a><br class="">[1] <a href="https://github.com/albertoruiz/hmatrix/issues/190" class="">https://github.com/albertoruiz/hmatrix/issues/190</a><br class=""><br class="">Thank you Ben!<br class=""><br class="">- Mahdi<br class=""><br class=""><blockquote type="cite" class="">On May 3, 2016, at 14:36, Ben Gamari <<a href="mailto:ben@smart-cactus.org" class="">ben@smart-cactus.org</a>> wrote:<br class=""><br class="">Mahdi <<a href="mailto:mdibaiee@aol.com" class="">mdibaiee@aol.com</a>> writes:<br class=""><br class=""><blockquote type="cite" class="">Hello there,<br class=""><br class=""></blockquote>Hi!<br class=""><br class=""><blockquote type="cite" class="">I’m pretty new to Haskell and I’m trying to write a Neural Network in<br class="">Haskell for educational purposes.<br class=""><br class="">So, I have my neural network working, it can learn XOR in 500<br class="">iterations, but the thing is, if I increase the iterations to<br class="">something like 5 milion times, the process just drains my RAM until<br class="">either it’s killed or the OS drowns. Here is the code: [0]<br class=""><br class="">I searched online and tried to get information on why this is<br class="">happening, I profiled the memory usage and found that the memory is<br class="">taken by CAF, searching online,<br class=""><br class=""></blockquote>Great! The next question you should then ask is "which CAF"? A CAF is<br class="">simply a top-level "constant" in your program. Indeed it sounds like you<br class="">have not defined any cost-centers, which means that GHC will attribute<br class="">the entire cost of your program to the ambiguous cost-center "CAF"<br class="">(which in this case really just means "the whole program").<br class=""><br class="">As discussed in the users guide [1] one way to define cost-centers<br class="">within your program is to manually annotate expressions with SCC<br class="">pragmas. However, in this case we simply want to let GHC do this for us,<br class="">for which we can use the `-fprof-auto -fprof-cafs` flags (which<br class="">automatically annotate top-level definitions and CAFs with<br class="">cost-centers),<br class=""><br class="">   $ ghc -O examples/xor.hs -fprof-auto -fprof-cafs<br class=""><br class="">Now your program should give a much more useful profile (run having set<br class="">the iteration count to 50000),<br class=""><br class="">   $ time examples/xor +RTS -p <br class="">   [[0.99548584],[4.5138146e-3],[0.9954874],[4.513808e-3]]<br class=""><br class="">   real<span class="Apple-tab-span" style="white-space:pre">    </span>0m4.019s<br class="">   user<span class="Apple-tab-span" style="white-space:pre">      </span>0m0.004s<br class="">   sys<span class="Apple-tab-span" style="white-space:pre">       </span>0m4.013s<br class="">   $ cat xor.prof<br class="">     Tue May  3 11:15 2016 Time and Allocation Profiling Report  (Final)<br class=""><br class="">       xor +RTS -p -RTS<br class=""><br class="">     total time  =        1.11 secs   (1107 ticks @ 1000 us, 1 processor)<br class="">     total alloc = 1,984,202,600 bytes  (excludes profiling overheads)<br class=""><br class="">   COST CENTRE  MODULE      %time %alloc<br class=""><br class="">   matrixMap    Utils.Math   21.4   26.8<br class="">   matadd       Utils.Math   20.8   22.7<br class="">   matmul.\     Utils.Math   10.4   16.0<br class="">   dot          Utils.Math    7.1   13.4<br class="">   column       Utils.Math    7.0    2.9<br class="">   dot.\        Utils.Math    6.9    1.9<br class="">   rowPairs     Utils.Math    5.8    6.5<br class="">   sigmoid'     NN            4.7    0.8<br class="">   train.helper Main          4.0    1.3<br class="">   sigmoid      NN            3.3    0.8<br class="">   matmul       Utils.Math    2.2    2.0<br class="">   hadamard     Utils.Math    1.8    2.1<br class="">   columns      Utils.Math    1.2    1.3<br class=""><br class="">                                                                       individual      inherited<br class="">   COST CENTRE                MODULE                no.     entries  %time %alloc   %time %alloc<br class=""><br class="">   MAIN                       MAIN                   79          0    0.0    0.0   100.0  100.0<br class="">   [snip]<br class="">    CAF:main3                 Main                  143          0    0.0    0.0   100.0  100.0<br class="">     (...)                    Main                  162          1    0.0    0.0   100.0  100.0<br class="">      train                   Main                  163          1    0.0    0.0   100.0  100.0<br class="">       train.helper           Main                  164      50001    4.0    1.3   100.0  100.0<br class="">        train.helper.hweights Main                  258      50001    0.5    0.0     0.5    0.0<br class="">        train.helper.oweights Main                  235      50001    0.4    0.0     0.4    0.0<br class="">        train.helper.oback    Main                  207      50000    0.3    0.1    19.0   20.9<br class="">         backward'            NN                    208      50000    0.3    0.6    18.7   20.8<br class="">   [snip]<br class=""><br class="">So, here we see that costs are actually spread throughout the program.<br class=""><br class="">Without diving any deeper into this particular program it's hard to give<br class="">more guidance however I will say that your lazy list Matrix<br class="">representation is very rarely the right choice for even toy linear<br class="">algebra problems.<br class=""><br class="">First, consider the fact that even just a list cons constructor requires<br class="">three words (an info table pointer, a pointer to payload, and a pointer<br class="">to tail) plus the size of the payload (which in the case of an evaluated<br class="">`Float` is 2 words: one info table pointer and the floating point value<br class="">itself). So, a list of n *evaluated* `Float`s (which would require only<br class="">4*n bytes if packed densely) will require 40*n bytes if represented as a<br class="">lazy list.<br class=""><br class="">Then, consider the fact that indexing into a lazy list is an O(n)<br class="">operation: this means that your `Math.column` operation on an n x m<br class="">matrix may be O(n*m). Even worse, `Math.columns`, as used by<br class="">`Math.matmul` is O(n * m!).<br class=""><br class="">Finally, consider the fact that whenever you "construct" a lazy list you<br class="">aren't actually performing any computation: you are actually<br class="">constructing a single thunk which represents the entire result; however,<br class="">if you then go to index into the middle of that list you will end up<br class="">constructing n cons cells and a thunk for the payload of each. In the<br class="">case of primitive linear algebra operations the cost of constructing<br class="">this payload thunk can be greater than simply computing the result.<br class=""><br class="">For these reasons I wouldn't recommend that lazy lists are used in this<br class="">way. If you have a dense matrix use an array (probably even unboxed;<br class="">see, for instance, the `array`, `vector`, and `repa` libraries); if<br class="">you have a sparse matrix then use an appropriate sparse representation<br class="">(although sadly good sparse linear algebra tools are hard to come by in<br class="">Haskell) Not only will the result be significantly more efficient in<br class="">space and time but the runtime behavior of the program will be<br class="">significantly easier to follow since you can more easily ensure that<br class="">evaluation occurs when you expect it to.<br class=""><br class="">Hopefully this helps. Good luck and let us know if there are further<br class="">issues.<br class=""><br class="">Cheers,<br class=""><br class="">- Ben<br class=""><br class=""><br class="">[1] <a href="http://downloads.haskell.org/~ghc/master/users-guide//profiling.html#cost-centres-and-cost-centre-stacks" class="">http://downloads.haskell.org/~ghc/master/users-guide//profiling.html#cost-centres-and-cost-centre-stacks</a><br class=""></blockquote><br class=""></div></div></blockquote></div><br class=""></body></html>