[Haskell-cafe] regression in memory usage between 6.10 and 7.x?

Sergey Vinokurov serg.foo at gmail.com
Sat Dec 17 21:45:36 UTC 2016


Hi Ben,

After trying out different heap profiling modes it seems to me that
the problem is caused by the lazy empty and final fields (attached as
reg.svg). These fields contain thunks which hold references to regexps
and this quickly fills out the heap. However, making fields strict
improves the memory usage a lot, from 1030 MB to 61 MB:

$ ghc -O2 --make -o reg Reg.hs -prof -fprof-auto
$ ghc -O2 --make -o reg-strict RegStrict.hs -prof -fprof-auto
$ ./reg 1000 +RTS -s -hy
True
   2,244,020,400 bytes allocated in the heap
   4,518,833,264 bytes copied during GC
     487,761,824 bytes maximum residency (19 sample(s))
       2,235,504 bytes maximum slop
            1030 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0      3417 colls,     0 par    0.796s   0.835s     0.0002s    0.0109s
  Gen  1        19 colls,     0 par    4.740s   4.916s     0.2588s    0.7593s

  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time    0.992s  (  2.349s elapsed)
  GC      time    4.156s  (  4.366s elapsed)
  RP      time    0.000s  (  0.000s elapsed)
  PROF    time    1.380s  (  1.385s elapsed)
  EXIT    time    0.004s  (  0.021s elapsed)
  Total   time    6.532s  (  6.736s elapsed)

  %GC     time      63.6%  (64.8% elapsed)

  Alloc rate    2,262,117,338 bytes per MUT second

  Productivity  15.2% of total user, 14.8% of total elapsed
$ ./reg-strict 1000 +RTS -s -hy
True
   2,227,779,568 bytes allocated in the heap
   1,584,870,472 bytes copied during GC
      20,760,456 bytes maximum residency (74 sample(s))
         389,104 bytes maximum slop
              61 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0      3702 colls,     0 par    0.512s   0.536s     0.0001s    0.0004s
  Gen  1        74 colls,     0 par    0.376s   0.378s     0.0051s    0.0171s

  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time    0.604s  (  0.613s elapsed)
  GC      time    0.856s  (  0.890s elapsed)
  RP      time    0.000s  (  0.000s elapsed)
  PROF    time    0.032s  (  0.024s elapsed)
  EXIT    time    0.000s  (  0.003s elapsed)
  Total   time    1.492s  (  1.507s elapsed)

  %GC     time      57.4%  (59.1% elapsed)

  Alloc rate    3,688,376,768 bytes per MUT second

  Productivity  40.5% of total user, 40.1% of total elapsed

Regards,
Sergey

On Sat, Dec 17, 2016 at 10:51 PM, Ben Franksen <ben.franksen at online.de> wrote:
> In the functional pearl "A Play on Regular Expressions" by Sebastian
> Fischer, Frank Huch, and Thomas Wilke, the authors explain how to
> implement a Glushkov automaton in a very elegant way. They claim that
> when they compile their code with ghc-6.10.4 and with -O2 optimization
> they get:
>
> """
> 1 MB total memory in use
> ...
> Total time 0.06s (0.21s elapsed)
> """
>
> for a program that matches stdin against the equivalent (in grep -E
> syntax) of
>
>  ^(a?){500}a{500}$
>
> When I try their code on my machine with ghc-7.10.3 (and also -O2) I get:
>
> """
> ben at sarun[1]: .../play/regexfp > ghc Reg.hs -O2 -o re
> [1 of 1] Compiling Main             ( Reg.hs, Reg.o )
> Linking re ...
> ben at sarun[1]: .../play/regexfp > ./re 500 +RTS -s
> True
> ...
> 155 MB total memory in use
> ...
> Total   time    1.031s  (  1.037s elapsed)
> """
>
> The details of the RTS report furthermore tell me that GC is responsible
> for more than 80% of the runtime.
>
> Now, I certainly have a different machine and perhaps OS than the
> authors of the paper, but that can't explain a factor of 16 in runtime
> and of 150 in memory use. I have double and triple checked my code for
> possible mistakes. I /do/ get the expected results (same as in the paper).
>
> I tried different numbers and found
>
> n       total memory in use / MB
> 100     8
> 200     28
> 500     169
> 1000    698
>
> which shows that memory consumtion is roughly quadratic in n. I tried
> heap profiling but that tells me only that all the allocation is done by
> the function 'shift', which I already knew (or rather guessed).
>
> I also tried several compiler versions: 7.0.4, 7.10.3, 7.4.2, 7.6.3,
> 8.0.1. They all result in similar performance.
>
> The ghc version the authors used is 6.10.4 and I don't have a package
> for that compatible to my distribution.
>
> Any hint as to where these differences might come from are appreciated.
>
> I have attached the program for reference.
>
> Cheers
> Ben
> --
> "Make it so they have to reboot after every typo." ― Scott Adams
>
> _______________________________________________
> Haskell-Cafe mailing list
> To (un)subscribe, modify options or view archives go to:
> http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
> Only members subscribed via the mailman list are allowed to post.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: reg.svg
Type: image/svg+xml
Size: 8389 bytes
Desc: not available
URL: <http://mail.haskell.org/pipermail/haskell-cafe/attachments/20161217/834d8744/attachment.svg>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: RegStrict.hs
Type: text/x-haskell
Size: 1840 bytes
Desc: not available
URL: <http://mail.haskell.org/pipermail/haskell-cafe/attachments/20161217/834d8744/attachment.hs>


More information about the Haskell-Cafe mailing list