Non-updateable thunks
Joachim Breitner
breitner at kit.edu
Wed Aug 1 12:38:03 CEST 2012
Hello,
I’m still working on issues of performance vs. sharing; I must assume
some of the people here on the list must have seen my "dup"-paper¹ as
referees.
I’m now wondering about a approach where the compiler (either
automatically or by user annotation; I’ll leave that question for later)
would mark some thunks as reentrant, i.e. simply skip the blackholing
and update frame pushing. A quick test showed that this should work
quite well, take the usual example:
import System.Environment
main = do
a <- getArgs
let n = length a
print n
let l = [n..30000000]
print $ last l + last l
This obviously leaks memory:
$ ./Test +RTS -t
0
60000000
<<ghc: 2400054760 bytes, 4596 GCs, 169560494/935354240 avg/max
bytes residency (11 samples), 2121M in use, 0.00 INIT (0.00
elapsed), 0.63 MUT (0.63 elapsed), 4.28 GC (4.29 elapsed) :ghc>>
I then modified the the assembly (a crude but effective way of testing
this ;-)) to not push a stack frame:
$ diff -u Test.s Test-modified.s
--- Test.s 2012-08-01 11:30:00.000000000 +0200
+++ Test-modified.s 2012-08-01 11:29:40.000000000 +0200
@@ -56,20 +56,20 @@
leaq -40(%rbp),%rax
cmpq %r15,%rax
jb .LcpZ
- addq $16,%r12
- cmpq 144(%r13),%r12
- ja .Lcq1
- movq $stg_upd_frame_info,-16(%rbp)
- movq %rbx,-8(%rbp)
+ //addq $16,%r12
+ //cmpq 144(%r13),%r12
+ //ja .Lcq1
+ //movq $stg_upd_frame_info,-16(%rbp)
+ //movq %rbx,-8(%rbp)
movq $ghczmprim_GHCziTypes_Izh_con_info,-8(%r12)
movq $30000000,0(%r12)
leaq -7(%r12),%rax
- movq %rax,-24(%rbp)
+ movq %rax,-8(%rbp)
movq 16(%rbx),%rax
- movq %rax,-32(%rbp)
- movq $stg_ap_pp_info,-40(%rbp)
+ movq %rax,-16(%rbp)
+ movq $stg_ap_pp_info,-24(%rbp)
movl $base_GHCziEnum_zdfEnumInt_closure,%r14d
- addq $-40,%rbp
+ addq $-24,%rbp
jmp base_GHCziEnum_enumFromTo_info
.Lcq1:
movq $16,192(%r13)
Now it runs fast and slim (and did not crash on the first try, which I
find surprising after hand-modifying the assembly code):
$ ./Test +RTS -t
0
60000000
<<ghc: 4800054840 bytes, 9192 GCs, 28632/28632 avg/max bytes
residency (1 samples), 1M in use, 0.00 INIT (0.00 elapsed), 0.73
MUT (0.73 elapsed), 0.04 GC (0.04 elapsed) :ghc>>
My question is: Has anybody worked in that direction? And are there any
fundamental problems with the current RTS implementation and such
closures?
Greetings,
Joachim
¹ http://arxiv.org/abs/1207.2017
currently not about to appear anywhere else, but I have not given up
hope yet :-)
--
Dipl.-Math. Dipl.-Inform. Joachim Breitner
Wissenschaftlicher Mitarbeiter
http://pp.info.uni-karlsruhe.de/~breitner
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 198 bytes
Desc: This is a digitally signed message part
URL: <http://www.haskell.org/pipermail/glasgow-haskell-users/attachments/20120801/b464ab2f/attachment.pgp>
More information about the Glasgow-haskell-users
mailing list