Non-updateable thunks

Simon Peyton-Jones simonpj at microsoft.com
Tue Aug 14 16:01:13 CEST 2012


The update analysis developed by Keith Wansbrough was very complicated, and although rather beautiful (read his thesis and our papers), it didn't pay its way in terms of complexity.

Coincidentally, Ilya Sergey is here on an internship and is now working on a "cheap and cheerful" update analysis that may get much of the benefit for much less cost. We'll see.

Simon

| -----Original Message-----
| From: glasgow-haskell-users-bounces at haskell.org [mailto:glasgow-
| haskell-users-bounces at haskell.org] On Behalf Of Joachim Breitner
| Sent: 01 August 2012 11:38
| To: glasgow-haskell-users
| Subject: Non-updateable thunks
| 
| Hello,
| 
| I’m still working on issues of performance vs. sharing; I must assume
| some of the people here on the list must have seen my "dup"-paper¹ as
| referees.
| 
| I’m now wondering about a approach where the compiler (either
| automatically or by user annotation; I’ll leave that question for
| later) would mark some thunks as reentrant, i.e. simply skip the
| blackholing and update frame pushing. A quick test showed that this
| should work quite well, take the usual example:
| 
|         import System.Environment
|         main = do
|             a <- getArgs
|             let n = length a
|             print n
|             let l = [n..30000000]
|             print $ last l + last l
| 
| This obviously leaks memory:
| 
|         $ ./Test +RTS -t
|         0
|         60000000
|         <<ghc: 2400054760 bytes, 4596 GCs, 169560494/935354240 avg/max
|         bytes residency (11 samples), 2121M in use, 0.00 INIT (0.00
|         elapsed), 0.63 MUT (0.63 elapsed), 4.28 GC (4.29 elapsed)
| :ghc>>
| 
| 
| I then modified the the assembly (a crude but effective way of testing
| this ;-)) to not push a stack frame:
| 
| $ diff -u Test.s Test-modified.s
| --- Test.s	2012-08-01 11:30:00.000000000 +0200
| +++ Test-modified.s	2012-08-01 11:29:40.000000000 +0200
| @@ -56,20 +56,20 @@
|  	leaq -40(%rbp),%rax
|  	cmpq %r15,%rax
|  	jb .LcpZ
| -	addq $16,%r12
| -	cmpq 144(%r13),%r12
| -	ja .Lcq1
| -	movq $stg_upd_frame_info,-16(%rbp)
| -	movq %rbx,-8(%rbp)
| +	//addq $16,%r12
| +	//cmpq 144(%r13),%r12
| +	//ja .Lcq1
| +	//movq $stg_upd_frame_info,-16(%rbp)
| +	//movq %rbx,-8(%rbp)
|  	movq $ghczmprim_GHCziTypes_Izh_con_info,-8(%r12)
|  	movq $30000000,0(%r12)
|  	leaq -7(%r12),%rax
| -	movq %rax,-24(%rbp)
| +	movq %rax,-8(%rbp)
|  	movq 16(%rbx),%rax
| -	movq %rax,-32(%rbp)
| -	movq $stg_ap_pp_info,-40(%rbp)
| +	movq %rax,-16(%rbp)
| +	movq $stg_ap_pp_info,-24(%rbp)
|  	movl $base_GHCziEnum_zdfEnumInt_closure,%r14d
| -	addq $-40,%rbp
| +	addq $-24,%rbp
|  	jmp base_GHCziEnum_enumFromTo_info
|  .Lcq1:
|  	movq $16,192(%r13)
| 
| Now it runs fast and slim (and did not crash on the first try, which I
| find surprising after hand-modifying the assembly code):
| 
|         $ ./Test +RTS -t
|         0
|         60000000
|         <<ghc: 4800054840 bytes, 9192 GCs, 28632/28632 avg/max bytes
|         residency (1 samples), 1M in use, 0.00 INIT (0.00 elapsed),
| 0.73
|         MUT (0.73 elapsed), 0.04 GC (0.04 elapsed) :ghc>>
| 
| 
| My question is: Has anybody worked in that direction? And are there any
| fundamental problems with the current RTS implementation and such
| closures?
| 
| Greetings,
| Joachim
| 
| 
| ¹ http://arxiv.org/abs/1207.2017
| currently not about to appear anywhere else, but I have not given up
| hope yet :-)
| 
| 
| --
| Dipl.-Math. Dipl.-Inform. Joachim Breitner Wissenschaftlicher
| Mitarbeiter http://pp.info.uni-karlsruhe.de/~breitner


More information about the Glasgow-haskell-users mailing list