Proposal: Non-allocating way to iterate over a Data.Map: traverseWithKey_
Ryan Newton
rrnewton at gmail.com
Tue Jul 2 22:36:24 CEST 2013
> In particular, will the compiler be able to avoid allocating when
>> building up that large monadic computation
>> in the foldrWithKey?
>>
>
> Since it is a foldr, the first action can be run without knowing the
> following ones. That is, at no time all actions must be allocated.
>
Hi all,
Well, to test it out I went ahead and made a patch to containers, here:
https://github.com/rrnewton/containers/commit/9b1a913c923fd9409932434894cca21cd4e313de
Henning, while I agree with you that GHC shouldn't truly *need* to generate
a first class representation of that 10M-step monadic action... it remains
the case that the updated test (attached to the end of this email), shows
the traverseWithKey_ version allocating only 200K, whereas the foldrWithKey
allocates 32M.
Further, I argue that the traverseWithKey_ version is clearer to
programmers that don't yet grok the first class nature of monadic actions
(e.g. a fold to build up one big monadic action). And again, if we are not
sure of the performance implications of that in this thread, I imagine many
Haskell programmers would not be.
So unless there's a strong counterargument, I propose the above patch for
acceptance.
-Ryan
P.S. Using the HEAD version of cabal the allocation for -O0 foldrWithKey
actually went up from 200M to 300M.
Appendix: updated code below and at this URL:
https://gist.github.com/rrnewton/5912513#file-maptest-hs
--------------------------------------------
*import Control.DeepSeq*
*import GHC.Stats*
*import qualified Data.Map.Strict as M*
*import Data.Time.Clock*
*import Control.Exception*
*import System.Mem*
*
*
*main :: IO ()*
*main = do*
* t0 <- getCurrentTime*
* let m0 = M.fromList (map (\i -> (i,i)) [1..1000000::Int])*
* evaluate$ rnf m0 *
* t1 <- getCurrentTime*
* performGC*
* s1 <- getGCStats *
* putStrLn$"Constructed map in "++show (diffUTCTime t1 t0)++"\n "++ show
s1++"\n"*
* let fn 500000 v = putStrLn "Got it!"*
* fn _ _ = return ()*
*
*
* -- Regular traverseWithKey uses 48MB*
* -- traverseWithKey_ usse 200K of allocation:*
* M.traverseWithKey_ fn m0*
* t2 <- getCurrentTime*
* performGC*
* s2 <- getGCStats *
* putStrLn$"[traverseWithKey_] Consumed map in "++show (diffUTCTime t2
t1)++"\n "++ show s2++"\n"*
* putStrLn$"Bytes allocated during consume: "++show (bytesAllocated s2 -
bytesAllocated s1)*
*
*
* -- foldrWithKey uses 32MB allocation:*
* M.foldrWithKey (\k a -> (fn k a >>)) (return ()) m0*
* t3 <- getCurrentTime*
* performGC*
* s3 <- getGCStats *
* putStrLn$"[foldrWithKey] Consumed map in "++show (diffUTCTime t3
t2)++"\n "++ show s3++"\n"*
* putStrLn$"Bytes allocated during consume: "++show (bytesAllocated s3 -
bytesAllocated s2)*
* return ()*
*
*
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/libraries/attachments/20130702/a5ef42f5/attachment.htm>
More information about the Libraries
mailing list