Proposal: Non-allocating way to iterate over a Data.Map: traverseWithKey_

Ryan Newton rrnewton at gmail.com
Tue Jul 2 22:36:24 CEST 2013


>  In particular, will the compiler be able to avoid allocating when
>> building up that large monadic computation
>> in the foldrWithKey?
>>
>
> Since it is a foldr, the first action can be run without knowing the
> following ones. That is, at no time all actions must be allocated.
>

Hi all,

Well, to test it out I went ahead and made a patch to containers, here:


https://github.com/rrnewton/containers/commit/9b1a913c923fd9409932434894cca21cd4e313de

Henning, while I agree with you that GHC shouldn't truly *need* to generate
a first class representation of that 10M-step monadic action... it remains
the case that the updated test (attached to the end of this email), shows
the traverseWithKey_ version allocating only 200K, whereas the foldrWithKey
allocates 32M.

Further, I argue that the traverseWithKey_ version is clearer to
programmers that don't yet grok the first class nature of monadic actions
(e.g. a fold to build up one big monadic action).  And again, if we are not
sure of the performance implications of that in this thread, I imagine many
Haskell programmers would not be.

So unless there's a strong counterargument, I propose the above patch for
acceptance.

   -Ryan

P.S. Using the HEAD version of cabal the allocation for -O0 foldrWithKey
actually went up from 200M to 300M.

Appendix: updated code below and at this URL:
https://gist.github.com/rrnewton/5912513#file-maptest-hs
--------------------------------------------
*import Control.DeepSeq*
*import GHC.Stats*
*import qualified Data.Map.Strict as M*
*import Data.Time.Clock*
*import Control.Exception*
*import System.Mem*
*
*
*main :: IO ()*
*main = do*
*  t0 <- getCurrentTime*
*  let m0 = M.fromList (map (\i -> (i,i)) [1..1000000::Int])*
*  evaluate$ rnf m0 *
*  t1 <- getCurrentTime*
*  performGC*
*  s1 <- getGCStats  *
*  putStrLn$"Constructed map in "++show (diffUTCTime t1 t0)++"\n "++ show
s1++"\n"*
*  let fn 500000 v = putStrLn "Got it!"*
*      fn _      _ = return ()*
*
*
*  -- Regular traverseWithKey uses 48MB*
*  -- traverseWithKey_ usse 200K of allocation:*
*  M.traverseWithKey_ fn m0*
*  t2 <- getCurrentTime*
*  performGC*
*  s2 <- getGCStats *
*  putStrLn$"[traverseWithKey_] Consumed map in "++show (diffUTCTime t2
t1)++"\n "++ show s2++"\n"*
*  putStrLn$"Bytes allocated during consume:  "++show (bytesAllocated s2 -
bytesAllocated s1)*
*
*
*  -- foldrWithKey uses 32MB allocation:*
*  M.foldrWithKey (\k a -> (fn k a >>)) (return ()) m0*
*  t3 <- getCurrentTime*
*  performGC*
*  s3 <- getGCStats *
*  putStrLn$"[foldrWithKey] Consumed map in "++show (diffUTCTime t3
t2)++"\n "++ show s3++"\n"*
*  putStrLn$"Bytes allocated during consume:  "++show (bytesAllocated s3 -
bytesAllocated s2)*
*  return ()*
*
*
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/libraries/attachments/20130702/a5ef42f5/attachment.htm>


More information about the Libraries mailing list