[Haskell-cafe] Explicit garbage collection

Thu Jan 7 17:03:38 EST 2010

Hmm, interesting. Seems like I have to take a closer look at GHC's  
garbage collection. Thanks!

On 8 Jan 2010, at 00:53, Edward Kmett wrote:

> That is kind of what I thought you were doing.
>
> Retooling to see if we can get any better performance out of  
> collecting, it seems that System.Mem.PerformGC does a foreign call  
> out to performMajorGC to ask for a global collection. But I would  
> hazard that you might be able to get most of the benefit by just  
> asking for the nursery to be cleaned up.
>
> To that end, perhaps it would be worthwhile to consider switching to  
> something like:
>
> foreign import ccall {-safe-} "performGC" performMinorGC :: IO ()
>
> -- System.Mem imports performMajorGC as 'performGC' but the minor  
> collection appears to be called performGC, hence the rename.
>
> This should let you call for a minor collection, which should be  
> considerably less time consuming. Especially as if you call it  
> frequently you'll be dealing with a mostly cleared nursery anyways.
>
> -Edward Kmett
>
> On Thu, Jan 7, 2010 at 4:39 PM, Miguel Mitrofanov <miguelimo38 at yandex.ru 
> > wrote:
> I liked it too. Seems like I have to show some real code, and my  
> apologies for a long e-mail.
>
> Well, that's what I'm actually trying to do and what I've managed to  
> achieve so far.
>
> > module Ask where
> > import Control.Monad
> > import Data.IORef
> > import Data.Maybe
> > import System.IO.Unsafe -- Yes, I would need that
> > import System.Mem
> > import System.Mem.Weak
>
> Suppose we're writing a RESTful server, which keeps all state on the  
> client. We want it to send questions to the client and get back the  
> answers. We can also send some state data to the client and we are  
> sure that the client would return them back unchanged. So, that's  
> the type of our server:
>
> > type Server medium = Maybe (medium, String) -> IO (Maybe (medium,  
> String))
>
> The client starts interaction, sending "Nothing"; when server  
> finally returns "Nothing", the session stops. I would emulate the  
> client in GHCi with the following function:
>
> > client :: Show medium => Bool -> Server medium -> IO ()
> > client debug server = client' Nothing where
> >     client' query =
> >         do response <- server query
> >            case response of
> >              Nothing -> return ()
> >              Just (medium, question) ->
> >                  do when debug $ putStr "Debug: " >> print medium
> >                     putStrLn question
> >                     putStr "--> "
> >                     answer <- getLine
> >                     client' $ Just (medium, answer)
>
> Nothing very interesting. The only thing to note is that the boolean  
> argument allows us to see the state sent by server - for debugging  
> purposes.
>
> But I want something more high-level. The goal is to solve the  
> Graham's "arc challenge". So the high-level interface is implemented  
> by this type:
>
> > data Ask a = Finish a | Continue String (String -> Ask a)
>
> A possible application would be:
>
> > test :: Ask ()
> > test =
> >     do s1 <- ask "Enter the first number"
> >        let n1 = read s1
> >        s2 <- ask "Enter the second number"
> >        let n2 = read s2
> >        ask $ show $ n1 + n2
> >        return ()
>
> Well, to do that, I need a Monad instance for Ask:
>
> > instance Monad Ask where
> >     return = Finish
> >     Finish x >>= h = h x
> >     Continue question k >>= h = Continue question $ \answer -> k  
> answer >>= h
>
> and an "ask" function:
>
> > ask :: String -> Ask String
> > ask question = Continue question $ \answer -> return answer
>
> Now, the problem is with the route from high level to the low level.  
> We can do it relatively simply:
>
> > simpleServer :: Ask () -> Server [String]
> > simpleServer anything = return . simpleServer' anything . maybe []  
> (\(medium, answer) -> medium ++ [answer]) where
> >     simpleServer' (Finish ()) _ = Nothing
> >     simpleServer' (Continue question _) [] = Just ([], question)
> >     simpleServer' (Continue _ k) (s : ss) =
> >         do (medium, question) <- simpleServer' (k s) ss
> >            return (s : medium, question)
>
> And indeed, our test example works:
>
> *Ask> client False $ simpleServer test
> Enter the first number
> --> 3
> Enter the second number
> --> 4
> 7
> -->
>
> But, as you've probably guessed already, this "simpleServer" sends  
> way too much information over network:
>
> > nonsense :: Ask ()
> > nonsense =
> >     do a <- ask "?"
> >        b <- ask a
> >        c <- ask b
> >        d <- ask c
> >        e <- ask b -- Note this "b" instead of "d", it's deliberate.
> >        return ()
>
> *Ask> client True $ simpleServer nonsense
> Debug: []
> ?
> --> a
> Debug: ["a"]
> a
> --> b
> Debug: ["a","b"]
> b
> --> c
> Debug: ["a","b","c"]
> c
> --> d
> Debug: ["a","b","c","d"]
> b
> --> e
>
> All user answers from the very beginning get transmitted.
>
> I want to transmit only those answers that are actually used. I'm  
> gonna write another server, a (relatively) smart one.
>
> A simple boxing type:
>
> > data Box a = Box {unBox :: a}
>
> Now, the server. I use "Nothing" for those user answers that are no  
> longer needed.
>
> > smartServer :: Ask () -> Server [Maybe String]
> > smartServer anything = smartServerIO anything . maybe [] (\ 
> (medium, answer) -> medium ++ [Just answer]) where
> >     smartServerIO anything query =
>
> Sometimes a value is not used anymore just because it WAS used  
> already. So I'll use a mutable list to keep track of those values  
> that are actually dereferenced:
>
> >         do usedList <- newIORef []
>
> Then, I'll make weak pointers for all user answers:
>
> >            let boxedQuery = map Box query
> >            weakPtrs <- mapM (\box -> mkWeak box () Nothing)  
> boxedQuery
>
> and actually run my "Ask" value.
>
> >            case findWeakPtrs 0 anything boxedQuery usedList of
> >              Finish () -> return Nothing
> >              Continue question rest ->
>
> Now, the interesting part. I'd invoke a garbage collector to get rid  
> of those boxes that aren't necessary anymore.
>
> >                  do performGC
>
> Now, weak pointers tell me what boxes are not dereferenced but can  
> be dereferenced in the future, and my mutable list has the numbers  
> of boxes that were dereferenced.
>
> >                     danglingNumbers <- mapM deRefWeak weakPtrs
> >                     usedNumbers <- readIORef usedList
>
> I also need to keep this "future" alive, so that usable boxes don't  
> get garbage collected.
>
> >                     case rest undefined of _ -> return ()
>
> And now all I need to do is to filter out those boxes that are not  
> in any of the two lists:
>
> >                     return $ Just (zipWith3 (isUsed usedNumbers)  
> [0..] danglingNumbers query, question)
>
> Helper function has an interesting detail:
>
> >     findWeakPtrs _ (Finish ()) _ _ = Finish ()
> >     findWeakPtrs _ response [] _ = response
>
> I use unsafePerformIO to track those boxes that get actually  
> dereferenced:
>
> >     findWeakPtrs n (Continue _ k) (s : ss) usedList =
> >         findWeakPtrs (n+1) (k (unsafePerformIO $ modifyIORef  
> usedList (n :) >> return (fromJust $ unBox s))) ss usedList
>
> A filtering function is not very interesting:
>
> >     isUsed usedNumbers index Nothing _ | not (index `elem`  
> usedNumbers) = Nothing
> >     isUsed _ _ _ answer = answer
>
> And it works:
>
> *Ask> client True $ smartServer nonsense
> Debug: []
> ?
> --> a
> Debug: [Just "a"]
> a
> --> b
> Debug: [Nothing,Just "b"]
> b
> --> c
> Debug: [Nothing,Just "b",Just "c"]
> c
> --> d
> Debug: [Nothing,Just "b",Nothing,Nothing]
> b
> --> e
>
> So, the problem is, I don't think calling performGC every time is a  
> good idea. I'd like to tell explicitly what values are to be garbage  
> collected. It doesn't seem like it's possible with current version  
> of GHC, though.
>
>
> On 7 Jan 2010, at 23:07, Edward Kmett wrote:
>
> Thats a shame, I was rather fond of this monad. To get that last  
> 'True', you'll really have to rely on garbage collection to  
> approximate reachability for you, leaving the weak reference  
> solution the best you can hope for.
>
> -Edward Kmett
>
> On Thu, Jan 7, 2010 at 12:39 PM, Miguel Mitrofanov <miguelimo38 at yandex.ru 
> > wrote:
> Damn. Seems like I really need (True, False, True) as a result of  
> "test".
>
>
> On 7 Jan 2010, at 08:52, Miguel Mitrofanov wrote:
>
> Seems very nice. Thanks.
>
> On 7 Jan 2010, at 08:01, Edward Kmett wrote:
>
> Here is a slightly nicer version using the Codensity monad of STM.
>
> Thanks go to Andrea Vezzosi for figuring out an annoying hanging bug  
> I was having.
>
> -Edward Kmett
>
> {-# LANGUAGE Rank2Types, GeneralizedNewtypeDeriving, DeriveFunctor #-}
> module STMOracle
>  ( Oracle, Ref
>  , newRef, readRef, writeRef, modifyRef, needRef
>  ) where
>
> import Control.Applicative
> import Control.Monad
> import Control.Concurrent.STM
>
> instance Applicative STM where
>  pure = return
>  (<*>) = ap
>
> newtype Ref s a = Ref (TVar (Maybe a))
> newtype Oracle s a = Oracle { unOracle :: forall r. (a -> STM r) ->  
> STM r } deriving (Functor)
>
> instance Monad (Oracle s) where
>  return x = Oracle (\k -> k x)
>  Oracle m >>= f = Oracle (\k -> m (\a -> unOracle (f a) k))
>
> mkOracle m = Oracle (m >>=)
>
> runOracle :: (forall s. Oracle s a) -> IO a
> runOracle t = atomically (unOracle t return)
>
> newRef :: a -> Oracle s (Ref s a)
> newRef a = mkOracle $ Ref <$> newTVar (Just a)
>
> readRef :: Ref s a -> Oracle s a
> readRef (Ref r) = mkOracle $ do
>  m <- readTVar r
>  maybe retry return m
>
> writeRef :: a -> Ref s a -> Oracle s a
> writeRef a (Ref r) = mkOracle $ do
>  writeTVar r (Just a)
>  return a
>
> modifyRef :: (a -> a) -> Ref s a -> Oracle s a
> modifyRef f r = do
>  a <- readRef r
>  writeRef (f a) r
>
> needRef :: Ref s a -> Oracle s Bool
> needRef (Ref slot) = Oracle $ \k ->
>           (writeTVar slot Nothing >> k False)
>  `orElse` k True
>
> -- test  
> case 
> :                                                                                                                               refMaybe 
>  b dflt ref = if b then readRef ref else return dflt
> refIgnore ref = return "blablabla"
> refFst ref = fst `fmap` readRef ref
> test = do
>   a <- newRef "x"
>   b <- newRef 1
>   c <- newRef ('z', Just 0)
>   -- no performLocalGC required
>   x <- needRef a
>   y <- needRef b
>   z <- needRef c
>   u <- refMaybe y "t" a -- note that it wouldn't actually read "a",
>                         -- but it won't be known until runtime.
>   w <- refIgnore b
>   v <- refFst c
>   return (x, y, z)
>
>
>
> On Wed, Jan 6, 2010 at 10:28 PM, Edward Kmett <ekmett at gmail.com>  
> wrote:
> I don't believe you can get quite the semantics you want. However,  
> you can get reasonably close, by building a manual store and  
> backtracking.
>
> {-# LANGUAGE Rank2Types #-}
> -- lets define an Oracle that tracks whether or not you might need  
> the reference, by backtracking.
> module Oracle
>  ( Oracle, Ref
>  , newRef, readRef, writeRef, modifyRef, needRef
>  ) where
>
> import Control.Applicative
> import Control.Arrow (first)
> import Control.Monad
> import Data.IntMap (IntMap)
> import qualified Data.IntMap as M
> import Unsafe.Coerce (unsafeCoerce)
> import GHC.Prim (Any)
>
> -- we need to track our own worlds, otherwise we'd have to build  
> over ST, change optimistically, and track how to backtrack the state  
> of the Store. Much uglier.
> -- values are stored as 'Any's for safety, see GHC.Prim for a  
> discussion on the hazards of risking the storage of function types  
> using unsafeCoerce as anything else.
> data World s = World { store :: !(IntMap Any), hwm :: !Int }
>
> -- references into our store
> newtype Ref s a = Ref Int deriving (Eq)
>
> -- our monad that can 'see the future' ~ StateT (World s) []newtype  
> Oracle s a = Oracle { unOracle :: World s -> [(a, World s)] }
>
> -- we rely on the fact that the list is always non-empty for any  
> oracle you can run. we are only allowed to backtrack if we thought  
> we wouldn't need the reference, and wound up needing it, so head  
> will always succeed.
> runOracle :: (forall s. Oracle s a) -> a
> runOracle f = fst $ head $ unOracle f $ World M.empty 1
>
>
> instance Monad (Oracle s) where
>  return a = Oracle $ \w -> [(a,w)]
>  Oracle m >>= k = Oracle $ \s -> do
>      (a,s') <- m s
>      unOracle (k a) s'
>
> -- note: you cannot safely define fail here without risking a crash  
> in runOracle
> -- Similarly, we're not a MonadPlus instance because we always want  
> to succeed eventually.
>
> instance Functor (Oracle s) where
>  fmap f (Oracle g) = Oracle $ \w -> first f <$> g w
>
> instance Applicative (Oracle s) where
>  pure = return
>  (<*>) = ap
>
> -- new ref allocates a fresh slot and inserts the value into the  
> store. the type level brand 's' keeps us safe, and we don't export  
> the Ref constructor.
> newRef :: a -> Oracle s (Ref s a)
> newRef a = Oracle $ \(World w t) ->
>  [(Ref t, World (M.insert t (unsafeCoerce a) w) (t + 1))]
>
> -- readRef is the only thing that ever backtracks, if we try to read  
> a reference we claimed we wouldn't need, then we backtrack to when  
> we decided we didn't need the reference, and continue with its value.
> readRef :: Ref s a -> Oracle s a
> readRef (Ref slot) = Oracle $ \world ->
>  maybe [] (\a -> [(unsafeCoerce a, world)]) $ M.lookup slot (store  
> world)
>
> -- note, writeRef dfoesn't 'need' the ref's current value, so  
> needRef will report False if you writeRef before you read it after  
> this.
> writeRef :: a -> Ref s a -> Oracle s a
> writeRef a (Ref slot) = Oracle $ \world ->
>      [(a, world { store = M.insert slot (unsafeCoerce a) $ store  
> world })]
>
> {-
> -- alternate writeRef where writing 'needs' the ref.
> writeRef :: a -> Ref s a -> Oracle s a
> writeRef a (Ref slot) = Oracle $ \World store v -> do
>  (Just _, store') <- return $ updateLookupWithKey replace slot store
>  [(a, World store' v)]
>  where
>  replace _ _ = Just (unsafeCoerce a)
> -}
>
> -- modifying a reference of course needs its current value.
> modifyRef :: (a -> a) -> Ref s a -> Oracle s a
> modifyRef f r = do
>  a <- readRef r
>  writeRef (f a) r
>
> -- needRef tries to continue executing the world without the element  
> in the store in question. if that fails, then we'll backtrack to  
> here, and try again with the original world, and report that the  
> element was in fact needed.
> needRef :: Ref s a -> Oracle s Bool
> needRef (Ref slot) = Oracle $ \world ->
>  [ (False, world { store = M.delete slot $ store world })
>  , (True, world)
>  ]
>
> -- test case:
> refMaybe b dflt ref = if b then readRef ref else return dflt
> refIgnore ref = return "blablabla"
> refFst ref = fst <$> readRef ref
> test = do
>   a <- newRef "x"
>   b <- newRef 1
>   c <- newRef ('z', Just 0)
>   -- no performLocalGC required
>   x <- needRef a
>   y <- needRef b
>   z <- needRef c
>   u <- refMaybe y "t" a -- note that it wouldn't actually read "a",
>                         -- but it won't be known until runtime.
>   w <- refIgnore b
>   v <- refFst c
>   return (x, y, z)
>
> -- This will disagree with your desired answer, returning:
>
> *Oracle> runOracle test
> Loading package syb ... linking ... done.
> Loading package array-0.2.0.0 ... linking ... done.
> Loading package containers-0.2.0.1 ... linking ... done.
> (False,False,True)
>
> rather than (True, False, True), because the oracle is able to see  
> into the future (via backtracking) to see that refMaybe doesn't use  
> the reference after all.
>
> This probably won't suit your needs, but it was a fun little exercise.
>
> -Edward Kmett
>
> On Wed, Jan 6, 2010 at 4:05 PM, Miguel Mitrofanov <miguelimo38 at yandex.ru 
> > wrote:
>
> On 6 Jan 2010, at 23:21, Edward Kmett wrote:
>
> You probably just want to hold onto weak references for your  
> 'isStillNeeded' checks.
>
> That's what I do now. But I want to minimize the network traffic, so  
> I want referenced values to be garbage collected as soon as possible  
> - and I couldn't find anything except System.Mem.performIO to do the  
> job - which is a bit too global for me.
>
> Otherwise the isStillNeeded check itself will keep you from garbage  
> collecting!
>
> Not necessary. What I'm imagining is that there is essentially only  
> one way to access the value stored in the reference - with readRef.  
> So, if there isn't any chance that readRef would be called, the  
> value can be garbage collected; "isStillNeeded" function only needs  
> the reference, not the value.
>
> Well, yeah, that's kinda like weak references.
>
>
> http://cvs.haskell.org/Hugs/pages/libraries/base/System-Mem-Weak.html
>
> -Edward Kmett
>
> On Wed, Jan 6, 2010 at 9:39 AM, Miguel Mitrofanov <miguelimo38 at yandex.ru 
> > wrote:
> I'll take a look at them.
>
> I want something like this:
>
> refMaybe b dflt ref = if b then readRef ref else return dflt
> refIgnore ref = return "blablabla"
> refFst ref =
> do
>  (v, w) <- readRef ref
>  return v
> test =
> do
>  a <- newRef "x"
>  b <- newRef 1
>  c <- newRef ('z', Just 0)
>  performLocalGC -- if necessary
>  x <- isStillNeeded a
>  y <- isStillNeeded b
>  z <- isStillNeeded c
>  u <- refMaybe y "t" a -- note that it wouldn't actually read "a",
>                        -- but it won't be known until runtime.
>  w <- refIgnore b
>  v <- refFst c
>  return (x, y, z)
>
> so that "run test" returns (True, False, True).
>
>
> Dan Doel wrote:
> On Wednesday 06 January 2010 8:52:10 am Miguel Mitrofanov wrote:
> Is there any kind of "ST" monad that allows to know if some STRef is  
> no
> longer needed?
>
> The problem is, I want to send some data to an external storage over a
> network and get it back later, but I don't want to send unnecessary  
> data.
>
> I've managed to do something like that with weak pointers,
> System.Mem.performGC and unsafePerformIO, but it seems to me that  
> invoking
> GC every time is an overkill.
>
> Oh, and I'm ready to trade the purity of runST for that, if necessary.
>
> You may be able to use something like Oleg's Lightweight Monadic  
> Regions to get this effect. I suppose it depends somewhat on what  
> qualifies a reference as "no longer needed".
>
> http://www.cs.rutgers.edu/~ccshan/capability/region-io.pdf
>
> I'm not aware of anything out-of-the-box that does what you want,  
> though.
>
> -- Dan
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
>
>
>
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
>
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
>
>
>
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
>
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
>
>
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
>
>
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe