[Haskell-cafe] Faster timeout but is it correct?

Fri Feb 18 02:01:02 CET 2011

On 18 February 2011 01:09, Bas van Dijk <v.dijk.bas at gmail.com> wrote:
> Benchmarks are coming...

Here are some preliminary benchmarks.

I used the latest GHC HEAD (7.1.20110217) build for performance.

Because I wanted to finish the build of ghc before I went to bed I
used a faster machine than my laptop. So the results should not be
compared to my previous results.

PC specs:
CPU: Intel Core 2 Duo 3Ghz. with 6MB cache
OS: An up to date 64bit Ubuntu 10.10

First of all the implementations:

The current:

data Timeout = Timeout Unique

timeout :: Int -> IO a -> IO (Maybe a)
timeout n f
    | n <  0    = fmap Just f
    | n == 0    = return Nothing
    | otherwise = do
        pid <- myThreadId
        ex  <- fmap Timeout newUnique
        handleJust (\e -> if e == ex then Just () else Nothing)
                   (\_ -> return Nothing)
                   (bracket (forkIO (threadDelay n >> throwTo pid ex))
                            (killThread)
                            (\_ -> fmap Just f))

The new:

newtype Timeout = Timeout ThreadId

timeout :: Int -> IO a -> IO (Maybe a)
timeout n f
    | n <  0    = fmap Just f
    | n == 0    = return Nothing
    | otherwise = do
      myTid <- myThreadId
      uninterruptibleMask $ \restore -> do
        tid <- unsafeUnmask $ forkIO $ do
                  tid <- myThreadId
                  threadDelay n
                  throwTo myTid $ Timeout tid
        (restore (fmap Just f) >>= \mb -> killThread tid >> return mb)
          `catch` \e ->
              case fromException e of
                Just (Timeout tid') | tid' == tid -> return Nothing
                _ -> killThread tid >> throwIO e

The event-manager based:

newtype Timeout = Timeout TimeoutKey

timeout :: Int -> IO a -> IO (Maybe a)
timeout usecs f
    | usecs <  0 = fmap Just f
    | usecs == 0 = return Nothing
    | otherwise  = do
        myTid <- myThreadId
        Just mgr <- readIORef eventManager
        mask $ \restore -> do
          key <- registerTimeoutWithKey mgr usecs $ \key ->
                   throwTo myTid $ Timeout key
          let unregTimeout = M.unregisterTimeout mgr key
          (restore (fmap Just f) >>= \mb -> unregTimeout >> return mb)
            `catch` \e ->
                case fromException e of
                  Just (Timeout key') | key' == key -> return Nothing
                  _ -> unregTimeout >> throwIO e

The benchmarks: (These should really be extended!)

willTimeout = shouldTimeout $ timeout 1 (threadDelay oneSec)
wontTimeout = shouldNotTimeout $ timeout oneSec (return ())

nestedTimeouts = shouldTimeout $ timeout 100000 $
                   shouldNotTimeout $ timeout (2*oneSec) $
                     threadDelay oneSec

externalException = do
  (tid, wait) <- fork $ timeout oneSec (threadDelay oneSec)
  threadDelay 500
  throwTo tid MyException
  r <- wait
  case r of
    Left e | Just MyException <- fromException e -> return ()
    _ -> error "MyException should have been thrown!"

Results:

The benchmarks were build with -O2 and -threaded and run without RTS
options (So no -N2, I may do that later but AFAIK the RTS will
automatically find the number of cores)

willTimeout/old         24.34945 us    1.0 x
willTimeout/new         26.91964 us    0.9 x (large std dev: 5 us)
willTimeout/event       12.94273 us    1.9 x  :-)

wontTimeout/old         16.25766 us    1.0 x
wontTimeout/new         637.8685 ns   25.5 x  :-)
wontTimeout/event       1.565311 us   10.4 x  :-)

externalException/old   10.28582 ms    1.0 x
externalException/new   9.960918 ms    1.0 x
externalException/event 10.25484 ms    1.0 x

nestedTimeouts/old      108.1759 ms    1.0 x
nestedTimeouts/new      108.4585 ms    1.0 x
nestedTimeouts/event    109.9614 ms    1.0 x

Preliminary conclusions:

I think the most important benchmark is wontTimeout because AFAIK
that's the most common situation. As can be seen, the new
implementation is 25 times faster than the old one. The event-manager
based implementation is 10 times faster than the old one but not quite
as fast as the new one. Although the event-manager based timeout has
to do less work the new one probably exploits parallelism because it
forks a thread to do part of its work.

A nice result is that in my previous efforts I couldn't achieve
speedups in the willTimeout benchmark. Fortunately the event-manager
based implementation is twice as fast as the original.

Further work:

I will brainstorm on this some more and update my patches for base
during the weekend.

Regards,

Bas