[GHC] #8400: Migrate the RTS to use libuv (or libev, or libevent)

GHC ghc-devs at haskell.org
Mon Aug 7 15:32:44 UTC 2017


#8400: Migrate the RTS to use libuv (or libev, or libevent)
-------------------------------------+-------------------------------------
        Reporter:  schyler           |                Owner:  (none)
            Type:  feature request   |               Status:  new
        Priority:  normal            |            Milestone:
       Component:  Runtime System    |              Version:
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:
                                     |  Unknown/Multiple
 Type of failure:  None/Unknown      |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:  635, 7353         |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by winter):

 I got my libuv-based TCP server running! It performs almost identically
 to mio on my ThinkPad W540; here are some quick benchmark numbers:

 {{{
 ~/Code/stdio/bench/libuv(libuv*) » wrk -c1000 -d10s http://127.0.0.1:8888
 winter@winter-thinkpad-w540
 Running 10s test @ http://127.0.0.1:8888
   2 threads and 1000 connections
   Thread Stats   Avg      Stdev     Max   +/- Stdev
     Latency     3.58ms    4.35ms 233.38ms   99.16%
     Req/Sec   109.35k    16.85k  165.69k    65.66%
   2158626 requests in 10.02s, 10.26GB read
 Requests/sec: 215444.90
 Transfer/sec:      1.02GB
 }}}

 In contrast, here are the numbers for the original I/O manager in base,
 a.k.a. mio:

 {{{
 ~/Code/stdio/bench/libuv(libuv*) » wrk -c1000 -d10s http://127.0.0.1:8888
 winter@winter-thinkpad-w540
 Running 10s test @ http://127.0.0.1:8888
   2 threads and 1000 connections
   Thread Stats   Avg      Stdev     Max   +/- Stdev
     Latency     5.03ms   11.50ms 436.51ms   99.53%
     Req/Sec   106.39k     6.27k  124.92k    77.50%
   2117274 requests in 10.07s, 10.07GB read
 Requests/sec: 210264.57
 Transfer/sec:      1.00GB

 }}}
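 As a quick sanity check on the two `wrk` summaries above, the
 Requests/sec line is essentially total requests divided by the test
 duration; a minimal sketch, using the figures copied from the runs above:

 {{{
 -- Sanity-check the wrk summaries: Requests/sec ~ requests / duration.
 -- All figures are copied from the two benchmark runs above.
 libuvRps, mioRps :: Double
 libuvRps = 2158626 / 10.02   -- ~215k req/s for the libuv server
 mioRps   = 2117274 / 10.07   -- ~210k req/s for the mio server

 -- Relative throughput of the two servers: slightly above 1,
 -- i.e. a small edge for the libuv version.
 speedup :: Double
 speedup = libuvRps / mioRps
 }}}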

 The benchmark code is here:
 https://github.com/winterland1989/stdio/tree/libuv/bench/libuv

 What is interesting here are the `+RTS -s` figures. First, the mio run:

 {{{
 30,227,784,440 bytes allocated in the heap
    4,608,251,832 bytes copied during GC
        4,058,040 bytes maximum residency (1442 sample(s))
        4,537,568 bytes maximum slop
               17 MB total memory in use (0 MB lost due to fragmentation)

                                      Tot time (elapsed)  Avg pause  Max pause
   Gen  0     34042 colls, 34042 par    9.782s   2.468s     0.0001s    0.0074s
   Gen  1      1442 colls,  1441 par    4.523s   1.140s     0.0008s    0.0043s

   Parallel GC work balance: 75.40% (serial 0%, perfect 100%)

   TASKS: 10 (1 bound, 9 peak workers (9 total), using -N4)

   SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

   INIT    time    0.001s  (  0.001s elapsed)
   MUT     time   24.940s  ( 10.609s elapsed)
   GC      time   14.305s  (  3.608s elapsed)
   EXIT    time    0.002s  (  0.001s elapsed)
   Total   time   39.248s  ( 14.218s elapsed)

   Alloc rate    1,212,033,615 bytes per MUT second

   Productivity  63.5% of total user, 74.6% of total elapsed

 gc_alloc_block_sync: 357989
 whitehole_spin: 1
 gen[0].sync: 4817566
 gen[1].sync: 326374
 }}}
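 For reference, the derived lines in the `+RTS -s` report are simple
 ratios of the raw numbers; reproducing the mio run's figures above:

 {{{
 -- Reproduce the mio run's derived figures from the raw +RTS -s numbers.
 mutTime, totalTime :: Double
 mutTime   = 24.940   -- MUT user time
 totalTime = 39.248   -- Total user time

 -- Productivity (user): MUT / Total, reported as 63.5%
 productivity :: Double
 productivity = mutTime / totalTime

 -- Productivity (elapsed): 10.609s / 14.218s, reported as 74.6%
 productivityElapsed :: Double
 productivityElapsed = 10.609 / 14.218

 -- Alloc rate: bytes allocated per MUT second, ~1.21 GB/s
 allocRate :: Double
 allocRate = 30227784440 / mutTime
 }}}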

 Here are my libuv code's figures after `wrk`'s load:

 {{{
  6,666,812,808 bytes allocated in the heap
    2,177,870,680 bytes copied during GC
        3,574,680 bytes maximum residency (1370 sample(s))
        5,571,840 bytes maximum slop
               16 MB total memory in use (0 MB lost due to fragmentation)

                                      Tot time (elapsed)  Avg pause  Max pause
   Gen  0      6034 colls,  6034 par    4.802s   1.220s     0.0002s    0.0063s
   Gen  1      1370 colls,  1369 par    3.994s   1.010s     0.0007s    0.0025s

   Parallel GC work balance: 76.00% (serial 0%, perfect 100%)

   TASKS: 13 (1 bound, 12 peak workers (12 total), using -N4)

   SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

   INIT    time    0.001s  (  0.001s elapsed)
   MUT     time   23.782s  ( 13.182s elapsed)
   GC      time    8.796s  (  2.230s elapsed)
   EXIT    time    0.001s  (  0.001s elapsed)
   Total   time   32.581s  ( 15.413s elapsed)

   Alloc rate    280,326,860 bytes per MUT second

   Productivity  73.0% of total user, 85.5% of total elapsed

 gc_alloc_block_sync: 253553
 whitehole_spin: 0
 gen[0].sync: 3071193
 gen[1].sync: 186169
 }}}

 It seems that my new libuv-based I/O manager eliminates a LOT of
 allocation (though some of it may simply have moved to the C side).
 Overall, I think my evaluation of libuv is a success, and I'd like to
 discuss the possibility of integrating it with base.
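
 One way to make the allocation difference concrete is to normalise it per
 request served, using the byte counts and request counts from the runs
 above; a rough back-of-the-envelope sketch:

 {{{
 -- Rough per-request heap allocation, from the byte and request
 -- counts quoted in the benchmark output above.
 mioBytesPerReq, libuvBytesPerReq :: Double
 mioBytesPerReq   = 30227784440 / 2117274  -- ~14.3 KB allocated per request
 libuvBytesPerReq = 6666812808 / 2158626   -- ~3.1 KB allocated per request

 -- The libuv-based manager allocates roughly 4-5x less per request.
 reduction :: Double
 reduction = mioBytesPerReq / libuvBytesPerReq
 }}}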

-- 
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/8400#comment:16>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler

