[GHC] #14955: Musings on manual type class desugaring

Wed Mar 21 11:18:31 UTC 2018

#14955: Musings on manual type class desugaring
-------------------------------------+-------------------------------------
           Reporter:  mpickering     |             Owner:  (none)
               Type:  bug            |            Status:  new
           Priority:  normal         |         Milestone:
          Component:  Compiler       |           Version:  8.2.2
           Keywords:  SpecConstr     |  Operating System:  Unknown/Multiple
       Architecture:                 |   Type of failure:  None/Unknown
  Unknown/Multiple                   |
          Test Case:                 |        Blocked By:
           Blocking:                 |   Related Tickets:
Differential Rev(s):                 |         Wiki Page:
-------------------------------------+-------------------------------------
 I recently wrote a short post explaining why manual type class desugaring
 was
 different to actually writing a type class because of how they are
 optimised.
 http://mpickering.github.io/posts/2018-03-20-recordsvstypeclasses.html

 I implement 4 different equivalent programs which are all optimised
 differently. I paste the whole file below as it is not very big.

 Implementation 1 is in terms of a type class.
 Implementation 2 is in terms of explicit dictionary passing.
 Implementation 3 wraps a dictionary in a type class
 Implementation 4 wraps a dictionary in a type class with an additional
 dummy argument.

 Naively, a user would expect all 4 implementations to be as fast as each
 other.

 {{{
 {-# LANGUAGE NoImplicitPrelude #-}
 {-# LANGUAGE AllowAmbiguousTypes #-}
 module Prop where

 import Prelude (Bool(..), (||), (&&))

 -- Implementation 1

 class Prop r where
   or :: r -> r -> r
   and :: r -> r -> r
   true :: r
   false :: r

 instance Prop Bool where
   or = (||)
   and = (&&)
   true = True
   false = False

 -- Implementation 2

 data PropDict r = PropDict {
   dor :: r -> r -> r
   , dand :: r -> r -> r
   , dtrue :: r
   , dfalse :: r
   }

 boolDict = PropDict {
   dor = (||)
   , dand = (&&)
   , dtrue = True
   , dfalse = False }

 -- Implementation 3

 class PropProxy r where
   propDict :: PropDict r

 instance PropProxy Bool where
   propDict = boolDict

 -- Implementation 4

 class PropProxy2 r where
   propDict2 :: PropDict r
   dummy :: ()

 instance PropProxy2 Bool where
   propDict2 = boolDict
   dummy = ()

 ors :: Prop r => [r] -> r
 ors [] = true
 ors (o:os) = o `or` ors os
 {-# INLINABLE ors #-}

 dors :: PropDict r -> [r] -> r
 dors pd [] = dtrue pd
 dors pd (o:os) = dor pd o (dors pd os)

 pors :: PropProxy r => [r] -> r
 pors [] = dtrue propDict
 pors (o:os) = dor propDict o (pors os)
 {-# INLINABLE pors #-}

 porsProxy :: PropProxy2 r => [r] -> r
 porsProxy [] = dtrue propDict2
 porsProxy (o:os) = dor propDict2 o (porsProxy os)
 {-# INLINABLE porsProxy #-}
 }}}

 Then using the 4 different implementations of `ors` in another module
 implementations 1 and 4 are fast whilst 2 and 3 are slow.

 https://github.com/mpickering/rtcwrao-benchmarks/blob/master/Prop2.hs

 {{{
 benchmarking tc/Implementation 1
 time                 3.510 ms   (3.509 ms .. 3.512 ms)
                      1.000 R²   (1.000 R² .. 1.000 R²)
 mean                 2.976 ms   (2.886 ms .. 3.060 ms)
 std dev              241.1 μs   (195.4 μs .. 293.1 μs)
 variance introduced by outliers: 51% (severely inflated)

 benchmarking tc/Implementation 2
 time                 25.05 ms   (21.16 ms .. 30.43 ms)
                      0.912 R²   (0.849 R² .. 0.984 R²)
 mean                 19.18 ms   (16.20 ms .. 21.45 ms)
 std dev              5.627 ms   (4.710 ms .. 6.618 ms)
 variance introduced by outliers: 89% (severely inflated)

 benchmarking tc/Implementation 3
 time                 20.06 ms   (15.33 ms .. 23.57 ms)
                      0.856 R²   (0.755 R² .. 0.934 R²)
 mean                 18.43 ms   (16.92 ms .. 19.85 ms)
 std dev              3.490 ms   (3.003 ms .. 4.076 ms)
 variance introduced by outliers: 74% (severely inflated)

 benchmarking tc/Implementation 4
 time                 3.498 ms   (3.484 ms .. 3.513 ms)
                      1.000 R²   (1.000 R² .. 1.000 R²)
 mean                 3.016 ms   (2.935 ms .. 3.083 ms)
 std dev              205.7 μs   (162.6 μs .. 261.8 μs)
 variance introduced by outliers: 42% (moderately inflated)
 }}}

 I compiled the module with `-O2`. If I turn off `-fno-worker-wrapper` and
 `-fno-spec-constr` then implementation 3 is also fast. Implementation 2 is
 always slow.

 This ticket is querying what could be done to improve the robustness of
 these different refactorings.

-- 
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/14955>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler