Fixing Performance Leaks at the Type Level

Tue Jul 12 14:17:46 CEST 2011

This post is in literate Haskell. It describes how certain performance leaks are introduced in type level programming. These leaks do not affect program runtimes, but can cause compile times to grow drastically. They exist both with Functional Dependencies and Type Families, but are currently worse with the former, and have grown worse with the new constraint solver in GHC 7. It is intended both as a guide to those encountering these issues, and as a motivation for the GHC development team to address such issues as the constraint solver is developed and improved. 

> {-# OPTIONS_GHC -fcontext-stack=1000 #-} 
> {-# LANGUAGE FlexibleContexts, FlexibleInstances, FunctionalDependencies, MultiParamTypeClasses, OverlappingInstances, TypeSynonymInstances, TypeOperators, UndecidableInstances, TypeFamilies #-} 
> module TypePerformance where 

Our running example, for simplicity's sake, is a type-level map of a single function. For reference, here is the code for a simple value-level map of a single function. 

> vfoo = id 
> mapfoo (x : xs) = vfoo x : mapfoo xs 
> mapfoo [] = [] 

Because Haskell is a lazy language, this runs in O(n) time and constant stack. 

We now lift map to the type level, to operate over HLists. 

First, the basic HList types 

> infixr 3 :* 
> data x :* xs = x :* xs deriving Show 
> data HNil = HNil deriving Show 

Next, a large boring HList 

> -- Adds ten cells 
> addData x = i :* i :* d :* d :* s :* 
>             i :* i :* d :* d :* s :* 
>             x 
>     where i = 1 :: Int 
>           d = 1 :: Double 
>           s = "" 
> 
> -- Has 70 cells. 
> sampleData = addData $ addData $ addData $ addData $ addData $ 
>              addData $ addData $ 
>              HNil 

Next, a simple polymorphic function to map 

> class Foo x y | x -> y 
>     where foo :: x -> y 
>           foo = undefined 

> instance Foo Int Double 
> instance Foo Double Int 
> instance Foo String String 

Now, our map 

> class HMapFoo1 as bs | as -> bs where 
>     hMapFoo1 :: as -> bs 
> 
> instance (Foo a b, HMapFoo1 as bs) => HMapFoo1 (a :* as) (b :* bs) where 
>     hMapFoo1 (x :* xs) = foo x :* hMapFoo1 xs 
> 
> instance HMapFoo1 HNil HNil where 
>     hMapFoo1 _ = HNil 

If we enable the following line, compilation time is ~ 9 seconds. 

  > testHMapFoo1 = hMapFoo1 sampleData 

Furthermore, playing with the size of sampleData, we see that the time spent in compilation is superlinear -- each additional cell costs a greater amount of time. This is because while Haskell is lazy at the value level, it is strict at the type level. Therefore, just as in a strict language, HMapFoo1's cost grows O(n^2) because even as we induct over the as, we build up a stack of bs. Just as in a strict language, the solution is to make hMapFoo tail recursive through introducing an accumulator. This also reverses the hlist, but never mind that. 

> class HMapFoo2 acc as bs | acc as -> bs where 
>     hMapFoo2 :: acc -> as -> bs 
> 
> instance (Foo a b, HMapFoo2 (b :* bs) as res) => HMapFoo2 bs (a :* as) res where 
>     hMapFoo2 acc (x :* xs) = hMapFoo2 (foo x :* acc) xs 
> 
> instance HMapFoo2 acc HNil acc where 
>     hMapFoo2 acc _ = acc 

If we enable the following line, compilation time is a much more satisfying ~0.5s. 

  > testHMapFoo2 = hMapFoo2 HNil sampleData 

But wait, there's trouble on the horizon! Consider the following version: 

> class HMapFoo3 acc as bs | acc as -> bs where 
>     hMapFoo3 :: acc -> as -> bs 
> 
> instance (HMapFoo3 (b :* bs) as res, Foo a b) => HMapFoo3 bs (a :* as) res where 
>     hMapFoo3 acc (x :* xs) = hMapFoo3 (foo x :* acc) xs 
> 
> instance HMapFoo3 acc HNil acc where 
>     hMapFoo3 acc _ = acc 

The only difference between hMapFoo2 and hMapFoo3 is that the order of constraints on the inductive case has been reversed, with the recursive constraint first and the immediately checkable constraint second. Now, if we enable the following line, compilation time rockets to ~6s! 

  > testHMapFoo3 = hMapFoo3 HNil sampleData 

Slowdowns such as those described here are not a purely hypothetical issue, but have caused real problems in production code. The example given above is fairly simple. The constraints used are minimal and easily checked. In real programs, the constraints are more difficult, increasing constant factors significantly. These constant factors, combined with unexpected algorithmic slowdowns due to the type inference engine, can lead (and have lead) to compilation times of HList-style code becoming deeply unwieldy-to-unusable, and can lead (and have lead) to this occuring only well after this code has been introduced and used on smaller cases without trouble. 

The constraint solver should certainly be smart enough to reduce the compile times of HMapFoo3 to those of HMapFoo2. In fact, with type families, the there is no difference (see below). Could the compiler be smart enough to do the same for HMapFoo1? I'm not sure. Certainly, it could at least knock down its own constant factors a bit, even if it can't improve the big-O performance there. 

---- 
Appendix: Examples with Type Families 

As the below code demonstrates, the same issues demonstrated with Functional Dependencies also appear with Type Families, although less horribly, as their code-path seems more optimized in the current constraint solver: 

> class TFoo x where 
>      type TFooFun x 
>      tfoo :: x -> TFooFun x 
>      tfoo = undefined 
> 
> instance TFoo Int where 
>     type TFooFun Int = Double 
> instance TFoo Double where 
>     type TFooFun Double = Int 
> instance TFoo String where 
>     type TFooFun String = String 
> 
> class THMapFoo1 as where 
>       type THMapFoo1Res as 
>       thMapFoo1 :: as -> THMapFoo1Res as 
> 
> instance (TFoo a, THMapFoo1 as) => THMapFoo1 (a :* as) where 
>     type THMapFoo1Res (a :* as) = TFooFun a :* THMapFoo1Res as 
>     thMapFoo1 (x :* xs) = tfoo x :* thMapFoo1 xs 
> 
> instance THMapFoo1 HNil where 
>     type THMapFoo1Res HNil = HNil 
>     thMapFoo1 _ = HNil 

The following, when enabled, takes ~3.5s. This demonstrates that slowdown occurs with type families as well. 

  > testTHMapFoo1 = thMapFoo1 sampleData 

> class THMapFoo2 acc as where 
>       type THMapFoo2Res acc as 
>       thMapFoo2 :: acc -> as -> THMapFoo2Res acc as 
> 
> instance (TFoo a, THMapFoo2 (TFooFun a :* acc) as) => THMapFoo2 acc (a :* as) where 
>     type THMapFoo2Res acc (a :* as) = THMapFoo2Res (TFooFun a :* acc) as 
>     thMapFoo2 acc (x :* xs) = thMapFoo2 (tfoo x :* acc) xs 
> 
> instance THMapFoo2 acc HNil where 
>     type THMapFoo2Res acc HNil = acc 
>     thMapFoo2 acc _ = acc 

The following, when enabled, takes ~0.6s. This demonstrates that the tail recursive transform fixes the slowdown with type families just as with fundeps. 

  > testTHMapFoo2 = thMapFoo2 HNil sampleData 

> class THMapFoo3 acc as where 
>       type THMapFoo3Res acc as 
>       thMapFoo3 :: acc -> as -> THMapFoo3Res acc as 
> 
> instance (THMapFoo3 (TFooFun a :* acc) as, TFoo a) => THMapFoo3 acc (a :* as) where 
>     type THMapFoo3Res acc (a :* as) = THMapFoo3Res (TFooFun a :* acc) as 
>     thMapFoo3 acc (x :* xs) = thMapFoo3 (tfoo x :* acc) xs 
> 
> instance THMapFoo3 acc HNil where 
>     type THMapFoo3Res acc HNil = acc 
>     thMapFoo3 acc _ = acc 

The following, when enabled, also takes ~0.6s. This demonstrates that, unlike the fundep case, the order of type class constraints does not, in this instance, affect the performance of type families. 

  > testTHMapFoo3 = thMapFoo3 HNil sampleData 

--Gershom