[Haskell-cafe] containers: Efficiently concatenating nested maps (or should I look at another data structure)

Tue Apr 18 13:16:43 UTC 2017

Hi all,

First, a little context. I'm currently working on [1] a little OpenGL
project - a Quake 3 map viewer in Haskell. For the most part, things are
working out great, but the way I construct the scene is costing me a lot
of alloction, and I wonder if I can do better.

The OpenGL system is essentially a giant state machine. Whenever you
issue a draw call ("draw these triangles"), OpenGL will do different
things depending on what the state is. For example, if you have a
texture bound, then the triangle will be textured, and if you have
blending enabled, then the triangle might be blended into whatever was
drawn previously.

While this can be tricky to manage, Haskell does a good job of giving us
tools to model state. The real spanner is that state transitions are not
free. In the above example, the state transitions associated with
binding a texture or enabling blending both cost time - and this time is
variable depending on the state change. In this case, setting the
currently bound texture is a very expensive operation, whereas setting
the blending mode is less costly. For this reason, it's common in
graphics programming to want to sort draw calls to minimise state
changes. For example, we might draw all triangles with texture 1, then
all triangles with texture 2 - rather than interleaving those draw
calls.

To model this in Haskell, I'm using a tree-like structure to group and
express each state change [2]. To give you an example of what this looks
like, I have:

  class RenderNode a m where
    draw :: a -> m ()

  instance RenderNode (IO ()) IO where
    draw = id

  newtype BindTexture a = BindTexture (MonoidalIntMap a) [3]
    deriving (Functor, Monoid)

  instance (MonadIO m, RenderNode a m) =>
          RenderNode (BindTexture a) m where
    draw (BindTexture t) =
      IM.foldrWithKey
        (\texture0 m next -> do
          glActiveTexture GL_TEXTURE0
          glBindTexture GL_TEXTURE_2D (fromIntegral texture0)
          draw m
          next)
        (return ())
        (getMonoidalIntMap t)

  newtype BlendMode a = BlendMode (MonoidalMap (GLuint, GLuint) a) [3]
    deriving (Functor, Monoid)

  blendMode :: (GLuint, GLuint) -> a -> BlendMode a
  blendMode srcDst = BlendMode . MonoidalMap . Map.singleton srcDst

  instance (MonadIO m, RenderNode a m) =>
          RenderNode (BlendMode a) m where
    draw (BlendMode m) =
      Map.foldrWithKey
        (\(srcBlend, destBlend) child next -> do
          glBlendFunc srcBlend destBlend
          draw child
          next)
        (return ())
        (getMonoidalMap m)

I can then stack these nodes together to form a "render graph":

  newtype RenderGraph = RenderGraph (BindTexture (BlendMode (IO ())))
    deriving (Monoid)

The IO action at the leaf corresponds to a draw call to draw some
triangles.

Now that I've explained the basics of my rendering plan, I just need to
tell you a little bit about Quake 3 maps. A Quake 3 map is essentially a
big soup of "faces", where each face is a collection of triangles (that
can be drawn in one call) and a set of associated state. Furthermore,
each face is assigned a "cluster" (an Int), and we have a function that
given a cluster gives us back a list of other clusters that we can "see"
- this is visibility determination, or occlusion culling.

I am hence modelling the map as an IntMap RenderGraph - each cluster
maps to a RenderGraph of how to draw all the faces in that cluster,
organised to minimise draw calls. Then, given some sort of visibility
check, when I render I filter this IntMap to only the clusters that can
be seen, and combine all the render graphs:

  IntMap.foldlWithKey'
    (\scene clusterNumber clusterFaces -> scene <> clusterFaces)
    mempty
    clusters

In reality, this results in combining a few hundred-thousad very deeply
nested structures, and naturally that's resulting in a fair bit of
allocation pressure. I need to do this almost every frame, because as
the camera moves, we might be able to see different clusters, so I don't
think there's really much that I can cache. My actual final RenderGraph
is

  Sort |> Cull |> MultiplePasses |> BindTexture |> SetUniform Bool |>
    SetDynamicUniform (M33 Float) |> AlphaFunc |> BlendMode |>
      DepthFunc

where (|>) is Functor composition, and each of those functors is either
a Map or IntMap.

It seems to deteriorate to pairwise `mappend`, as you can see from the
definition of mappend for MonoidalMap or MonoidalIntMap in [3].

Can anyone think of any obvious ways to rewrite this to have less
allocation pressure? I'm of course open to other ways to batch
operations, but there is something I find quite beautiful about shoving
everything into Maps and mappending them. In the C world, the approach
is usually to combine all the states into a single bitmask, and then
sort that. That same approach might be possible here, but I'd like to
keep the pick-and-choose compositional nature that I've got above.

Thanks for reading this far, I hope my post has made sense. If you need
clarification on anything, the source is below, and don't hesitate to
ask me any questions!

-- ocharles

[1]: http://github.com/ocharles/hs-quake-3
[2]: https://github.com/ocharles/hs-quake-3/blob/master/RenderGraph.hs
[3]:
  newtype MonoidalMap k a = MonoidalMap { getMonoidalMap :: Map k a }

  instance Functor (MonoidalMap k) where
    fmap f (MonoidalMap a) = MonoidalMap (fmap f a)

  instance (Monoid a, Ord k) =>
          Monoid (MonoidalMap k a) where
    mempty = MonoidalMap Map.empty
    mappend (MonoidalMap a) (MonoidalMap b) =
      MonoidalMap (Map.unionWith mappend a b)
    mconcat =
      coerce . Map.unionsWith mappend .
      (coerce :: [MonoidalMap k a] -> [Map k a])

  newtype MonoidalIntMap a = MonoidalIntMap { getMonoidalIntMap :: IntMap a }
    deriving (Functor)

  instance Monoid a => Monoid (MonoidalIntMap a) where
    mempty = MonoidalIntMap IM.empty
    mappend (MonoidalIntMap a) (MonoidalIntMap b) =
      MonoidalIntMap (IM.unionWith mappend a b)
    mconcat =
      coerce . IM.unionsWith mappend .
      (coerce :: [MonoidalIntMap a] -> [IntMap a])