[GHC] #10652: Better cache performance in Array#
GHC
ghc-devs at haskell.org
Fri Jul 17 17:01:47 UTC 2015
#10652: Better cache performance in Array#
-------------------------------------+-------------------------------------
        Reporter:  MikeIzbicki       |             Owner:
            Type:  feature request   |            Status:  new
        Priority:  normal            |         Milestone:
         Version:  7.10.1            |  Operating System:  Unknown/Multiple
       Component:  Compiler          |   Type of failure:  None/Unknown
        Keywords:                    |        Blocked By:
    Architecture:  Unknown/Multiple  |   Related Tickets:
       Test Case:                    |          Blocking:
 Differential Revisions:             |
-------------------------------------+-------------------------------------
A common use case for arrays is to loop over them, performing an operation
on each element. For `ByteArray#`s, this is guaranteed to have good cache
performance: when element `i` is accessed, the CPU's prefetcher will ensure
that element `i+1` is already sitting in cache, so the next loop iteration
will not suffer a cache miss. Currently, the `Array#` and `SmallArray#`
types do not significantly benefit from the prefetcher. This feature
request asks for a few new primops that would improve the situation.
My understanding is that the "raw" element stored in an `Array#` is
actually a pointer to the region of memory holding the "logical" element
of the array. When looping, the subsequent pointers are guaranteed to get
prefetched, but the subsequent logical elements are not. In particular,
we'll only get the benefits of prefetching on the logical elements if
we're lucky enough that the memory manager happened to allocate them next
to each other on the heap. I don't know enough about GHC internals to
check the source code, but my experiments demonstrate that this is not
currently happening in my use case, resulting in rather significant
performance degradation. (The experiments are rather complicated, so I'm
not attaching them.)
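To make the two access patterns concrete, here is a minimal sketch using
the `array` package that ships with GHC (this library code is my own
illustration, not part of the ticket). A `UArray` stores its `Int`s
unboxed and contiguously, like a `ByteArray#`, so a linear scan presents
the prefetcher with a regular stride; a boxed `Array` stores one pointer
per slot, like `Array#`, so each read chases a pointer to a heap closure
that may live anywhere:

```haskell
{-# LANGUAGE BangPatterns #-}

import qualified Data.Array as A
import qualified Data.Array.Unboxed as U

-- Boxed traversal: each (A.!) yields a pointer to an Int closure that
-- must then be followed, so cache behavior depends on where the
-- closures happen to have been allocated.
sumBoxed :: A.Array Int Int -> Int
sumBoxed arr = go 0 lo
  where
    (lo, hi) = A.bounds arr
    go !acc i
      | i > hi    = acc
      | otherwise = go (acc + arr A.! i) (i + 1)

-- Unboxed traversal: the Ints sit contiguously in one block of memory,
-- so a linear scan is prefetch-friendly.
sumUnboxed :: U.UArray Int Int -> Int
sumUnboxed arr = go 0 lo
  where
    (lo, hi) = U.bounds arr
    go !acc i
      | i > hi    = acc
      | otherwise = go (acc + arr U.! i) (i + 1)

main :: IO ()
main = do
  print (sumBoxed   (A.listArray (0, 9) [1 .. 10]))
  print (sumUnboxed (U.listArray (0, 9) [1 .. 10]))
```

Both functions compute the same sum; only the memory layout behind the
indexing differs.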
I propose adding the primops:
{{{
packArray# :: Array# a -> b -> b
packSmallArray# :: SmallArray# a -> b -> b
packMutableArray# :: MutableArray# s a -> State# s -> State# s
packSmallMutableArray# :: SmallMutableArray# s a -> State# s -> State# s
}}}
These operations would be semantic no-ops, except that they request that
the logical elements of the array be arranged adjacently in memory.
Future loops over the arrays would then benefit from CPU prefetching.
There are a number of ways the memory manager could handle these
requests, and I don't particularly care which is chosen. For example, it
could rearrange the memory immediately, or wait until the next garbage
collection pass and do it then.
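As a usage sketch (hypothetical throughout: `packMutableArray#` is the
proposed primop, not an existing one, and `fillArray`/`hotLoop` are
made-up stand-ins), the intended pattern is to pack once after
construction, then enter a read-heavy loop:

{{{
-- Hypothetical sketch only; this does not compile against any GHC.
build s0 =
  case newArray# 1024# x s0       of { (# s1, marr #) ->
  case fillArray marr s1          of { s2             ->
  -- Semantic no-op, but hints the RTS to lay the elements out
  -- contiguously before the reads below.
  case packMutableArray# marr s2  of { s3             ->
  case unsafeFreezeArray# marr s3 of { (# s4, arr #)  ->
  hotLoop arr s4 }}}}              -- many indexArray# reads in sequence
}}}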
--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/10652>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler