Faster Array#/MutableArray# copies
Johan Tibell
johan.tibell at gmail.com
Mon Feb 28 18:29:56 CET 2011
On Mon, Feb 28, 2011 at 9:01 AM, Simon Marlow <marlowsd at gmail.com> wrote:
> On 18/02/2011 19:42, Nathan Howell wrote:
>>
>> On Fri, Feb 18, 2011 at 12:54 AM, Roman Leshchinskiy <rl at cse.unsw.edu.au> wrote:
>>
>> Max Bolingbroke wrote:
>> > On 18 February 2011 01:18, Johan Tibell <johan.tibell at gmail.com> wrote:
>> > It seems like a sufficient solution for your needs would be for us to
>> > use the LTO support in LLVM to inline across module boundaries - in
>> > particular to inline primop implementations into their call sites.
>> > LLVM would then probably deal with unrolling small loops with
>> > statically known bounds.
>>
>> Could we simply use this?
>>
>> http://llvm.org/docs/LangRef.html#int_memcpy
>>
>>
>> Might be easier to implement a PrimOp inlining pass, and to run it
>> before LLVM's built-in MemCpyOptimization pass [0]. This wouldn't
>> generally be as good as LTO but would work without gold.
>>
>> [0] http://llvm.org/doxygen/MemCpyOptimizer_8cpp_source.html
>
> Ideally you'd want the heap check in the primop to be aggregated into the
> calling function's heap check, and the primop should allocate directly from
> the heap instead of calling out to the RTS allocate(). All this is a bit
> much to expect LLVM to do, but we could do it in the Glorious New Code
> Generator...
>
> For small arrays like this maybe we should have a new array type that leaves
> out all the card-marking stuff too (or just use tuples, as Roman suggested).
I might try to use tuples directly. It would be very ugly, though, as I
would need a sum type over 32 different tuple sizes.
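To illustrate what that sum type would look like, here is a hypothetical sketch (the names SmallArray, indexSmall, and sizeSmall are mine, not from this thread), showing only sizes 1 through 3 of the 32:

```haskell
-- Hypothetical sketch of "tuples instead of small arrays": one
-- constructor per supported size. Only sizes 1..3 are shown; the real
-- thing would need all 32 constructors, which is the ugliness above.
data SmallArray a
  = A1 a
  | A2 a a
  | A3 a a a
  -- ... and so on through a 32-field constructor

-- Indexing must pattern-match every constructor at every position,
-- so the number of clauses grows quadratically with the maximum size.
indexSmall :: SmallArray a -> Int -> a
indexSmall (A1 x)     0 = x
indexSmall (A2 x _)   0 = x
indexSmall (A2 _ y)   1 = y
indexSmall (A3 x _ _) 0 = x
indexSmall (A3 _ y _) 1 = y
indexSmall (A3 _ _ z) 2 = z
indexSmall _          _ = error "indexSmall: index out of bounds"

sizeSmall :: SmallArray a -> Int
sizeSmall A1{} = 1
sizeSmall A2{} = 2
sizeSmall A3{} = 3
```

On the other hand, such a value is an ordinary immutable heap object, so "copying" it is free and it carries none of the card-marking overhead of MutableArray#.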
Johan
More information about the Glasgow-haskell-users mailing list