Faster Array#/MutableArray# copies
Johan Tibell
johan.tibell at gmail.com
Mon Feb 28 18:29:56 CET 2011
On Mon, Feb 28, 2011 at 9:01 AM, Simon Marlow <marlowsd at gmail.com> wrote:
> On 18/02/2011 19:42, Nathan Howell wrote:
>>
>> On Fri, Feb 18, 2011 at 12:54 AM, Roman Leshchinskiy <rl at cse.unsw.edu.au> wrote:
>>
>> Max Bolingbroke wrote:
>> > On 18 February 2011 01:18, Johan Tibell <johan.tibell at gmail.com> wrote:
>> > It seems like a sufficient solution for your needs would be for us to
>> > use the LTO support in LLVM to inline across module boundaries - in
>> > particular to inline primop implementations into their call sites.
>> > LLVM would then probably deal with unrolling small loops with
>> > statically known bounds.
>>
>> Could we simply use this?
>>
>> http://llvm.org/docs/LangRef.html#int_memcpy
>>
>>
>> Might be easier to implement a PrimOp inlining pass, and to run it
>> before LLVM's built-in MemCpyOptimization pass [0]. This wouldn't
>> generally be as good as LTO but would work without gold.
>>
>> [0] http://llvm.org/doxygen/MemCpyOptimizer_8cpp_source.html
>
> Ideally you'd want the heap check in the primop to be aggregated into the
> calling function's heap check, and the primop should allocate directly from
> the heap instead of calling out to the RTS allocate(). All this is a bit
> much to expect LLVM to do, but we could do it in the Glorious New Code
> Generator...
>
> For small arrays like this maybe we should have a new array type that leaves
> out all the card-marking stuff too (or just use tuples, as Roman suggested).
I might try to use tuples directly. It would be very ugly, though, as I
would need a sum type over 32 different tuple sizes.
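To illustrate what that sum type would look like, here is a hypothetical sketch (the names SmallArray, indexSmall, and sizeSmall are mine, not from this thread), showing only sizes 1 through 3 of the 32:

```haskell
-- Hypothetical sketch of "tuples instead of small arrays": one
-- constructor per supported size. Only sizes 1..3 are shown; the real
-- thing would need all 32 constructors, which is the ugliness above.
data SmallArray a
  = A1 a
  | A2 a a
  | A3 a a a
  -- ... and so on through a 32-field constructor

-- Indexing must pattern-match every constructor at every position,
-- so the number of clauses grows quadratically with the maximum size.
indexSmall :: SmallArray a -> Int -> a
indexSmall (A1 x)     0 = x
indexSmall (A2 x _)   0 = x
indexSmall (A2 _ y)   1 = y
indexSmall (A3 x _ _) 0 = x
indexSmall (A3 _ y _) 1 = y
indexSmall (A3 _ _ z) 2 = z
indexSmall _          _ = error "indexSmall: index out of bounds"

sizeSmall :: SmallArray a -> Int
sizeSmall A1{} = 1
sizeSmall A2{} = 2
sizeSmall A3{} = 3
```

On the other hand, such a value is an ordinary immutable heap object, so "copying" it is free and it carries none of the card-marking overhead of MutableArray#.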
Johan
More information about the Glasgow-haskell-users mailing list