Mixed boxed/unboxed arrays?

Andreas Klebinger klebinger.andreas at gmx.at
Tue Aug 2 19:25:33 UTC 2022


Indeed I misunderstood. As you already suspected this wouldn't work for
Int# (or other unboxed types) sadly as the GC would assume these to be
pointers which no doubt would lead to segfaults or worse.

Rereading your initial mail I can say the runtime currently doesn't
support such a heap object.
If I understand you correctly what you would like is basically a
something like:

Con n P I# P I# P I#  ...
        \/ \/\/
      Pair1 Pair2 Pair3 ...

Where n gives the number of pairs.

I can see how it might be feasible to add a heap object like this to GHC
but I'm unsure if it would be worth the complexity as it's layout
diverges quite a bit from what GHC usually expects.

The other option would be to expose to users a way to have an object
that consist of a given number of words and a bitmap which indicates to
the GHC which fields are pointers. This is more or less
the representation that's already used to deal with stack frames iirc so
that might not be as far fetched as it seems at first.
It might even be possible to implement some sort of prototype for this
using hand written Cmm.

But there are not any plans to implement anything like this as far as I
know.

Am 02/08/2022 um 20:51 schrieb Jaro Reinders:
>
> It seems you have misunderstood me. I want to store *unboxed* Int#s
> inside the array, not just some unlifted types. Surely in the case of
> unboxed integers the unsafeCoerce# function can make the garbage
> collector crash as they might be interpreted as arbitrary pointers.
>
> Cheers,
>
> Jaro
>
> On 02/08/2022 20:24, Andreas Klebinger wrote:
>>
>> I think it's possible to do this *today* using unsafeCoerce#.
>>
>> I was able to come up with this basic example below. In practice one
>> would at the very least want to abstract away the gnarly stuff inside a
>> library. But since it sounds like you want to be the one to write a
>> library that shouldn't be a problem.
>>
>> {-# LANGUAGE MagicHash #-}
>> {-# LANGUAGE UnboxedTuples #-}
>> {-# LANGUAGE UnliftedDatatypes #-}
>> moduleMainwhere
>> importGHC.Exts
>> importGHC.IO
>> importUnsafe.Coerce
>> importData.Kind
>> dataSA= SA (SmallMutableArray# RealWorldAny)
>> mkArray:: Int-> a-> IO(SA)
>> mkArray (I# n) initial = IO $ \s ->
>> caseunsafeCoerce# (newSmallArray# n initial s) of
>>         (# s', arr #) -> (# s', SA arr #)
>> readLifted:: SA-> Int-> IOa
>> readLifted (SA arr) (I# i) = IO (\s ->
>>     unsafeCoerce# (readSmallArray# arr i s)
>>     )
>> dataUWrap(a:: UnliftedType) = UWrap a
>> -- UWrap is just here because we can't return unlifted types in IO
>> -- If you don't need your result in IO you can eliminate this
>> indirection.
>> readUnlifted:: foralla. SA-> Int-> IO(UWrapa)
>> readUnlifted (SA arr) (I# i) = IO (\s ->
>> caseunsafeCoerce# (readSmallArray# arr i s) of
>>         (# s', a :: a#) -> (# s', UWrap a #)
>>     )
>> writeLifted:: a-> Int-> SA-> IO()
>> writeLifted x (I# i) (SA arr) = IO $ \s ->
>> casewriteSmallArray# (unsafeCoerce# arr) i x s of
>>         s -> (# s, ()#)
>> writeUnlifted:: (a:: UnliftedType) -> Int-> SA-> IO()
>> writeUnlifted x (I# i) (SA arr) = IO $ \s ->
>> casewriteSmallArray# arr i (unsafeCoerce# x) s of
>>         s -> (# s, ()#)
>> typeUB:: UnliftedType
>> dataUB= UT | UF
>> showU:: UWrapUB-> String
>> showU (UWrap UT) = "UT"
>> showU (UWrap UF) = "UF"
>> main:: IO()
>> main = do
>>     arr <- mkArray 4()
>>     writeLifted True 0arr
>>     writeLifted False 1arr
>>     writeUnlifted UT 2arr
>>     writeUnlifted UT 3arr
>>     (readLifted arr 0:: IOBool) >>= print
>>     (readLifted arr 1:: IOBool) >>= print
>>     (readUnlifted arr 2:: IO(UWrapUB)) >>= (putStrLn . showU)
>>     (readUnlifted arr 3:: IO(UWrapUB)) >>= (putStrLn . showU)
>>     return ()
>>
>> Cheers
>>
>> Andreas
>>
>> Am 02/08/2022 um 17:32 schrieb J. Reinders:
>>>> Could you use `StablePtr` for the keys?
>>> That might be an option, but I have no idea how performant stable pointers are and manual management is obviously not ideal.
>>>
>>>> How does the cost of computing object hashes and comparing colliding
>>>> objects compare with the potential cache miss cost of using boxed
>>>> integers or a separate array?  Would such an "optimisation" be worth
>>>> the effort?
>>> Literature on hash tables suggests that cache misses were a very important factor in running time (in 2001):https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.25.4189
>>>
>>> I don’t know whether it has become less or more important now, but I have been told there haven’t been that many advances in memory latency.
>>> _______________________________________________
>>> ghc-devs mailing list
>>> ghc-devs at haskell.org
>>> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-devs/attachments/20220802/b21ee43d/attachment.html>


More information about the ghc-devs mailing list