[Haskell-cafe] Fwd: Efficient records with arbitrarily many fields [was: Extensible states]

Sat Jul 4 19:45:48 UTC 2015

Not sure that it will make sense for a cafe level, but here is my
understanding of the current state. For the problem of "anonymous
records", i.e. some datatype that may have different fields and their
types we have following appoaches:

1. vinyl/hlist/etc. i.e. GADT base heterogeneous list. This one have
O(N) random access and may not be a good fit for "anonymous record"
use case

2. records: polymorphic data structures with type-level information
about field names. Here we have a lot of datastructures one per number
of fields multiply on 2 (lazy and strict variant). This one has O(1) random
access.

3. fixed-vector-hetero/"wheel above". array based structures has O(1) access.
I call code above a "wheel" because there were a number of similar
implementations here and there.

I'm not the author or contributor of any libraries above, so I may miss
something, but here is my view: (let's not take a hlist into the scope, because
it should be used in different scenarios).

record provides a good approach + nice syntactic sugar (based on TH or
preprocessor or possibly compiler). The downside here is that we have O(N)
internal structures and exponential number of instances. You could take a look
at records haddock page, use 10% zoom for the best feeling :).

array-based approach have only one structure and 1 instance for any number
of fields, and this is a benefit. But there is a cost as records could be used
even without syntactic sugar (e.g. pattern matching), and with fixed vector it
will be a bit tricker (if possible), for example my approach above as it stands
will not be usable without syntactic sugar similar to the records one. There
is one more problem, library heavily uses `unsafeCoerce` so it should be a
part of the trusted base, of cause this is totally safe as typechecker checks
that, nevertheless it may be a concern for the users.

my wheel could reuse fixed-vector-hetero, as this is basically tagged structure
over fixed vector, I have not reused that because of 2 reasons:
1. I have not found a nice way to update fields in immutable vector (maybe I
just missed that part of API)
2. I don't see a nice way to give user a choice to make fields strict, and in
a wheel it's possible to introduce another typelevel variable that will be used
as a strictness marker, and lens that we build will appreciate that. Btw, here
we may have a solution that is better that "all-or-nothing" one in the records
package.
this solution may well cope with solution for "overloaded records"
that is worked
on.
3. If fixed-vector will not be reused it's possible to convert
structure in O(1).

Anyway records have all required tooling around it, and seems like a best
solution in this (anonymous records with O(1) field access) area at least for
now.

--
Alexander

On 4 July 2015 at 17:19, Ben Franksen <ben.franksen at online.de> wrote:
> Thanks, I didn't know of vector-fixed-hetero. Quite interesting.
>
> Anyway, I am not so much interested in getting reasonable efficiency from
> any of the existing libraries right now, as I am in reaching consensus about
> what kind of compiler support would be needed to make any of them practical.
>
> Cheers
> Ben
>
> Alexander V Vershilov wrote:
>> (forgot to reply to cafe list
>> Hi.
>>
>> You can take a look at vector-fixed-hetero [1],
>> that can act as anonymous structure with arbitrary number of fields,
>> convertion to and from
>>
>> datatypes with the same structure and many more features. It's missing
>> field names though
>> and syntactic sugar that 'records' package have. If you like to
>> reinvent a wheel, then
>>
>> you can use following appoach:
>>
>> introduce some typelevel info
>>
>> data a :? (b :: Symbol)
>>
>> data FieldName (t :: Symbol) = FieldName
>>
>> introduce generic structure for records with any number of fiels
>>
>> newtype ARec (m :: [*]) = ARec { unRec :: Array Any }
>> type instance TProxy (ARec m) = m
>> type instance VProxy (ARec m) = ARec
>>
>> rerec :: ARec a -> ARec b
>> rerec = ARec . unRec
>>
>> indexRec :: (KnownNat (FieldIndex name m), FieldType name m ~ a)
>>          => proxy name -> ARec m -> a
>> indexRec name v = unsafeCoerce $
>>   indexArray (unRec v) (fromIntegral (natVal (indexProxy name v)))
>>
>> updateRec :: (KnownNat (Length m), KnownNat (FieldIndex name m),
>> FieldType name m ~ a)
>>           => proxy name -> a -> ARec m -> ARec m
>> updateRec name b v = runST $ do
>>     out <- newArray len undefined
>>     copyArray out 0 (unRec v) 0 len
>>     writeArray out idx (unsafeCoerce b)
>>     ARec <$> unsafeFreezeArray out
>>   where
>>     idx = fromIntegral (natVal (indexProxy name v))
>>     len = fromIntegral (natVal (lengthProxy v))
>>
>> you'll need some typelevel magic for that:
>> type family FieldType (n :: Symbol) m where
>>   FieldType n ((a :? n) ': z) = a
>>   FieldType n (b ': z)        = FieldType n z
>>
>> type family FieldIndex (n::Symbol) m :: Nat where
>>   FieldIndex n ((a :? n) ': z) = 0
>>   FieldIndex n (   b     ': z) = 1 + FieldIndex n z
>>
>> indexProxy :: (KnownNat c, FieldIndex n m ~ c) => proxy1 n -> proxy2 m
>> -> Proxy c
>> indexProxy _ _ = Proxy
>>
>> type family Length m where
>>   Length '[] = 0
>>   Length (x ': xs) = 1 + Length xs
>>
>> lengthProxy :: (KnownNat c, Length n ~ c) => proxy n -> Proxy c
>> lengthProxy _ = Proxy
>>
>> then you can implement lenses:
>>
>> instance (KnownNat (FieldIndex n m), KnownNat (Length m), FieldType n m ~
>> a)
>>          => HasField (n :: Symbol) (ARec m)  a where
>>   getField = indexRec
>>   updateField p = flip (updateRec p)
>>
>>
>> fieldLens' :: (HasField name z a, FieldType name (TProxy z) ~ a)
>>            => FieldName name -> Lens z z a a
>> fieldLens' name = \f m -> fmap (updateField name m)
>>                                (f $ getField name m)
>>
>>
>> type family UpdateType (n :: Symbol) z a b where
>>   UpdateType n ((a :? n) ': z ) a b = (b :? n) ': z
>>   UpdateType n (   z     ': zs) a b =    z     ': UpdateType n zs a b
>>
>> fieldLens :: ( ARec m ~ z, ARec m' ~ z', m' ~ UpdateType name m a b
>>              , FieldType name m  ~ a, FieldType name m' ~ b
>>              , KnownNat (FieldIndex name m), KnownNat (FieldIndex name m')
>>              , KnownNat (Length m), KnownNat (Length m')
>>              ) => FieldName name -> Lens z z' a b
>> fieldLens name = \f m -> fmap (updateField name (rerec m))
>>                               (f $ getField name m)
>>
>> this approach is more or less the same as records package with only one
>> datastructure and almost the same syntactic sugar can be applied, the
>>
>> only missing thing is that pattern matching will be more difficult that
>> with
>>
>> records.
>>
>> At this point it's not possible to write strict fields, but it can be
>> easily extended.
>>
>> If someone is interested in this sort of wheel, I can prepare a package
>> and some
>>
>> docs about and with coercion with other solutions like
>> fixed-vector-hetero and records.
>>
>>
>>
>> [1] https://hackage.haskell.org/package/fixed-vector-hetero
>>
>>
>> On Sat, Jul 4, 2015, 16:08 Ben Franksen <ben.franksen at online.de> wrote:
>>>
>>> Marcin Mrotek wrote:
>>> > Okay, perhaps I'm too newbie to understand the big picture, but it
>>> > seems to me you can get either:
>>> >
>>> > a) O(1) access to any, arbitrarily selected (at runtime) field
>>> > b) O(1) append
>>> >
>>> > I guess option a) is better performance-wise, as appending is usually
>>> > done less often than selecting (an O(1) slice is already possible with
>>> > independently typed regular Haskell records) but
>>> > dependently-typed-list-based implementation, or at the very least
>>> > Vinyl (I haven't ever used HList) has the advantage of being dead
>>> > simple in both implementation and usage. I mean, with Vinyl, you can
>>> > write manual recursion over Rec's like:
>>> >
>>> > foo :: Rec ... -> Rec ...
>>> > foo RNil = ...
>>> > foo (r :& rs) = ...
>>> >
>>> > whenever GHC's typechecker gives up and goes on a strike; and I dare
>>> > to say, with commonly used record sizes (have you ever used a record
>>> > with more than, let's say, 10 fields?) the speed tradeoff is not
>>> > noticeable.
>>>
>>> While more than 10 fields in a record is uncommon for typical library
>>> APIs and simple programs, real world projects can grow much larger
>>> records. One example is configuration data for complex programs (like
>>> Darcs or even GHC) with many options. It would be so nice if we could use
>>> record types for the configuration! Another application could in control
>>> system toolkits like EPICS [1], which currently has (actually: generates)
>>> C records with potentially hundreds of fields.
>>>
>>> If lookup is / remains linear we can never efficiently support these
>>> kinds of applications and that would be very sad.
>>>
>>> I think the most reasonable default is O(1) for lookup and O(n) for
>>> extension, like in Nikita Volkov's record package. It is quite
>>> unfortunate that this package limits the number of fields! If GHC would
>>> offer generic support for tuples of arbitrary size (with the same
>>> efficiency as today) this limitation could be avoided and all would be
>>> well.
>>>
>>> Cheers
>>> Ben
>>>
>>> [1] http://www.aps.anl.gov/epics/
>>> --
>>> "Make it so they have to reboot after every typo." ― Scott Adams
>>>
>>>
>>> _______________________________________________
>>> Haskell-Cafe mailing list
>>> Haskell-Cafe at haskell.org
>>> http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
>>
>>
> --
> "Make it so they have to reboot after every typo." ― Scott Adams
>
>
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe

-- 
Alexander