Changes to Data.Typeable

Mon Jul 11 16:25:58 CEST 2011

2011/7/11 Simon Marlow <marlowsd at gmail.com>:
> On 08/07/2011 17:36, Gábor Lehel wrote:
>>
>> 2011/7/7 Simon Marlow<marlowsd at gmail.com>:
>>>
>>> On 07/07/11 17:14, Gábor Lehel wrote:
>>>>
>>>> On Thu, Jul 7, 2011 at 5:44 PM, Simon Marlow <marlowsd at gmail.com> wrote:
>>>>>
>>>>> Hi folks,
>>>>>
>>>>> In response to this ticket:
>>>>>
>>>>>  http://hackage.haskell.org/trac/ghc/ticket/5275
>>>>>
>>>>> I'm making some changes to Data.Typeable, some of which affect the API,
>>>>> so as per the new library guidelines I'm informing the list.
>>>>>
>>>>> The current implementation of Typeable is based on
>>>>>
>>>>>  mkTyCon :: String -> TyCon
>>>>>
>>>>> which internally keeps a table mapping Strings to Ints, so that each
>>>>> TyCon can be given a unique Int for fast comparison.  This means the
>>>>> String has to be unique across all types in the program.  Currently
>>>>> derived instances of Typeable use the qualified original name (e.g.
>>>>> "GHC.Types.Int"), which is not necessarily unique, is non-portable,
>>>>> and exposes implementation details.
>>>>>
>>>>> The String passed to mkTyCon is returned by
>>>>>
>>>>>  tyConString :: TyCon -> String
>>>>>
>>>>> which lets the user get at this non-portable representation (the Show
>>>>> instance also returns this String).
>>>>>
>>>>> So the new proposal is to store three Strings in TyCon.  The internal
>>>>> representation is this:
>>>>>
>>>>> data TyCon = TyCon {
>>>>>   tyConHash    :: {-# UNPACK #-} !Fingerprint,
>>>>>   tyConPackage :: String,
>>>>>   tyConModule  :: String,
>>>>>   tyConName    :: String
>>>>>  }
>>>>>
>>>>> The fields of this type are not exposed externally.  Together the three
>>>>> fields tyConPackage, tyConModule and tyConName uniquely identify a
>>>>> TyCon, and the Fingerprint is a hash of the concatenation of these three
>>>>> Strings (so no more internal cache to map strings to unique Ids).
>>>>> tyConString now returns the value of tyConName only.
>>>>>
>>>>> I've measured the performance impact of this change, and as far as I
>>>>> can tell performance is uniformly better.  This should improve things
>>>>> for SYB in particular.  Also, the code generated for deriving Typeable
>>>>> is less than half the size it was before.
>>>>>
>>>>> === Proposed API changes ===
>>>>>
>>>>> 1. DEPRECATE mkTyCon
>>>>>
>>>>>   mkTyCon is used by some hand-written instances of Typeable.  It
>>>>>   will work as before, but is deprecated in favour of...
>>>>>
>>>>> 2. Add
>>>>>
>>>>>   mkTyCon3 :: String -> String -> String -> TyCon
>>>>>
>>>>>   which takes the package, module, and name of the TyCon respectively.
>>>>>   Most users can just derive Typeable; there's no need to use mkTyCon3.
>>>>>
>>>>> In due course we can rename mkTyCon3 back to mkTyCon.
>>>>>
>>>>> Any comments?
>>>>>
>>>>> Cheers,
>>>>>        Simon
>>>>
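
A minimal sketch of the proposed representation, to make the
"hash instead of a String-to-Int table" idea concrete. This is not the
actual Data.Typeable code: TyConSketch, mkTyConSketch and
tyConStringSketch are hypothetical names, and joining the three strings
with a space before hashing is an assumption (the proposal only says the
fingerprint is a hash of their concatenation). It relies on
fingerprintString from GHC.Fingerprint.

```haskell
import GHC.Fingerprint (Fingerprint, fingerprintString)

-- Stand-in for the proposed TyCon (hypothetical; the real fields are
-- not exported from Data.Typeable).
data TyConSketch = TyConSketch
  { sketchHash    :: !Fingerprint
  , sketchPackage :: String
  , sketchModule  :: String
  , sketchName    :: String
  }

-- Hypothetical analogue of mkTyCon3: hash package, module and name once,
-- at construction time. The space separator is an assumption.
mkTyConSketch :: String -> String -> String -> TyConSketch
mkTyConSketch pkg modl name =
  TyConSketch (fingerprintString (unwords [pkg, modl, name])) pkg modl name

-- Equality and ordering look only at the fingerprint, so no global
-- table from Strings to unique Ints is required.
instance Eq TyConSketch where
  a == b = sketchHash a == sketchHash b

instance Ord TyConSketch where
  compare a b = compare (sketchHash a) (sketchHash b)

-- Analogue of the revised tyConString: the unqualified name only.
tyConStringSketch :: TyConSketch -> String
tyConStringSketch = sketchName
```

With this shape, two TyCons built from the same package, module and name
compare equal, while the same name in a different package does not.
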
>>>> Would this also mean typeRepKey could be taken out of the IO monad?
>>>> That would be nice.
>>>
>>> Ah yes, I forgot to mention the changes to typeRepKey.  So currently we
>>> have
>>>
>>>  typeRepKey :: TypeRep -> IO Int
>>>
>>> This API is difficult to support in the new library: I'd have to
>>> reintroduce the cache, and it wouldn't be very efficient.  I plan to
>>> change it to this:
>>>
>>>  data TypeRepKey -- abstract, instance of Eq, Ord
>>>  typeRepKey :: TypeRep -> IO TypeRepKey
>>>
>>> where TypeRepKey is a newtype of the internal Fingerprint.  Now, we could
>>> take typeRepKey out of IO, but the Ord instance of TypeRepKey is
>>> implementation-defined (it provides some total order, but we don't tell
>>> you what it is).  So arguably we should keep the IO.  What do people think?
>>
>> Would the order be allowed to vary from run to run of the program
>> (which is why it's in IO now)? Could it be specified as
>> implementation-defined but non-varying? If so, I would favor that
>> option along with taking it out of IO. (Plenty of things are
>> implementation-defined, like the size of an Int.)
>
> Yes, it's implementation-defined but non-varying.  I know some people have
> objected to these things being outside the IO monad before, but there is
> already plenty of precedent (System.Info.os, size of Int, isIEEE...).
>
> However, if we take it out of IO then it may limit the possible
> implementations.  Would the previous implementation, in which keys were
> assigned at runtime, still be valid?  It is still implementation-defined and
> non-varying, but only over a single run.

That's the question. It's in IO now because, while the keys don't vary
within a single run, they do vary between runs. Presumably the new
version should be 'pure' if and only if that's no longer true. The
upsides (of not being in IO) are obvious, but unfortunately I don't
know much at all about the potential downsides in terms of limiting
implementations.
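
For concreteness, a rough sketch of what the 'pure' variant could look
like. It assumes a typeRepFingerprint function exported from
Data.Typeable, which as far as I know is only available in later
versions of base and is not part of the API being discussed here;
pureTypeRepKey and keyOf are hypothetical names.

```haskell
import Data.Typeable (TypeRep, Typeable, typeOf, typeRepFingerprint)
import GHC.Fingerprint (Fingerprint)

-- Abstract key: only Eq and Ord are exposed, and the order is
-- implementation-defined (whatever order Fingerprint happens to have).
newtype TypeRepKey = TypeRepKey Fingerprint
  deriving (Eq, Ord)

-- Hypothetical pure counterpart of typeRepKey. Assumes the fingerprint
-- is derived from the type's identity rather than assigned at runtime.
pureTypeRepKey :: TypeRep -> TypeRepKey
pureTypeRepKey = TypeRepKey . typeRepFingerprint

-- Convenience wrapper: the key of a value's type, without touching IO.
keyOf :: Typeable a => a -> TypeRepKey
keyOf = pureTypeRepKey . typeOf
```

Such keys could be used directly as Data.Map keys, but nothing should
depend on which particular total order they end up in.
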

>
>> That said, the use case I had in mind was using Template Haskell to
>> construct a case statement over the literal Int values of the keys as
>> determined at compile time (hopefully compiling down to something like
>> a C switch statement), and I'm not sure if that's going to work if the
>> keys are no longer Ints. (That it wouldn't compile down to a switch
>> statement is one thing, but I'm not sure if the code would literally
>> be possible to write. Maybe it'd need a Lift instance?) Anyway, I
>> don't think it would hurt to take it out of IO if given the
>> opportunity, either way.
>
> The keys are 128-bit hashes, so it might still be possible to do something
> like this, but you would need access to the internal representations.  I'm
> planning to expose these via Data.Typeable.Internal (no guarantees about
> stability of this API, however).

I was going to suggest that a Lift instance could be provided in
Language.Haskell.TH.Syntax, but I see now that there are quite a few
types which could have an instance and don't, so that probably belongs
in a separate proposal. Just having the internals available will
hopefully be 'good enough' for the use case I mentioned (which itself
is not that important, just a nice optimization).
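
A small sketch of what "having the internals available" might buy for
that use case: if the key is a 128-bit fingerprint, it can be taken
apart into two Word64 values, which have literal syntax and so could in
principle be spliced into generated code. This assumes the Fingerprint
constructor exported by GHC.Fingerprint and a typeRepFingerprint
accessor in Data.Typeable (available in later versions of base, as far
as I know); keyWords is a hypothetical helper, and the Template Haskell
side is left out.

```haskell
import Data.Typeable (Typeable, typeOf, typeRepFingerprint)
import Data.Word (Word64)
import GHC.Fingerprint (Fingerprint (..))

-- Split a type's fingerprint into its two 64-bit halves. The concrete
-- values are implementation-defined and should not be hard-coded by hand.
keyWords :: Typeable a => a -> (Word64, Word64)
keyWords x =
  case typeRepFingerprint (typeOf x) of
    Fingerprint hi lo -> (hi, lo)
```

Whether a case over such pairs compiles down to anything like a C switch
is a separate question.
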

>
> Cheers,
>        Simon
>
>

-- 
Work is punishment for failing to procrastinate effectively.