Broken Data.Data instances

Alan & Kim Zimmerman alan.zimm at gmail.com
Tue Aug 12 20:51:47 UTC 2014


Status update

I have worked through a proof of concept update to the GHC AST whereby the
type is provided as a parameter to each data type. This was basically a
mechanical process of changing type signatures, and required very little
actual code changes, being only to initialise the placeholder types.

The enabling types are


    type PostTcType = Type        -- Used for slots in the abstract syntax
                    -- where we want to keep slot for a type
                    -- to be added by the type checker...but
                    -- [before typechecking it's just bogus]
    type PreTcType = ()             -- used before typechecking


    class PlaceHolderType a where
      placeHolderType :: a

    instance PlaceHolderType PostTcType where
      placeHolderType  = panic "Evaluated the place holder for a PostTcType"

    instance PlaceHolderType PreTcType where
      placeHolderType = ()

These are used to replace all instances of PostTcType in the hsSyn types.

The change was applied against HEAD as of last friday, and can be found here

https://github.com/alanz/ghc/tree/wip/landmine-param
https://github.com/alanz/haddock/tree/wip/landmine-param

They pass 'sh validate' with GHC 7.6.3, and compile against GHC 7.8.3. I
have not tried to validate that yet, have no reason to expect failure.


Can I please get some feedback as to whether this is a worthwhile change?

It is the first step to getting a generic traversal safe AST

Regards
  Alan


On Mon, Jul 28, 2014 at 5:45 PM, Alan & Kim Zimmerman <alan.zimm at gmail.com>
wrote:

> FYI I edited the paste at http://lpaste.net/108262 to show the problem
>
>
> On Mon, Jul 28, 2014 at 5:41 PM, Alan & Kim Zimmerman <alan.zimm at gmail.com
> > wrote:
>
>> I already tried that, the syntax does not seem to allow it.
>>
>> I suspect some higher form of sorcery will be required, as alluded to
>> here
>> http://stackoverflow.com/questions/14133121/can-i-constrain-a-type-family
>>
>> Alan
>>
>>
>> On Mon, Jul 28, 2014 at 4:55 PM, <p.k.f.holzenspies at utwente.nl> wrote:
>>
>>>  Dear Alan,
>>>
>>>
>>>
>>> I would think you would want to constrain the result, i.e.
>>>
>>>
>>>
>>> type family (Data (PostTcType a)) => PostTcType a where …
>>>
>>>
>>>
>>> The Data-instance of ‘a’ doesn’t give you much if you have a ‘PostTcType
>>> a’.
>>>
>>>
>>>
>>> Your point about SYB-recognition of WrongPhase is, of course, a good one
>>> ;)
>>>
>>>
>>>
>>> Regards,
>>>
>>> Philip
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> *From:* Alan & Kim Zimmerman [mailto:alan.zimm at gmail.com]
>>> *Sent:* maandag 28 juli 2014 14:10
>>> *To:* Holzenspies, P.K.F. (EWI)
>>> *Cc:* Simon Peyton Jones; Edward Kmett; ghc-devs at haskell.org
>>>
>>> *Subject:* Re: Broken Data.Data instances
>>>
>>>
>>>
>>> Philip
>>>
>>> I think the main reason for the WrongPhase thing is to have something
>>> that explicitly has a Data and Typeable instance, to allow generic (SYB)
>>> traversal. If we can get by without this so much the better.
>>>
>>> On a related note, is there any way to constrain the 'a' in
>>>
>>> type family PostTcType a where
>>>   PostTcType Id    = TcType
>>>   PostTcType other = WrongPhaseTyp
>>>
>>>   to have an instance of Data?
>>>
>>> I am experimenting with traversals over my earlier paste, and got stuck
>>> here (which is the reason the Show instances were commentet out in the
>>> original).
>>>
>>> Alan
>>>
>>>
>>>
>>>
>>>
>>> On Mon, Jul 28, 2014 at 12:30 PM, <p.k.f.holzenspies at utwente.nl> wrote:
>>>
>>> Sorry about that… I’m having it out with my terminal server and the
>>> server seems to be winning. Here’s another go:
>>>
>>>
>>>
>>> I always read the () as “there’s nothing meaningful to stick in here,
>>> but I have to stick in something” so I don’t necessarily want the
>>> WrongPhase-thing. There is very old commentary stating it would be lovely
>>> if someone could expose the PostTcType as a parameter of the AST-types, but
>>> that there are so many types and constructors, that it’s a boring chore to
>>> do. Actually, I was hoping haRe would come up to speed to be able to do
>>> this. That being said, I think Simon’s idea to turn PostTcType into a
>>> type-family is a better way altogether; it also documents intent, i.e. ()
>>> may not say so much, but PostTcType RdrName says quite a lot.
>>>
>>>
>>>
>>> Simon commented that a lot of the internal structures aren’t trees, but
>>> cyclic graphs, e.g. the TyCon for Maybe references the DataCons for Just
>>> and Nothing, which again refer to the TyCon for Maybe. I was wondering
>>> whether it would be possible to make stateful lenses for this. Of course,
>>> for specific cases, we could do this, but I wonder if it is also possible
>>> to have lenses remember the things they visited and not visit them twice.
>>> Any ideas on this, Edward?
>>>
>>>
>>>
>>> Regards,
>>>
>>> Philip
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> *From:* Alan & Kim Zimmerman [mailto:alan.zimm at gmail.com]
>>>
>>> *Sent:* maandag 28 juli 2014 11:14
>>>
>>> *To:* Simon Peyton Jones
>>> *Cc:* Edward Kmett; Holzenspies, P.K.F. (EWI); ghc-devs
>>>
>>>
>>> *Subject:* Re: Broken Data.Data instances
>>>
>>>
>>>
>>> I have made a conceptual example of this here http://lpaste.net/108262
>>>
>>> Alan
>>>
>>>
>>>
>>> On Mon, Jul 28, 2014 at 9:50 AM, Alan & Kim Zimmerman <
>>> alan.zimm at gmail.com> wrote:
>>>
>>> What about creating a specific type with a single constructor for the
>>> "not relevant to this phase" type to be used instead of () above? That
>>> would also clearly document what was going on.
>>>
>>> Alan
>>>
>>>
>>>
>>> On Mon, Jul 28, 2014 at 9:14 AM, Simon Peyton Jones <
>>> simonpj at microsoft.com> wrote:
>>>
>>> I've had to mangle a bunch of hand-written Data instances and push out
>>> patches to a dozen packages that used to be built this way before I
>>> convinced the authors to switch to safer versions of Data. Using virtual
>>> smart constructors like we do now in containers and Text where needed can
>>> be used to preserve internal invariants, etc.
>>>
>>>
>>>
>>> If the “hand grenades” are the PostTcTypes, etc, then I can explain why
>>> they are there.
>>>
>>>
>>>
>>> There simply is no sensible type you can put before the type checker
>>> runs.  For example one of the constructors  in HsExpr is
>>>
>>>   | HsMultiIf   PostTcType [LGRHS id (LHsExpr id)]
>>>
>>> After type checking we know what type the thing has, but before we have
>>> no clue.
>>>
>>>
>>>
>>> We could get around this by saying
>>>
>>>             type PostTcType = Maybe TcType
>>>
>>> but that would mean that every post-typechecking consumer would need a
>>> redundant pattern-match on a Just that would always succeed.
>>>
>>>
>>>
>>> It’s nothing deeper than that.  Adding Maybes everywhere would be
>>> possible, just clunky.
>>>
>>>
>>>
>>>
>>>
>>> However we now have type functions, and HsExpr is parameterised by an
>>> ‘id’ parameter, which changes from RdrName (after parsing) to Name (after
>>> renaming) to Id (after typechecking).  So we could do this:
>>>
>>>   | HsMultiIf   (PostTcType id) [LGRHS id (LHsExpr id)]
>>>
>>> and define PostTcType as a closed type family thus
>>>
>>>
>>>
>>>      type family PostTcType a where
>>>
>>>           PostTcType Id = TcType
>>>
>>>           PostTcType other = ()
>>>
>>>
>>>
>>> That would be better than filling it with bottoms.  But it might not
>>> help with generic programming, because there’d be a component whose type
>>> wasn’t fixed.  I have no idea how generics and type functions interact.
>>>
>>>
>>>
>>> Simon
>>>
>>>
>>>
>>> *From:* Edward Kmett [mailto:ekmett at gmail.com]
>>> *Sent:* 27 July 2014 18:27
>>> *To:* p.k.f.holzenspies at utwente.nl
>>> *Cc:* alan.zimm at gmail.com; Simon Peyton Jones; ghc-devs
>>>
>>>
>>> *Subject:* Re: Broken Data.Data instances
>>>
>>>
>>>
>>> Philip, Alan,
>>>
>>>
>>>
>>> If you need a hand, I'm happy to pitch in guidance.
>>>
>>>
>>>
>>> I've had to mangle a bunch of hand-written Data instances and push out
>>> patches to a dozen packages that used to be built this way before I
>>> convinced the authors to switch to safer versions of Data. Using virtual
>>> smart constructors like we do now in containers and Text where needed can
>>> be used to preserve internal invariants, etc.
>>>
>>>
>>>
>>> This works far better for users of the API than just randomly throwing
>>> them a live hand grenade. As I recall, these little grenades in generic
>>> programming over the GHC API have been a constant source of pain for
>>> libraries like haddock.
>>>
>>>
>>>
>>> Simon,
>>>
>>>
>>>
>>> It seems to me that regarding circular data structures, nothing prevents
>>> you from walking a circular data structure with Data.Data. You can generate
>>> a new one productively that looks just like the old with the contents
>>> swapped out, it is indistinguishable to an observer if the fixed point is
>>> lost, and a clever observer can use observable sharing to get it back,
>>> supposing that they are allowed to try.
>>>
>>>
>>>
>>> Alternately, we could use the 'virtual constructor' trick there to break
>>> the cycle and reintroduce it, but I'm less enthusiastic about that idea,
>>> even if it is simpler in many ways.
>>>
>>>
>>>
>>> -Edward
>>>
>>>
>>>
>>> On Sun, Jul 27, 2014 at 10:17 AM, <p.k.f.holzenspies at utwente.nl> wrote:
>>>
>>>  Alan,
>>>
>>> In that case, let's have a short feedback-loop between the two of us. It
>>> seems many of these files (Name.lhs, for example) are really stable through
>>> the repo-history. It would be nice to have one bigger refactoring all in
>>> one go (some of the code could use a polish, a lot of code seems removable).
>>>
>>> Regards,
>>> Philip
>>>   ------------------------------
>>>
>>> *Van:* Alan & Kim Zimmerman [alan.zimm at gmail.com]
>>> *Verzonden:* vrijdag 25 juli 2014 13:44
>>> *Aan:* Simon Peyton Jones
>>> *CC:* Holzenspies, P.K.F. (EWI); ghc-devs at haskell.org
>>> *Onderwerp:* Re: Broken Data.Data instances
>>>
>>> By the way, I would be happy to attempt this task, if the concept is
>>> viable.
>>>
>>>
>>>
>>> On Thu, Jul 24, 2014 at 11:23 PM, Alan & Kim Zimmerman <
>>> alan.zimm at gmail.com> wrote:
>>>
>>>    While we are talking about fixing traversals, how about getting rid
>>> of the phase specific panic initialisers for placeHolderType,
>>> placeHolderKind and friends?
>>>
>>> In order to safely traverse with SYB, the following needs to be inserted
>>> into all the SYB schemes (see
>>>
>>> https://github.com/alanz/HaRe/blob/master/src/Language/Haskell/Refact/Utils/GhcUtils.hs
>>> )
>>>
>>> -- Check the Typeable items
>>> checkItemStage1 :: (Typeable a) => SYB.Stage -> a -> Bool
>>> checkItemStage1 stage x = (const False `SYB.extQ` postTcType `SYB.extQ`
>>> fixity `SYB.extQ` nameSet) x
>>>   where nameSet     = const (stage `elem` [SYB.Parser,SYB.TypeChecker])
>>> :: GHC.NameSet       -> Bool
>>>         postTcType  = const (stage < SYB.TypeChecker                  )
>>> :: GHC.PostTcType    -> Bool
>>>         fixity      = const (stage < SYB.Renamer                      )
>>> :: GHC.Fixity        -> Bool
>>>
>>> And in addition HsCmdTop and ParStmtBlock are initialised with explicit
>>> 'undefined values.
>>>
>>> Perhaps use an initialiser that can have its panic turned off when
>>> called via the GHC API?
>>>
>>> Regards
>>>
>>>   Alan
>>>
>>>
>>>
>>>
>>>
>>> On Thu, Jul 24, 2014 at 11:06 PM, Simon Peyton Jones <
>>> simonpj at microsoft.com> wrote:
>>>
>>>    So... does anyone object to me changing these "broken" instances
>>> with the ones given by DeriveDataTypeable?
>>>
>>> That’s fine with me provided (a) the default behaviour is not immediate
>>> divergence (which it might well be), and (b) the pitfalls are documented.
>>>
>>>
>>>
>>> Simon
>>>
>>>
>>>
>>> *From:* "Philip K.F. Hölzenspies" [mailto:p.k.f.holzenspies at utwente.nl]
>>> *Sent:* 24 July 2014 18:42
>>> *To:* Simon Peyton Jones
>>> *Cc:* ghc-devs at haskell.org
>>> *Subject:* Re: Broken Data.Data instances
>>>
>>>
>>>
>>> Dear Simon, et al,
>>>
>>> These are very good points to make for people writing such traversals
>>> and queries. I would be more than happy to write a page on the pitfalls
>>> etc. on the wiki, but in my experience so far, exploring the innards of GHC
>>> is tremendously helped by trying small things out and showing (bits of) the
>>> intermediate structures. For me, personally, this has always been hindered
>>> by the absence of good instances of Data and/or Show (not having to bring
>>> DynFlags and not just visualising with the pretty printer are very helpful).
>>>
>>> So... does anyone object to me changing these "broken" instances with
>>> the ones given by DeriveDataTypeable?
>>>
>>> Also, many of these internal data structures could be provided with
>>> useful lenses to improve such traversals further. Anyone ever go at that?
>>> Would be people be interested?
>>>
>>> Regards,
>>> Philip
>>>
>>>     *Simon Peyton Jones* <simonpj at microsoft.com>
>>>
>>> 24 Jul 2014 18:22
>>>
>>> GHC’s data structures are often mutually recursive. e.g.
>>>
>>> ·        The TyCon for Maybe contains the DataCon for Just
>>>
>>> ·        The DataCon For just contains Just’s type
>>>
>>> ·        Just’s type contains the TyCon for Maybe
>>>
>>>
>>>
>>> So any attempt to recursively walk over all these structures, as you
>>> would a tree, will fail.
>>>
>>>
>>>
>>> Also there’s a lot of sharing.  For example, every occurrence of ‘map’
>>> is a Var, and inside that Var is map’s type, its strictness, its rewrite
>>> RULE, etc etc.  In walking over a term you may not want to walk over all
>>> that stuff at every occurrence of map.
>>>
>>>
>>>
>>> Maybe that’s it; I’m not certain since I did not write the Data
>>> instances for any of GHC’s types
>>>
>>>
>>>
>>> Simon
>>>
>>>
>>>
>>> *From:* ghc-devs [mailto:ghc-devs-bounces at haskell.org
>>> <ghc-devs-bounces at haskell.org>] *On Behalf Of *
>>> p.k.f.holzenspies at utwente.nl
>>> *Sent:* 24 July 2014 16:42
>>> *To:* ghc-devs at haskell.org
>>> *Subject:* Broken Data.Data instances
>>>
>>>
>>>
>>> Dear GHC-ers,
>>>
>>>
>>>
>>> Is there a reason for explicitly broken Data.Data instances? Case in
>>> point:
>>>
>>>
>>>
>>> > instance Data Var where
>>>
>>> >   -- don't traverse?
>>>
>>> >   toConstr _   = abstractConstr "Var"
>>>
>>> >   gunfold _ _  = error "gunfold"
>>>
>>> >   dataTypeOf _ = mkNoRepType "Var"
>>>
>>>
>>>
>>> I understand (vaguely) arguments about abstract data types, but this
>>> also excludes convenient queries that can, e.g. extract all types from a
>>> CoreExpr. I had hoped to do stuff like this:
>>>
>>>
>>>
>>> > collect :: (Typeable b, Data a, MonadPlus m) => a -> m b
>>>
>>> > collect = everything mplus $ mkQ mzero return
>>>
>>> >
>>>
>>> > allTypes :: CoreExpr -> [Type]
>>>
>>> > allTypes = collect
>>>
>>>
>>>
>>> Especially when still exploring (parts of) the GHC API, being able to
>>> extract things in this fashion is very helpful. SYB’s “everything” being
>>> broken by these instances, not so much.
>>>
>>>
>>>
>>> Would a patch “fixing” these instances be acceptable?
>>>
>>>
>>>
>>> Regards,
>>>
>>> Philip
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> ghc-devs mailing list
>>> ghc-devs at haskell.org
>>> http://www.haskell.org/mailman/listinfo/ghc-devs
>>>
>>>
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> ghc-devs mailing list
>>> ghc-devs at haskell.org
>>> http://www.haskell.org/mailman/listinfo/ghc-devs
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/ghc-devs/attachments/20140812/df9b24f2/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.jpg
Type: image/jpeg
Size: 1247 bytes
Desc: not available
URL: <http://www.haskell.org/pipermail/ghc-devs/attachments/20140812/df9b24f2/attachment-0001.jpg>


More information about the ghc-devs mailing list