From merijn at inconsistent.nl Thu Apr 17 14:19:21 2014 From: merijn at inconsistent.nl (Merijn Verstraaten) Date: Thu, 17 Apr 2014 16:19:21 +0200 Subject: Proposal - Foreign enum support Message-ID: Cross-post to haskell-prime in case there's any interest for including this into the report's FFI specification. Proposal - Foreign enum support =============================== At the moment the FFI does not have a convenient way with interacting enums (whether proper enums or CPP defines) in C (like languages). Both enums and CPP defined enums are major parts of large C APIs and they are thus crucial to writing foreign bindings. A few examples: SDL_image defines the following enum: typedef enum { IMG_INIT_JPG = 0x00000001, IMG_INIT_PNG = 0x00000002, IMG_INIT_TIF = 0x00000004, IMG_INIT_WEBP = 0x00000008 } IMG_InitFlags; OpenCL specifies the following typedefs + CPP defined enum: typedef uint32_t cl_uint __attribute__((aligned(4))); typedef cl_uint cl_platform_info; /* cl_platform_info */ #define CL_PLATFORM_PROFILE 0x0900 #define CL_PLATFORM_VERSION 0x0901 #define CL_PLATFORM_NAME 0x0902 #define CL_PLATFORM_VENDOR 0x0903 #define CL_PLATFORM_EXTENSIONS 0x0904 OpenCL functions will return the above CPP defines as return values of type cl_platform_info. Current Solutions ----------------- In many cases someone wrapping such a C library would like to expose these enums as a simple sum type as this has several benefits: type safety, the ability to use haskell constructors for pattern matching, exhaustiveness checks. Currently the GHC FFI, as specified by Haskell2010, only marshalls a small set of foreign types and newtypes with exposed constructors of these types. As such there seem two approaches to wrap these enums: 1. Implement an ADT representing the enum and write a manual conversion function between the ADT and the corresponding C type (e.g. CInt -> Foo and Foo -> CInt). 2. Use a tool like c2hs to automatically generate the ADT and conversion function. In both cases the foreign functions are imported using the corresponding C type in their signature (reducing type safety) and the user is forced write trivial wrappers for every imported function to convert the ADT to the relevant C type and back. This is both tedious to write and costly in terms of code produced, in case of c2hs one calls "toEnum . fromIntegral" and "fromIntegral . fromEnum" for every argument/result even though this could trivially be a no-op. Worse, since c2hs uses the Enum class for it's conversion to/from C types it generates Enum instances like: instance Enum Foo where fromEnum Bar = 1 fromEnum Baz = 1337 toEnum 1 = Bar toEnum 1337 = Baz toEnum unmatched = error ("PlatformInfo.toEnum: Cannot match " ++ show unmatched) Since succ/pred and enumFromTo's default implementations assume enums convert to continuous sequence of Int this means the default generated enum instances crash. This problem could be overcome by making c2hs' code generation smarter, but this does not eliminate the tediousness of wrapping all foreign imported functions with marshalling wrappers, NOR does it eliminate the overhead of all this useless marshalling. Proposal -------- Add a new foreign construct for enums, the syntax I propose below is rather ugly and ambiguous and thereforeopen to bikeshedding, but I prefer explaining based on a concrete example. foreign enum CInt as Foo where Bar = 1 Baz Quux = 1337 Xyzzy = _ This would introduce a new type 'Foo' with semantics approximately equivalent too "newtype Foo = Foo CInt" plus the pattern synonyms "pattern Bar = Foo 1; pattern Baz = 2; pattern Quux = 1337; pattern Xyzzy = Foo _". Explicit listing of the value corresponding to a constructor should be optional, missing values should just increment by one from the previous (like C), if the initial value is missing, it should assume to start from 0. Values do not need to be contiguous. Users should be able to use these constructors as normal in pattern match (really, this mostly follows to semantics of the above pattern synonyms). The foreign import/export functionality should invisibly marshall Foo to the underlying foreign type (as is done for newtypes). I'm unsure about the support for a wildcard constructor like Xyzzy. If there is support for a wildcard, it should be optional. On the upside a wildcard means the marshalling is no longer a partial function. The downside is that it makes desugaring the use of enums in patterns harder. It seems clear that f Xyzzy = {- ... -} f Bar = {- ... -} f Baz = {- ... -} f Quux = {- ... -} Should not have the same semantics as: f (Foo _) = {- ... -} f (Foo 1) = {- ... -} f (Foo 2) = {- ... -} f (Foo 1337) = {- ... -} So in the presence of wildcards, the Foo enum can't trivially be desugared into pattern synonyms after checking exhaustiveness. Pros: 1. Foreign imports are slightly more type safe, as one can now write: foreign import ccall "someFoo.h" someFoo :: Foo -> Ptr () -> IO () Preventing users from passing an arbitrary CInt to an argument expecting a specific enum. 2. No need to write marshalling functions to/from ADT to obtain exhaustiveness checks and pattern matching 3. Cheaper as marshalling Foo to CInt is a no-op 4. toEnum/fromEnum can simply map to contiguous sequence of Int as this Int mapping is no longer used for marshalling Cons: 1. Non-standard extension of the FFI 2. Someone has to implement it 3. Wildcards constructors would present difficulties desugaring pattern matches to a simple newtype. 4. ?? What Would Need to be Done? --------------------------- 1. Parser needs to be extended to deal with parsing of enum declarations. 2. Pattern matches of an enum type need to be checked for exhaustiveness and desugared to the underlying type's representation. 3. Extend foreign imports/exports to marshall enums properly. If there's no objections I'm willing to take a stab at implementing this, although I'd probably need some help with GHC's internals (although I could bug #ghc for that). Cheers, Merijn -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 801 bytes Desc: Message signed with OpenPGP using GPGMail URL: From roma at ro-che.info Thu Apr 17 14:36:36 2014 From: roma at ro-che.info (Roman Cheplyaka) Date: Thu, 17 Apr 2014 17:36:36 +0300 Subject: Proposal - Foreign enum support In-Reply-To: References: Message-ID: <20140417143636.GA16407@sniper> I am reluctant about adding a new syntactic feature for such a niche problem. Can't this be achieved with a quaiquoter? Roman -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: Digital signature URL: From ekmett at gmail.com Sat Apr 19 14:13:00 2014 From: ekmett at gmail.com (Edward Kmett) Date: Sat, 19 Apr 2014 10:13:00 -0400 Subject: Proposal - Foreign enum support In-Reply-To: References: Message-ID: -1 from me. Your first example even provides a counter-example. typedef enum { > IMG_INIT_JPG = 0x00000001, > IMG_INIT_PNG = 0x00000002, > IMG_INIT_TIF = 0x00000004, > IMG_INIT_WEBP = 0x00000008 > } IMG_InitFlags; Those are defined as powers of two because they are a bit mask you have to be able to (.|.) together. This is the sort of thing people write .hsc files for, so they can include the appropriate header directly and resolve the constants. Maintaining a separate copy of an enum that goes out of date with the C version is a recipe for breaking on future versions of the dependency, and in my experience the majority of cases where the range is discontinuous arise from when the thing in question is a mask, like this very case. The remaining cases where you really want to incur all those obligations are few enough and far enough between that going through a quasiquoter seems to be the right solution. -Edward On Thu, Apr 17, 2014 at 10:19 AM, Merijn Verstraaten wrote: > Cross-post to haskell-prime in case there's any interest for including > this into the report's FFI specification. > > Proposal - Foreign enum support > =============================== > > At the moment the FFI does not have a convenient way with interacting enums > (whether proper enums or CPP defines) in C (like languages). Both enums > and CPP > defined enums are major parts of large C APIs and they are thus crucial to > writing foreign bindings. A few examples: > > SDL_image defines the following enum: > > typedef enum { > IMG_INIT_JPG = 0x00000001, > IMG_INIT_PNG = 0x00000002, > IMG_INIT_TIF = 0x00000004, > IMG_INIT_WEBP = 0x00000008 > } IMG_InitFlags; > > OpenCL specifies the following typedefs + CPP defined enum: > > typedef uint32_t cl_uint __attribute__((aligned(4))); > typedef cl_uint cl_platform_info; > > /* cl_platform_info */ > #define CL_PLATFORM_PROFILE 0x0900 > #define CL_PLATFORM_VERSION 0x0901 > #define CL_PLATFORM_NAME 0x0902 > #define CL_PLATFORM_VENDOR 0x0903 > #define CL_PLATFORM_EXTENSIONS 0x0904 > > OpenCL functions will return the above CPP defines as return values of type > cl_platform_info. > > Current Solutions > ----------------- > > In many cases someone wrapping such a C library would like to expose these > enums as a simple sum type as this has several benefits: type safety, the > ability to use haskell constructors for pattern matching, exhaustiveness > checks. > > Currently the GHC FFI, as specified by Haskell2010, only marshalls a small > set > of foreign types and newtypes with exposed constructors of these types. As > such > there seem two approaches to wrap these enums: > > 1. Implement an ADT representing the enum and write a manual conversion > function between the ADT and the corresponding C type (e.g. CInt -> > Foo and > Foo -> CInt). > > 2. Use a tool like c2hs to automatically generate the ADT and conversion > function. > > In both cases the foreign functions are imported using the corresponding C > type > in their signature (reducing type safety) and the user is forced write > trivial > wrappers for every imported function to convert the ADT to the relevant C > type > and back. > > This is both tedious to write and costly in terms of code produced, in > case of > c2hs one calls "toEnum . fromIntegral" and "fromIntegral . fromEnum" for > every > argument/result even though this could trivially be a no-op. > > Worse, since c2hs uses the Enum class for it's conversion to/from C types > it > generates Enum instances like: > > instance Enum Foo where > fromEnum Bar = 1 > fromEnum Baz = 1337 > > toEnum 1 = Bar > toEnum 1337 = Baz > toEnum unmatched = error ("PlatformInfo.toEnum: Cannot match " ++ > show unmatched) > > Since succ/pred and enumFromTo's default implementations assume enums > convert > to continuous sequence of Int this means the default generated enum > instances > crash. This problem could be overcome by making c2hs' code generation > smarter, > but this does not eliminate the tediousness of wrapping all foreign > imported > functions with marshalling wrappers, NOR does it eliminate the overhead of > all > this useless marshalling. > > Proposal > -------- > > Add a new foreign construct for enums, the syntax I propose below is rather > ugly and ambiguous and thereforeopen to bikeshedding, but I prefer > explaining > based on a concrete example. > > foreign enum CInt as Foo where > Bar = 1 > Baz > Quux = 1337 > Xyzzy = _ > > This would introduce a new type 'Foo' with semantics approximately > equivalent > too "newtype Foo = Foo CInt" plus the pattern synonyms "pattern Bar = Foo > 1; > pattern Baz = 2; pattern Quux = 1337; pattern Xyzzy = Foo _". > > Explicit listing of the value corresponding to a constructor should be > optional, missing values should just increment by one from the previous > (like > C), if the initial value is missing, it should assume to start from 0. > Values > do not need to be contiguous. > > Users should be able to use these constructors as normal in pattern match > (really, this mostly follows to semantics of the above pattern synonyms). > > The foreign import/export functionality should invisibly marshall Foo to > the > underlying foreign type (as is done for newtypes). > > I'm unsure about the support for a wildcard constructor like Xyzzy. If > there is > support for a wildcard, it should be optional. On the upside a wildcard > means > the marshalling is no longer a partial function. The downside is that it > makes > desugaring the use of enums in patterns harder. It seems clear that > > f Xyzzy = {- ... -} > f Bar = {- ... -} > f Baz = {- ... -} > f Quux = {- ... -} > > Should not have the same semantics as: > > f (Foo _) = {- ... -} > f (Foo 1) = {- ... -} > f (Foo 2) = {- ... -} > f (Foo 1337) = {- ... -} > > So in the presence of wildcards, the Foo enum can't trivially be desugared > into > pattern synonyms after checking exhaustiveness. > > Pros: > 1. Foreign imports are slightly more type safe, as one can now write: > > foreign import ccall "someFoo.h" someFoo :: Foo -> Ptr () -> IO () > > Preventing users from passing an arbitrary CInt to an argument > expecting a > specific enum. > > 2. No need to write marshalling functions to/from ADT to obtain > exhaustiveness > checks and pattern matching > > 3. Cheaper as marshalling Foo to CInt is a no-op > > 4. toEnum/fromEnum can simply map to contiguous sequence of Int as this > Int > mapping is no longer used for marshalling > > Cons: > 1. Non-standard extension of the FFI > > 2. Someone has to implement it > > 3. Wildcards constructors would present difficulties desugaring pattern > matches to a simple newtype. > > 4. ?? > > What Would Need to be Done? > --------------------------- > > 1. Parser needs to be extended to deal with parsing of enum declarations. > 2. Pattern matches of an enum type need to be checked for exhaustiveness > and > desugared to the underlying type's representation. > 3. Extend foreign imports/exports to marshall enums properly. > > If there's no objections I'm willing to take a stab at implementing this, > although I'd probably need some help with GHC's internals (although I > could bug > #ghc for that). > > Cheers, > Merijn > > _______________________________________________ > ghc-devs mailing list > ghc-devs at haskell.org > http://www.haskell.org/mailman/listinfo/ghc-devs > > -------------- next part -------------- An HTML attachment was scrubbed... URL: