[Haskell-cafe] EnumSet and EnumMap

Chris Kuklewicz haskell at list.mightyreason.com
Sun Feb 25 16:42:16 EST 2007


Yitzchak Gale wrote:
> Chris Kuklewicz wrote:
>> There is a performance cost.  Each use of
>> Data.List.map in that code is a performance
>> cost.  And if toEnum/fromEnum is non-trivial
>> there is a performance cost.
> 
> Are you sure? I am under the impression that
> at least for GHC, toEnum/fromEnum on Int is
> free. And every Data.List.map in your code is
> mapping one of toEnum, fromEnum, or a newtype
> constructor, so those should also be free,
> or perhaps could be made free with a pragma.
> 
> Expert opinions on this?
> 

GHC's darcs repostory, at http://darcs.haskell.org/packages/base/GHC/Base.lhs :

> -- | The 'Prelude.toEnum' method restricted to the type 'Data.Char.Char'.
> chr :: Int -> Char
> chr (I# i#) | int2Word# i# `leWord#` int2Word# 0x10FFFF# = C# (chr# i#)
>             | otherwise                                  = error "Prelude.chr: bad argument"
> 
> unsafeChr :: Int -> Char
> unsafeChr (I# i#) = C# (chr# i#)
> 
> -- | The 'Prelude.fromEnum' method restricted to the type 'Data.Char.Char'.
> ord :: Char -> Int
> ord (C# c#) = I# (ord# c#)

As you can see, the 'chr' function rightly checks bounds, and this is what is
used in http://darcs.haskell.org/packages/base/GHC/Enum.lhs to make the Enum
Char instance.  I made a special CharMap module that uses unsafeChr instead.

An interesting variation would be to define class UnsafeEnum and implement
non-bounds checked unsafeToEnum and then base EnumMap/Set around this instead of
the usual Enum.

Or you could newtype an UnsafeChar with a custom Enum instance that uses unsafeChr.

> On types other than Int, you'll incur a very small
> cost. It should still be much better than generic
> Map and Set. If the difference between that and
> building something by hand for types other than
> Int is important, you can still do that, as now.

Correct.

> 
>>> The difference in interface been the Int and non-Int
>>> versions of Set and Map forces you to make an
>>> early decision about which to use. That decision
>>> gets pervasively hard-wired into your code.
> 
>> This is true.  The solution for real application is to
>> adopt a type class interface to collections, for which
>> a few examples exist.
> 
> That is a far-reaching change. I hope it happens some
> day.

Perhaps.

> 
> In the meantime, your module are usable now, with
> very little change to existing code.

The (EnumMap Char) was not a real gain on (Map Char) until I made CharMap and
added INLINE pragmas and avoided the List.map conversions to/from Char.

> It should certainly be added to the standard
> libraries.

I think the priority should be to add a type class interface, just as we have
IArray and MArray.  Many of us have rolled our own String/List type class
interface as well.  But this needs to take advantage of MPTC and probably
fundeps or associated types.

> I personally would then stop using
> IntMap/IntSet for new code. If there is any penalty
> at all, it will certainly be insignificant to me.
> (I am not currently writing any regexp libraries.)

Then please enjoy EnumSet and EnumMap.  You probably should add an INLINE pragma
for each of the definitions.  (I used a sed command to generate them).

> If this is indeed free of performance cost for
> most or all compilers, I would then expect the old
> Int-specific implementations to disappear
> in a few years.

Well, for Int the toEnum/fromEnum is identity, so any compiler that can inline
the calls should remove this performance penalyy.

> 
> Regards,
> Yitz



More information about the Libraries mailing list