[Haskell-cafe] EnumSet and EnumMap
haskell at list.mightyreason.com
Sun Feb 25 16:42:16 EST 2007
Yitzchak Gale wrote:
> Chris Kuklewicz wrote:
>> There is a performance cost. Each use of
>> Data.List.map in that code is a performance
>> cost. And if toEnum/fromEnum is non-trivial
>> there is a performance cost.
> Are you sure? I am under the impression that
> at least for GHC, toEnum/fromEnum on Int is
> free. And every Data.List.map in your code is
> mapping one of toEnum, fromEnum, or a newtype
> constructor, so those should also be free,
> or perhaps could be made free with a pragma.
> Expert opinions on this?
GHC's darcs repostory, at http://darcs.haskell.org/packages/base/GHC/Base.lhs :
> -- | The 'Prelude.toEnum' method restricted to the type 'Data.Char.Char'.
> chr :: Int -> Char
> chr (I# i#) | int2Word# i# `leWord#` int2Word# 0x10FFFF# = C# (chr# i#)
> | otherwise = error "Prelude.chr: bad argument"
> unsafeChr :: Int -> Char
> unsafeChr (I# i#) = C# (chr# i#)
> -- | The 'Prelude.fromEnum' method restricted to the type 'Data.Char.Char'.
> ord :: Char -> Int
> ord (C# c#) = I# (ord# c#)
As you can see, the 'chr' function rightly checks bounds, and this is what is
used in http://darcs.haskell.org/packages/base/GHC/Enum.lhs to make the Enum
Char instance. I made a special CharMap module that uses unsafeChr instead.
An interesting variation would be to define class UnsafeEnum and implement
non-bounds checked unsafeToEnum and then base EnumMap/Set around this instead of
the usual Enum.
Or you could newtype an UnsafeChar with a custom Enum instance that uses unsafeChr.
> On types other than Int, you'll incur a very small
> cost. It should still be much better than generic
> Map and Set. If the difference between that and
> building something by hand for types other than
> Int is important, you can still do that, as now.
>>> The difference in interface been the Int and non-Int
>>> versions of Set and Map forces you to make an
>>> early decision about which to use. That decision
>>> gets pervasively hard-wired into your code.
>> This is true. The solution for real application is to
>> adopt a type class interface to collections, for which
>> a few examples exist.
> That is a far-reaching change. I hope it happens some
> In the meantime, your module are usable now, with
> very little change to existing code.
The (EnumMap Char) was not a real gain on (Map Char) until I made CharMap and
added INLINE pragmas and avoided the List.map conversions to/from Char.
> It should certainly be added to the standard
I think the priority should be to add a type class interface, just as we have
IArray and MArray. Many of us have rolled our own String/List type class
interface as well. But this needs to take advantage of MPTC and probably
fundeps or associated types.
> I personally would then stop using
> IntMap/IntSet for new code. If there is any penalty
> at all, it will certainly be insignificant to me.
> (I am not currently writing any regexp libraries.)
Then please enjoy EnumSet and EnumMap. You probably should add an INLINE pragma
for each of the definitions. (I used a sed command to generate them).
> If this is indeed free of performance cost for
> most or all compilers, I would then expect the old
> Int-specific implementations to disappear
> in a few years.
Well, for Int the toEnum/fromEnum is identity, so any compiler that can inline
the calls should remove this performance penalyy.
More information about the Libraries