[Haskell-cafe] Where is Data.Atom ?
Brian Hulley
brianh at metamilk.com
Sun Jul 2 18:34:03 EDT 2006
Brian Hulley wrote:
> So perhaps my original spec is impossible to implement, though it is
> an open question whether some very clever encoding (with corresponding
> implementation of <) could be found which would lead to a better
> average performance (whatever that means).
>
> An alternative design for an atom module could be:
>
> create :: MonadIO m => String -> m Atom
> toString :: Atom -> String
>
> instance Eq Atom -- O(1)
> instance Ord Atom -- O(1) but depends on creation order
>
> but here the < would not be lexicographic, so although it would be
> useful for implementing symbol tables, environments etc it's not
> ideal for GUI use (eg when displaying a tree of modules where
> everything should be listed alphabetically).
Eureka! There *is* a way to get O(1) lexicographic comparisons of Atoms :-)
although at the expense of O(n + k) but possibly amortized O(n + ?) creation
time where n is the length of the string and k is the number of Atoms in
existence at any given time. The previous reason for needing to be
clairvoyant in the choice of Unique (or Integer or Int) to represent an Atom
was to allow expressions such as
atom "a" < atom "b"
because when the code has determined the representation for the first atom
the creation of the second atom mustn't be allowed to change it.
However with monadic creation, such expressions can't be written, and
therefore it would be possible to represent atoms as follows:
newtype Atom = Atom (IORef Int) -- or IORef Integer
and when new atoms are created, we could simply update the Int's referenced
by other atoms to ensure that the relative ordering is still lexicographic.
If we assume that it's very unlikely we'd need more than 2^31 atoms to exist
at any given time (if this assumption is wrong we could use Integer instead
of Int) and that usually we'd have a lot less, the Int's could be allocated
with gaps between them so that we'd only occasionally need to adjust the
Int's of the atoms to the left or right when a new atom is inserted in the
table, and in all cases it's extremely unlikely that we'd need to adjust the
representation of all the other Atoms (hence amortized O(n) hopefully).
The comparison would then be:
compare (Atom l) (Atom r) = unsafePerformIO $ do
li <- readIORef l
ri <- readIORef r
return (compare li ri)
which is safe as long as creation of atoms is not allowed inside
unsafePerformIO (it would be nice if there was a way to tell the typechecker
that a specific action is not allowed in unsafe IO)
Still there would be some tricky details to work out eg how to ensure that
the average number of Integer's that need to be updated on creation of each
new atom is as small as possible, so I'll leave this as an exercise for the
reader!!! :-)
Essentially it depends on how well the following can be implemented:
data OrderingToken = ???
instance Eq OrderingToken
instance Ord OrderingToken
create :: MonadIO m => m OrderingToken
-- x < y |- createBetween x y == z where x < z and z < y
createBetween :: MonadIO m => OrderingToken -> OrderingToken -> m
(OrderingToken)
It would be interesting to investigate what the theoretical tradeoffs would
be between the complexities of the above functions (assuming we want (<) to
be O(1)) and whether or not they require to be monadic.
Anyway apologies for rambling on,
Brian.
--
Logic empowers us and Love gives us purpose.
Yet still phantoms restless for eras long past,
congealed in the present in unthought forms,
strive mightily unseen to destroy us.
http://www.metamilk.com
More information about the Haskell-Cafe
mailing list