[Haskell-cafe] Re: Where is Data.Atom?
Brian Hulley
brianh at metamilk.com
Tue Jul 4 18:37:13 EDT 2006
Iain Alexander wrote:
> Another suggestion:
>
> Put your strings in an ordered binary tree (other data structures
> might also work here).
>
> Make your Atom an encoding of the structure of the tree (resp. other
> structure). This is logically a sequence of bits, 0 for left (less
> than), 1 for right (greater than) - if you terminate this
> variable-length sequence with 1, I think it all works out OK. You
> then encode this sequence of bits in a Word32 (or some other
> convenient item), with implicit trailing zeros.
>
> root ~> 1[000...]
> root.left ~> 01[000...]
> root.right ~> 11[000...]
> ...
>
> There are clearly some questions such as how unbalanced is the tree
> likely to get (since you can't conveniently rebalance it), and what
> to do if your Atom "overflows", but this might give you something to
> work with.
Thanks for the suggestion. However I think that just using a binary tree
would be equivalent to just using the representation of the original string
as a sequence of bits (except for "early out" eg representing 0b00001111 as
just 1111 etc), and any other systematic encoding of strings that preserved
all the info necessary for lexicographic ordering (that still works when new
atoms are created later) would also end up using at least the same number of
bit comparisons on average (since a new information preserving
representation is just a permutation of the original at the bit level - if
some strings are represented with fewer bits others would need more bits so
it would all balance out).
The only way I can see of getting truly O(1) lexicographic comparisons (by
this I mean O(log P) where P is the maximum number of atoms that can exist
simultaneously at any point during program execution) would be to have a
"floating" representation (accessed via an IORef) where the concrete values
of existing atoms are changed as necessary behind the scenes as new atoms
are created, but as Chris pointed out this would require a locking operation
if the atoms were used in a multi-threaded program, so all in all, it may
well be much faster to just use ByteStrings especially if the Atoms are just
to be used to represent fairly short strings such as module names or natural
language words etc.
Still I think this would be an interesting project - Prolog style Atoms with
O(1) lexicographic ordering and O(n + f k) creation time where k is the size
of the existing population and f is some function which minimizes the
amortized time...
Regards, Brian.
--
Logic empowers us and Love gives us purpose.
Yet still phantoms restless for eras long past,
congealed in the present in unthought forms,
strive mightily unseen to destroy us.
http://www.metamilk.com
More information about the Haskell-Cafe
mailing list