export toDescList from Data.Map

Scott Dillard sedillard at gmail.com
Wed Sep 24 12:50:02 EDT 2008


> Date: Tue, 23 Sep 2008 09:09:44 -0700
> From: "Evan Laforge" <qdunkan at gmail.com>
> Subject: Re: export toDescList from Data.Map
>
>
> > Actually, Data.Map, Data.IntMap, Data.Set and Data.IntSet do not have an
> > active maintainer and maybe everybody waits for "GSoC thing for tries".
> >
> > Indeed, as you noted yourself "it would be nice to fix up Map, Set,
> > IntMap, and IntSet". I think it is almost mandatory. So would you do
> > this, too? "toDescList" makes sense for sets, too. On the other hand, I
> > was against toAscList as it is identical to toList. That means I always
> > assumed a bias towards "ascending operations".
>
> I wouldn't mind doing some cleanup...


I use Map, Set, IntMap and IntSet very much, and I'm very interested in
seeing to it that these important libraries are maintained and improved.
However, short of a major API change, like what's going on with generalized
tries and what's already been done with Edison, I don't think there is much
work left to be done with these libraries at the API level. Just minor
changes like the one you've proposed. There are some things that could be
done though...

 - I'm not sure what Benedikt Huber meant by 'view', but I think he means
exposing the tree structure, in a read-only way, to users. I think its a
shame that the Map/Set libraries do not expose this. The simplest solution
would be to have toTree functions that convert it to a Data.Tree, but I
don't think anybody actually likes that data type. A specialized binary tree
data type would be more elegant, but there is an issue of how data is stored
in the tree. Map/Set store keys and values in internal nodes and use null
leaves. IntMap/IntSet store all keys/values in non-null leaves. So maybe
this TreeView type would have to be specific to either Map or IntMap. The
idea here is that the algorithms and data structures used for these trees
are well-known. The papers are linked from the documentation. So the library
should expose this to users, in a safe way.

 - Since someone raised the issue of test suites for performance and
correctness, I think it would also be interesting to investigate what effect
the strictness annotations in the tree constructors have on performance.
Everyone takes for granted that bangs=faster, but I've noticed, as have
others, that removing these strictness annotations actually make things run
faster. Instead, the tree construction _functions_ should be made strict
using seq. The Map construction functions are already strict enough on
account of the balancing. Try it for yourself, remove the bangs from the
sub-tree fields in the Bin constructor, and run a little benchmark. This
would open up the possibility for lazy versions of functions like mapWithKey
and mapKeysMonotonic. I had mentioned this previously, but few seemed to
care, so I just made the change locally.
http://www.haskell.org/pipermail/libraries/2008-August/010371.html . If we
we're going to start a little Map/Set/IntMap/IntSet working group, then I'd
like to throw that idea back into the mix.



> It's also galling that IntMap is hardcoded to 32bit ints and writing a
> 64 bit version would seem to require yet another copy and paste
> session.  And it seems like theoretically patricia should work on
> doubles too.  But I'm not sure how to address all this code sharing.
>

IntMap is hard coded to Int, which is 32 bits on a 32 bit architecture, and
64 bits on a 64 bit architecture. I think the main reason for this is so
that the key can be
unpacked/specialized for extra performance, since that is the primary
purpose of the library. If you just need sublinear insert/lookup/delete then
use a Map.

Are you galled that the key type is not hardcoded to Int64, or that the key
is not allowed to be any instance of the Bits class? In the former case,
you'd inflate the size of the tree and take a small speed hit on 32bit
archs, and in the latter I think you'd have even worse performance. I think
the choice of Int is wise because it is the fastest, and that is the point
of the library, and if you think about it you're not going to be dealing
with more that 2^32 Ints on a 32 bit computer. (What would they be
referencing? Not array positions or file offsets.)  If you're trying to
optimize, say, "Map (Int32,Int32) a" into "IntMap64 a", there are other ways
to accomplish that:

import qualified Data.IntMap as IM
import Data.IntMap (IntMap)

newtype IntTrie = IntTrie (IntMap IntTrie)

empty :: IntTrie
empty = IntTrie IM.empty

insert :: [Int] -> IntTrie -> IntTrie
insert []     t           = t
insert (x:xs) (IntTrie t) = IntTrie t'
  where
  (y,t') = IM.insertLookupWithKey (const union) x y' t
  y' = case y of Just y  -> insert xs y; Nothing -> insert xs empty

delete :: [Int] -> IntTrie -> IntTrie
delete []     t           = empty
delete (x:xs) (IntTrie t) = IntTrie (IM.update f x t)
  where
  f y = case IM.delete x (case delete xs y of IntTrie t -> t) of
          t' | IM.null t' -> Nothing
             | otherwise  -> Just (IntTrie t')

And so forth. This is an IntSet for arbitrarily long integers represented as
[Int]. An IntMap is probably not much more work.


> > Maybe this explains the little feedback to this ticket, but I would like
> > to encourge you, to improve things, anyway.
>
> Well thanks.  It's nice to hear *something* :)
>
>
For my $0.02 your proposal is good. toDescList is very important to have in
many situations. I've had it un-hidden locally for a while now. It's silly
that it wasn't exported in the first place.

Scott
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.haskell.org/pipermail/libraries/attachments/20080924/6a5106a7/attachment.htm


More information about the Libraries mailing list