[Haskell-cafe] Data Structures GSoC
wren ng thornton
wren at freegeek.org
Wed Mar 31 14:28:22 EDT 2010
Nathan Hunter wrote:
> Hello.
>
> I am hoping to take on the Data Structures project proposed two years ago by
> Don Stewart here<http://hackage.haskell.org/trac/summer-of-code/ticket/1549>,
> this summer.
> Before I write up my proposal to Google, I wanted to gauge the reaction of
> the Haskell community to this project.
> Particularly:
>
> -What Data Structures in the current libraries are in most dire need of
> improvement?
> -How necessary do you think a Containers Library revision is?
One thing I've seen come up repeatedly is the issue of presenting a
unified and general interface for Data.Map, Data.IntMap, and related
things like Data.Set, bytestring-trie, pqueue, etc which intended to
mimic their interface. That alone isn't big enough for a GSoC, but it
would be a very nice thing to have. Every few months there's a request
on the libraries@ list to alter, generalize, or reunify the map
interface in some way.
** Just to be clear, I do not mean coming up with a typeclass nor doing
things like the generalized-trie tricks, I just mean a good old
fashioned standard API. **
There are countervailing forces for making a good API. On the one hand
we want functions to do whatever we need, on the other hand we want the
API to be small enough to be usable/memorable. In the bytestring-trie
library I attempted to resolve this conflict by offering a small set of
highly efficient ueber-combinators in the internals module, a medium
sized set of functions for standard use in the main module, and then
pushed most everything else off into a convenience module.
The containers library would do good to follow this sort of design. The
Data.Map and Data.IntMap structures don't provide the necessary
ueber-combinators, which has led to the proliferation of convenience
functions which are more general than the standard use functions but not
general enough to make the interface complete. Also, these generalized
functions are implemented via code duplication rather than having a
single implementation, which has been known to lead to cut&paste bugs
and maintenance issues. Provided the correct ueber-combinators are
chosen, there is no performance benefit for this code duplication either
(so far as I've discovered with bytestring-trie).
Additionally, it'd be nice if some of the guts were made available for
public use (e.g., the bit-twiddling tricks of Data.IntMap so I don't
have to duplicate them in bytestring-trie).
Also it would be nice to develop a cohesive test and benchmarking suite,
which would certainly be a large enough task for GSoC, though perhaps
not fundable.
I would be willing to co-mentor API and algorithm design for cleaning
the cobwebs out of the containers library. I wouldn't have the time for
mentoring the choice of datastructures or benchmarking however.
--
Live well,
~wren
More information about the Haskell-Cafe
mailing list