Using associated data types to create unpacked data structures
Simon Peyton-Jones
simonpj at microsoft.com
Thu Aug 12 12:07:37 EDT 2010
You don't want to go overboard here.
1. You *want* a distinct blob of lookup code for each different key type, because you really do want a different lookup structure for each
2. For the most part you *dont want* a different blob of lookup code for each value type, because almost all of them are represented uniformly by a pointer.
So it's be silly to generate all possible combinations of types. The exception to (2) is that you want different code for a handful of types that you want to unbox into the tree structure itself: Int#, Float# etc. It would be good to design a convenient way to do that. It's nothing directly to do with associated types. We'd like to allow Maybe Int#, say, but we don't at the moment because that data structure would really be represented differently. Some kind of data type and code cloning (a la C++) is probably the right thing. This is what Max meant by "just engineering" but it would require careful thought and design.
Simon
From: glasgow-haskell-users-bounces at haskell.org [mailto:glasgow-haskell-users-bounces at haskell.org] On Behalf Of Johan Tibell
Sent: 12 August 2010 16:56
To: Simon Marlow
Cc: glasgow-haskell-users
Subject: Re: Using associated data types to create unpacked data structures
On Thu, Aug 12, 2010 at 11:28 AM, Simon Marlow <marlowsd at gmail.com<mailto:marlowsd at gmail.com>> wrote:
Rather than try to solve this problem in one go, I would go for a low-tech approach for now: write a TH library to generate the code, and ask the user to declare the versions they need. To make a particular version, the user would say something like
module MapIntDouble (module MapIntDouble) where
import TibbeMagicMapGenerator
make_me_a_map ...
there's no type class of course, so you can't write functions that work over all specialised Maps. But this at least lets you generate optimised maps for only a little boilerplate, and get the performance boost you were after.
To get a better idea of how many specialized maps the user would have to create under this scheme I ran an analysis of the Chromium codebase [1]. The Chromium codebase is not small but some companies have codebases which are several order of magnitudes larger, which makes the results below more of a lower bound than an upper bound on the number of specialized maps one might need in a program.
$ git clone http://src.chromium.org/git/chromium.git
Initialized empty Git repository in /tmp/chromium/.git/
remote: Counting objects: 548595, done.
remote: Compressing objects: 100% (167063/167063), done.
remote: Total 548595 (delta 401993), reused 477011 (delta 343049)
Receiving objects: 100% (548595/548595), 1.02 GiB | 24.44 MiB/s, done.
Resolving deltas: 100% (401993/401993), done.
$ cd chromium
$ find . -name \*.h -o -name \*.cc -exec egrep -o "map<[^,]+, ?[^>]+>" {} \; | sort -u | wc -l
220
$ find . -name \*.h -o -name \*.cc -exec w -l {} \; | awk '{tot=tot+$1} END {print tot}'
81328
So in a code base of about 80 KLOC there are 220 unique key/value combinations. While the numbers might not translate exactly to Haskell it still indicates that the number of modules a user would have to create (and put somewhere in the source tree) would be quite large.
1. http://src.chromium.org/git/chromium.git
Cheers,
Johan
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.haskell.org/pipermail/glasgow-haskell-users/attachments/20100812/ac807728/attachment.html
More information about the Glasgow-haskell-users
mailing list