Proposal for generalized function partition in List-library

John Meacham john@repetae.net
Thu, 17 May 2001 11:42:50 -0700


this is just how I understand things at the moment, if I am wrong
or misleading anywhere then please speak up. 

I am unconvinced that such generalizations must come as speed hits, but
perhaps someone can enlighten me. compilers seem to support seperate
compilation and polymorphism at the moment by boxing polymorphic types
and passing their class-context (term?) as a hidden parameter to the
polymorphic function. but it seems that since haskell is strongly type
checked at compile time, we already KNOW all of the types with certainty
when it comes to compiling the program in the end so this would seem to
be unnecisarry. 

now there are two cases where this breaks down
 * seperate compilation
 * code bloat

namely that you don't necisarilly know all of the types of polymorphic
functions when they are in a module that is compiled seperately from the
rest of the program. the solution used by many C++ compilers is to
inline all polymorphic code in place (this is why templates are in
header files). there is no reason this won't work in haskell and give
back all of the speed benefits of base types everwhere (as it would in
this example) but has the unfortunate side effect of causing worst case
exponential code bloat as every function is re-written for every type
combination it is used with.

the solution I see (and probably exists) is a hybrid model, one that
lets the compiler know to specialize certain polymorphic functions as
well as let the compiler utilize such specialized functions. a pragma
which lets the compiler know that a certain specialization would be
useful such as 

partition :: (Eq b) => (a -> b) -> [a] -> [[a]]
partition fn ls = ...

{-# specialize partition :: (a -> Bool) -> [a] [[a]] -}

which would let the compiler know to generate code for partition
specificially optimized for (a -> Bool) and thus obviating the need for
the Eq hidden argument. the fully polymorphic version would of course
also have to be generated. specializations would be advertised in the
header file and utilized whenever the typechecker determined you were
using partition in the specialized case. 

this seems like a better solution than the current practice of providing
seperate general and specific functions, take, genTake, max, genMax and
whatnot. you would just specify that specializations for the case of
'Int' should be generated as it would be the most common case.

an {-# inline partition -} might also be useful which would actually
place the text of the function into the .hi file to be in-line expanded
as is done in C++... I imagine there would be some tweaking to determine
when this is a win and when it isn't...

some of this stuff probably exists in current compilers, but
polymorphism need not be at the expense of speed.
	
	-John


On Thu, May 17, 2001 at 12:36:39PM +0200, Michal Gajda wrote:
> On Thu, 17 May 2001, Bernd [iso-8859-2] Holzmüller wrote:
> > This partitioning function builds equivalence classes from the list
> > argument, where each element list within the result list consists of
> > elements that all map to the same value when applying f to it.
> > Thus: partition (`div` 2) [1..5] yields [[1],[2,3],[4,5]]
> > 
> > This is much more general than the existing partitioning function and
> > applicable in many practical cases.
> 
> But here generality comes at the expense of speed I think.

-- 
--------------------------------------------------------------
John Meacham   http://www.ugcs.caltech.edu/~john/
California Institute of Technology, Alum.  john@repetae.net
--------------------------------------------------------------