[Haskell-cafe] How to improve its performance ?

Wed Mar 17 21:14:25 EDT 2010

Am Donnerstag 18 März 2010 00:53:28 schrieb zaxis:
> import Data.List
>
> combination :: [a] -> [[a]]
> combination [] =  [[]]
> combination (x:xs) =  (map (x:) (combination xs) )++ (combination xs)

That would normally be called sublists (or subsets, if one regards lists as 
representing a set), I think. And, apart from the order in which they are 
generated, it's the same as Data.List.subsequences (only less efficient).

>
> samp = [1..100]
> allTwoGroup = [(x, samp\\x) | x <- combination samp]
>
> The above code is used to calculate all the two groups from sample data

All partitions into two sublists/sets/samples.

> ? It is very slow !

I found it surprisingly not-slow (code compiled with -O2, as usual).
There are two points where you waste time.
First, in

combination (x:xs)

you calculate (combination xs) twice. If the order in which the sublists 
come doesn't matter, it's better to do it only once:

combination (x:xs) = concat [(x:ys), ys] | ys <- combination xs]

Second, (\\) is slow, xs \\ ys is O(length xs * length ys).
Also, (\\) requires an Eq constraint. If you're willing to constrain the 
type further, to (Ord a => [a] -> [([a],[a])]), and call it only on ordered 
lists, you can replace (\\) by the much faster difference of oredered lists 
(implementation left as an exercise for the reader).

But you can work with unconstrained types, and faster, if you build the two 
complementary sublists at the same time.
The idea is,
-- An empty list has one partition into two complementary sublists:
partitions2 [] = [([],[])]
-- For a nonempty list (x:xs), the partitions into two complementary
-- sublists each have x either in the first sublist or in the second.
-- Each partition induces a corresponding partition of the tail, xs,
-- by removing x from the group in which it appears.
-- Conversely, every partition ox xs gives rise to two partitions
-- of (x:xs), by adding x to either the first or the second sublist. So
partitions2 (x:xs) 
    = concat [ [(x:ys,zs),(ys,x:zs)] | (ys,zs) <- partitions2 xs ]

We can also write the second case as

partitions2 (x:xs) = concatMap (choice x) (partitions2 xs)

where

choice x (ys,zs) = [(x:ys,zs),(ys,x:zs)]

Now it's very easy to recognise that partitions2 is a fold,

partitions2 xs = foldr (concatMap . choice) [([],[])] xs

>
> Sincerely!
>