[Haskell-beginners] Combinations of choosing n items from a list

Mon Nov 14 01:53:52 CET 2011

Thanks for the detailed response! I'll be digesting it properly over
the next few days.

Peter

On Mon, Nov 14, 2011 at 12:13 AM, Daniel Fischer
<daniel.is.fischer at googlemail.com> wrote:
> On Sunday 13 November 2011, 23:42:25, Peter Hall wrote:
>> Hi all,
>> This function took me a long time to write, getting my head around the
>> double recursion. It returns a list of all possible sub-sets of a
>> given length from a list.
>>
>> I have a couple of questions.
>>
>> 1. Can it be improved for efficiency or style?
>> 2. Is there a library that already has this and other related
>> functions? I assumed there would be but I couldn't find it on hoogle.
>>
>>
>> combinations :: Int -> [a] -> [[a]]
>> combinations _ [] = []
>> combinations 0 _ = []
>
> That should be
>
> combinations 0 _ = [[]]
>
> since there is one sublist of length 0 of any list.
> And that equation should come before
>
> combinations _ [] = []
>
>> combinations 1 x = map (:[]) x
>
> This equation is then superfluous.
>
>> combinations n (x:xs) = (map (x:) $ combinations (n-1) xs) ++
>> combinations n xs
>
> Some would prefer a list-comprehension here,
>
> combinations n (x:xs)
>    = [x:comb | comb <- combinations (n-1) xs]
>       ++ combinations n xs
>
> (indentation to prevent wrapping in mail-clients),
> but I would leave the decision to personal preference (well, I would use a
> list-comprehension, but I've no pains with the map. However, I submit also
>  map (x:) (combinations (n-1) xs) ++ combinations n xs
> for consideration).
>
> As for efficiency, yes, there's something you can do to dramatically
> improve performance, if you don't want to treat infinite lists (as it is,
> you'd only get sublists with identical (n-1)-long initial segment anyway,
> so support for infinite lists isn't ideal).
>
> Consider for example
>
> combinations 25 [1 .. 25]
> ~> [1:c | c <- ([2:d | d <- combinations 23 [3 .. 25]]
>                             ++ combinations 24 [3 .. 25])]
>         ++ combinations 25 [2 .. 25]
>
> For the second part, you go through all 2^24 sublists of [2 .. 25], only to
> find that all of them are too short. In the second part of the first part,
> you go through all 2^23 sublists of [3 .. 25] to find all of them too
> short.
> Continuing the splitting, you see that
>
> combinations 25 [1 .. 25]
>
> goes through all 2^25 sublists of [1 .. 25], finding all of them where an
> element was discarded anywhere too short.
> A lot of wasted work.
>
> If n > length xs, you know that xs has no sublists of length n, so you can
> abort the calculation immediately. Since calculating the length requires
> traversing the entire list, we'd rather not do that at each step, but carry
> the length as a parameter.
>
> combinations 0 _ = [[]]
> combinations _ [] = []
> combinations n xs
>  | n < 0     = []
>  | n == 1    = map (:[]) xs  -- not needed, but a wee bit more efficient
>  | otherwise = helper n (length xs) xs
>    where
>      helper k l ys@(z:zs)
>        | k < l     = [z:ws | ws <- helper (k-1) (l-1) zs]
>                          ++ helper k (l-1) zs
>        | k == l    = [ys]
>        | otherwise = []   -- can only occur at the start
>
> Not so nice, but avoids a lot of futile work.
> The call to length prohibits infinite lists. You could, at some cost, treat
> infinite lists per
>
> combinations 0 _ = [[]]
> combinations _ [] = []
> combinations n xs@(y:ys)
>  | n < 0     = []
>  | otherwise = case drop (n-1) xs of
>                  [ ] -> []
>                  [_] -> [xs]
>                  _   -> [y:c | c <- combinations (n-1) ys]
>                            ++ combinations n ys
>
> to get the same result as before for infinite lists. The finite case would
> be slower than the previous with the helper and length, but still avoid
> most of the futile work when n is close to length xs.
>
>>
>> e.g.
>>
>> > combinations 3 [1..5]
>>
>> [[1,2,3],[1,2,4],[1,2,5],[1,3,4],[1,3,5],[1,4,5],[2,3,4],[2,3,5],[2,4,5]
>> ,[3,4,5]]
>>
>> Thanks,
>> Peter
>
>