Proposal for Data.List.splitBy

Fri Feb 6 11:20:55 EST 2009

Marcus D. Gabriel wrote:
> If I were to write
> 
> organizeBy :: ([a] -> Bool) -> [a] -> [([a], [a])]

I quite like your idea, but I think the input predicate
"([a] -> Bool)" is too ambitious, although it would nicely unify
Brent's

data Delimiter a where
  DelimEltPred :: (a -> Bool) -> Delimiter a
  DelimSublist :: Eq a => [a] -> Delimiter a

With the predicate ([a] -> Bool) you have to check all inits of your
input to detect a delimiter and only if all inits are no delimiter you
know the head element is not part of a delimiter and repeat checking the
inits of the tail. I think this is too inefficient in general.

In order to keep things simple I would vote for a split function that
only takes the simple predicate (a -> Bool) and leaves the more
complicated splitting to _additional_ functions
(following http://haskell.org/haskellwiki/Simple_to_complex)

For instance

  replace :: [a] -> a -> [a] -> [a]

could replace a sublist (first argument) with a single element (second
argument)

This would help in simple cases, but leaves it to the user to choose a
suitable delimiter element.

But a general splitting on sublists could be implemented via

  splitBy isNothing (replace (map Just sl) Nothing (map Just l))

(For a fixed sublist sl, "Nothing" is enough to represent the delimiter)

For a simple predicate "(a -> Bool)" it remains to discuss the output of
splitBy.

You've proposed "[([a], [a])]", but for the simpler case
"[(a, [a])]" or "[([a], a)]" may do, but in order to capture the full
information of lists before and after delimiters, something like
"([a], [(a, [a])])" is needed.

Since a tuple "(a, [a])" can be viewed as non-empty list of type "[a]",
"([a], [(a, [a])])" collapses to a non-empty list of type "[[a]]" with a
_tail_ of non-empty element lists. Therefore I propose the following
splitBy as a "work horse".

  splitBy :: (a -> Bool) -> [a] -> [[a]]

The implementation is simply using "break". The basic property is:

  concat (splitBy p l) == l

(This is almost in the spirit of Data.List, since groupBy or words also
produces non-empty element lists.)

Getting rid of a final delimiter or all of them can be implemented in
O(n) (see below).

And finally:

  wordsBy p = filter (not . null) . dropDelims . splitBy p
  linesBy p = dropDelims . dropFinalDelim . splitBy p

with:

  words s == wordsBy isSpace s
  lines s == linesBy (== '\n') s

Surely, one might prefer a more type-safe version wrt non-empty lists,
but the main idea is to use function composition rather than composing a
(non-haskell98) splitter input data type as Brent does in his split package.

Cheers Christian

import Data.Char (isSpace)

splitBy :: (a -> Bool) -> [a] -> [[a]]
splitBy p l = let (fr, rt) = break p l in case rt of
  [] -> [fr]
  d : tl -> let hd : tll = splitBy p tl in fr : (d : hd) : tll

dropFinalDelim :: [[a]] -> [[a]]
dropFinalDelim ll@(l : ls) =
  if null ls then
      if null l then [] else ll
  else
      if null (tail (last ls)) then init ll else ll

dropDelims :: [[a]] -> [[a]]
dropDelims ll = case ll of
  [] -> []
  l : ls -> l : map tail ls

wordsBy :: (a -> Bool) -> [a] -> [[a]]
wordsBy p = filter (not . null) . dropDelims . splitBy p

linesBy :: (a -> Bool) -> [a] -> [[a]]
linesBy p = dropDelims . dropFinalDelim . splitBy p

prop_wordsBy :: String -> Bool
prop_wordsBy s = words s == wordsBy isSpace s

prop_linesBy :: String -> Bool
prop_linesBy s = lines s == linesBy (== '\n') s

replace :: Eq a => [a] -> a -> [a] -> [a]
replace sl@(_ : _) r l = case l of
  [] -> l
  x : xs -> case stripPrefix sl l of
    Nothing -> x : replace sl r xs
    Just rt -> r : replace sl r rt

subListSplit :: Eq a => [a] -> [a] -> [[Maybe a]]
subListSplit sl@(_ : _) l =
  splitBy isNothing (replace (map Just sl) Nothing (map Just l))

unintercalate :: Eq a => [a] -> [a] -> [[a]]
unintercalate sl@(_ : _) =
  map (map fromJust) . dropDelims . subListSplit sl