Proposal for Data.List.splitBy
Christian Maeder
Christian.Maeder at dfki.de
Fri Feb 6 11:20:55 EST 2009
Marcus D. Gabriel wrote:
> If I were to write
>
> organizeBy :: ([a] -> Bool) -> [a] -> [([a], [a])]
I quite like your idea, but I think the input predicate
"([a] -> Bool)" is too ambitious, although it would nicely unify
Brent's
data Delimiter a where
DelimEltPred :: (a -> Bool) -> Delimiter a
DelimSublist :: Eq a => [a] -> Delimiter a
With the predicate ([a] -> Bool) you have to check all inits of your
input to detect a delimiter and only if all inits are no delimiter you
know the head element is not part of a delimiter and repeat checking the
inits of the tail. I think this is too inefficient in general.
In order to keep things simple I would vote for a split function that
only takes the simple predicate (a -> Bool) and leaves the more
complicated splitting to _additional_ functions
(following http://haskell.org/haskellwiki/Simple_to_complex)
For instance
replace :: [a] -> a -> [a] -> [a]
could replace a sublist (first argument) with a single element (second
argument)
This would help in simple cases, but leaves it to the user to choose a
suitable delimiter element.
But a general splitting on sublists could be implemented via
splitBy isNothing (replace (map Just sl) Nothing (map Just l))
(For a fixed sublist sl, "Nothing" is enough to represent the delimiter)
For a simple predicate "(a -> Bool)" it remains to discuss the output of
splitBy.
You've proposed "[([a], [a])]", but for the simpler case
"[(a, [a])]" or "[([a], a)]" may do, but in order to capture the full
information of lists before and after delimiters, something like
"([a], [(a, [a])])" is needed.
Since a tuple "(a, [a])" can be viewed as non-empty list of type "[a]",
"([a], [(a, [a])])" collapses to a non-empty list of type "[[a]]" with a
_tail_ of non-empty element lists. Therefore I propose the following
splitBy as a "work horse".
splitBy :: (a -> Bool) -> [a] -> [[a]]
The implementation is simply using "break". The basic property is:
concat (splitBy p l) == l
(This is almost in the spirit of Data.List, since groupBy or words also
produces non-empty element lists.)
Getting rid of a final delimiter or all of them can be implemented in
O(n) (see below).
And finally:
wordsBy p = filter (not . null) . dropDelims . splitBy p
linesBy p = dropDelims . dropFinalDelim . splitBy p
with:
words s == wordsBy isSpace s
lines s == linesBy (== '\n') s
Surely, one might prefer a more type-safe version wrt non-empty lists,
but the main idea is to use function composition rather than composing a
(non-haskell98) splitter input data type as Brent does in his split package.
Cheers Christian
import Data.Char (isSpace)
splitBy :: (a -> Bool) -> [a] -> [[a]]
splitBy p l = let (fr, rt) = break p l in case rt of
[] -> [fr]
d : tl -> let hd : tll = splitBy p tl in fr : (d : hd) : tll
dropFinalDelim :: [[a]] -> [[a]]
dropFinalDelim ll@(l : ls) =
if null ls then
if null l then [] else ll
else
if null (tail (last ls)) then init ll else ll
dropDelims :: [[a]] -> [[a]]
dropDelims ll = case ll of
[] -> []
l : ls -> l : map tail ls
wordsBy :: (a -> Bool) -> [a] -> [[a]]
wordsBy p = filter (not . null) . dropDelims . splitBy p
linesBy :: (a -> Bool) -> [a] -> [[a]]
linesBy p = dropDelims . dropFinalDelim . splitBy p
prop_wordsBy :: String -> Bool
prop_wordsBy s = words s == wordsBy isSpace s
prop_linesBy :: String -> Bool
prop_linesBy s = lines s == linesBy (== '\n') s
replace :: Eq a => [a] -> a -> [a] -> [a]
replace sl@(_ : _) r l = case l of
[] -> l
x : xs -> case stripPrefix sl l of
Nothing -> x : replace sl r xs
Just rt -> r : replace sl r rt
subListSplit :: Eq a => [a] -> [a] -> [[Maybe a]]
subListSplit sl@(_ : _) l =
splitBy isNothing (replace (map Just sl) Nothing (map Just l))
unintercalate :: Eq a => [a] -> [a] -> [[a]]
unintercalate sl@(_ : _) =
map (map fromJust) . dropDelims . subListSplit sl
More information about the Libraries
mailing list