Proposal for Data.List.splitBy
Marcus D. Gabriel
marcus at gabriel.name
Mon Feb 9 16:14:16 EST 2009
Thanks for the clarification. I would not have guessed wordsBy
and linesBy as your preference over splitBy for additions to
Data.List.
I do not have an opinion about whether or not functional composition
(pipelining) is more elegant than Brent's Splitter data type. I just
know that I am very comfortable with constructs such as (f . g . h)
or longer.
By the way, out or curiosity, how would you generalize your current
function replace?
Cheers,
- Marcus
Christian Maeder wrote:
> Marcus D. Gabriel wrote:
>
>> Christian Maeder wrote:
>>
> [...]
>
>>> splitOnAList :: Eq a => [a] -> [a] -> <To Be Decided>
>>>
>> to split on a list such as "\r\n", then you can use isPrefixOf
>> whereupon the performance is good enough (actually, its not bad
>> at all).
>>
>
> The special case for "\r\n" is actually trivial, because "\r" can simply
> be filtered out first.
>
>
>>> But a general splitting on sublists could be implemented via
>>>
>>> splitBy isNothing (replace (map Just sl) Nothing (map Just l))
>>>
>>> (For a fixed sublist sl, "Nothing" is enough to represent the delimiter)
>>>
>> Nice. It took me a moment, but nice.
>>
>
> Good, see below my definition of subListSplit and unintercalate.
>
>
>>> For a simple predicate "(a -> Bool)" it remains to discuss the output of
>>> splitBy.
>>>
>>> You've proposed "[([a], [a])]", but for the simpler case
>>> "[(a, [a])]" or "[([a], a)]" may do,
>>>
>> Actually, although enticing, I do not believe that [(a, [a])] is
>> possible due to the corner cases when there is no beginning
>> non-delimiter or ending delimiter, that is, one needs
>> [(Maybe a, [a])]. (Please check me on this.)
>>
>
> You're right here, there are several ways to accommodate all delimiters
> and non-delimiters. I've done it as below.
>
>
>>> but in order to capture the full
>>> information of lists before and after delimiters, something like
>>> "([a], [(a, [a])])" is needed.
>>>
>>> Since a tuple "(a, [a])" can be viewed as non-empty list of type "[a]",
>>> "([a], [(a, [a])])" collapses to a non-empty list of type "[[a]]" with a
>>> _tail_ of non-empty element lists.
>>>
>> I unfortunately do not follow you here, sorry. Be that as it may,
>> I have come to appreciate the output [[a]].
>>
>
> In "([a], [(a, [a])])" the first component takes the (possibly empty)
> part before the first delimiter. The second part of type [(a, [a])] are
> the remaining delimiter with (possibly empty or longer) non-delimiter
> pairs. Such pairs are then viewed as non-empty lists. After changing the
> second component to [[a]] the resulting pair ([a], [[a]]) is then also
> changed to a non-empty list of lists [[a]].
>
>
>>> wordsBy p = filter (not . null) . dropDelims . splitBy p
>>> linesBy p = dropDelims . dropFinalDelim . splitBy p
>>>
>
> I'ld like to improve linesBy as follows:
>
> linesBy p = dropFinalNil . dropDelims . splitBy p
>
> (dropFinalNil is simpler than dropFinalDelim and dropDelims can assume a
> non-empty list from splitBy.)
>
>
>> So, in summary, your idea would be to introduce two functions into
>> Data.List:
>>
>>
>>> splitBy :: (a -> Bool) -> [a] -> [[a]]
>>> replace :: Eq a => [a] -> a -> [a] -> [a]
>>>
>> Is this correct?
>>
>
> Personally, I'ld be content with wordsBy only, but adding linesBy,
> splitBy and the combination of "dropDelims . splitBy p" (under some
> suitable names) would make sense for me with or without replace. (In
> fact, replace should be generalized further.)
>
>
>> If so, how would you define
>>
>>
>>> splitOnAList :: Eq a => [a] -> [a] -> [[a]]
>>>
>> using splitBy and replace. For example,
>>
>>
>>> splitOnAList "\r\n" "abc\r\nxyz\r\n" == ["abc","\r\n","xyz","\r\n"]
>>>
>
> Again, for "\r\n" this is simply:
>
> concatMap (: ["\r\n"]) . linesBy (== '\n') . filter (/= '\r')
>
> For the general case take my unintercalate below:
>
> splitOnAList sl = intercalate [sl] . map (: []) . unintercalate sl
>
> This code only puts back identical delimiters that nobody needs because
> the delimiter is fixed and known via the input argument.
>
> For sublist matching, keeping delimiters is unnecessary in general!
>
>
>> reasonably handle the task. Actually, can even Data.List.Split
>> reasonably handle the task? (I only just recalled this common
>> little problem that is almost trivial but never really so.)
>>
>
> Data.List.Split can handle all these tasks (and more), too, only less
> elegant (I think).
>
>
>>> import Data.Char (isSpace)
>>>
>>> splitBy :: (a -> Bool) -> [a] -> [[a]]
>>> splitBy p l = let (fr, rt) = break p l in case rt of
>>> [] -> [fr]
>>> d : tl -> let hd : tll = splitBy p tl in fr : (d : hd) : tll
>>>
>>> dropFinalDelim :: [[a]] -> [[a]]
>>>
>
> forget dropFinalDelim one and take:
>
> dropFinalNil :: [[a]] -> [[a]]
> dropFinalNil ll@(_ : _) =
> if null (last ll) then init ll else ll
>
>
>>> dropDelims :: [[a]] -> [[a]]
>>>
>
> dropDelims does no longer need to work for empty inputs:
>
>
>>> dropDelims ll = case ll of
>>> [] -> []
>>> l : ls -> l : map tail ls
>>>
>
> dropDelims (l : ls) = l : map tail ls
>
> but a total variant makes also sense:
>
> dropDelims ll = let (ft, rt) = splitAt 1 ll in
> ft ++ map (drop 1) rt
>
>
>>> wordsBy :: (a -> Bool) -> [a] -> [[a]]
>>> wordsBy p = filter (not . null) . dropDelims . splitBy p
>>>
>>> linesBy :: (a -> Bool) -> [a] -> [[a]]
>>>
>
> change linesBy to:
>
> linesBy p = dropFinalNil . dropDelims . splitBy p
>
>
>>> replace :: Eq a => [a] -> a -> [a] -> [a]
>>> replace sl@(_ : _) r l = case l of
>>> [] -> l
>>> x : xs -> case stripPrefix sl l of
>>> Nothing -> x : replace sl r xs
>>> Just rt -> r : replace sl r rt
>>>
>>> subListSplit :: Eq a => [a] -> [a] -> [[Maybe a]]
>>> subListSplit sl@(_ : _) l =
>>> splitBy isNothing (replace (map Just sl) Nothing (map Just l))
>>>
>>> unintercalate :: Eq a => [a] -> [a] -> [[a]]
>>> unintercalate sl@(_ : _) =
>>> map (map fromJust) . dropDelims . subListSplit sl
>>>
>
> unintercalate can also be simplified to:
>
> unintercalate sl@(_ : _) = map catMaybes . subListSplit sl
>
> Cheers Christian
>
>
>
--
Marcus D. Gabriel, Ph.D. Saint Louis, FRANCE
http://www.marcus.gabriel.name mailto:marcus at gabriel.name
Tel: +33.3.89.69.05.06 Portable: +33.6.34.56.07.75
More information about the Libraries
mailing list