Proposal for Data.List.splitBy

Sun Jan 18 06:02:43 EST 2009

Brent Yorgey wrote:
>> P2. There should be no information loss, that is, keep the delimiters,
>> keep the separators, keep the parts of the original list xs that satisfy
>> a predicate p, do not lose information about the beginning and the end
>> of the list relative to the first and last elements of the list
>> respectively. The user of the function decides what to discard.
>>
>> P3. A split list should be unsplittable so as to recover the original
>> list xs. (I made up the word unsplittable.) (P2 implies P3, but let us
>> state this anyway.)
>
> I'm not sure I agree with this.

Thanks for stating this.  Dropping P3 would change my
thinking on this topic: if we drop P3, then I would
prefer that no splitter functions be added to Data.List
and that it be left as is.
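
To make P3 concrete, here is a minimal sketch of an
information-preserving splitter and its inverse.  The names
splitKeep and unsplit are hypothetical, made up for this
illustration; they are not taken from Data.List or
Data.List.Split.

```haskell
-- Hypothetical names; not part of Data.List or Data.List.Split.

-- | Split a list into (non-delimiter run, delimiter run) pairs.
-- Nothing is discarded, so the original list can be rebuilt.
splitKeep :: (a -> Bool) -> [a] -> [([a], [a])]
splitKeep _ [] = []
splitKeep p xs =
  let (chunk, rest)   = break p xs   -- elements before the next delimiter
      (delims, rest') = span p rest  -- the run of delimiters itself
  in (chunk, delims) : splitKeep p rest'

-- | Rebuild the original list from the pairs produced by 'splitKeep'.
unsplit :: [([a], [a])] -> [a]
unsplit = concatMap (\(chunk, delims) -> chunk ++ delims)
```

For example, splitKeep (== ';') "foo;bar;baz" yields
[("foo",";"),("bar",";"),("baz","")], and unsplit restores the
original string.  A user who wants to discard the delimiters
can apply map fst to the result.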

> The problem is that much (most?) of
> the time, people looking for a split function want to discard
> delimiters; for example, if you have a string like "foo;bar;baz" and
> you want to split it into ["foo","bar","baz"].

I agree with this comment when I think about strings, about what
I would do most of the time, and from a pragmatic point of view.
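
As a minimal sketch of this pragmatic case, break alone is
enough to drop single-element delimiters.  The name splitOnElem
is hypothetical and is not an existing Data.List function.

```haskell
-- Hypothetical name; not an existing Data.List function.

-- | Split on every element satisfying the predicate and
-- discard those elements.
splitOnElem :: (a -> Bool) -> [a] -> [[a]]
splitOnElem p xs = case break p xs of
  (chunk, [])       -> [chunk]                     -- no delimiter left
  (chunk, _ : rest) -> chunk : splitOnElem p rest  -- drop the delimiter
```

Here splitOnElem (== ';') "foo;bar;baz" gives
["foo","bar","baz"].  The result no longer records which
elements were the delimiters, so in general the original list
cannot be recovered from it.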

> In this case it's
> really annoying to have to throw away the delimiters yourself,
> especially if you just get back a list like
> ["foo",";","bar",";","baz"] and have to decide which things are
> delimiters and which aren't,

I certainly understand this point, however,

> with no help from the type system.

-- P5. The splitter functions should fit within the spirit of the
-- Data.List module and even the original Haskell 98 List module
-- in terms of type signature and complexity of implementation.

In my mind, the idea of adding a few splitter functions to
Data.List does not preclude Data.List.Split.  From my perspective
of P5 above, within the spirit of Data.List, you can work with
things like a, [a], (a,b), Maybe a, Eq a, ... .  If you wish to
do more, a separate module would be in order as you have done.

> But,
> as you noted, throwing away information like this is bad from an
> elegance/formal properties point of view. This is exactly why I
> designed the Data.List.Split library as I did: the core internal
> splitting function is information-preserving, and by using various
> combinators the user can choose to throw away whatever information
> they are not interested in.

Perfect.  So, as you see it, are there one, two, or three functions
in, or hiding in, Data.List.Split.Internals that can be factored out
and placed into Data.List in line with P1 to P7?  You do not
actually need to agree with P1 to P7; it is a conceptual exercise.
The idea is that Data.List.Split would flow more naturally from
Data.List with these few functions added to it.

In the concrete examples and clarifications so far, the words
split, delimiter, separator and variations thereof have been
used.  This already implies a theme.  Do you conceive of
Data.List.Split primarily as a way to help programmers from
other backgrounds manipulate strings, that is, to supply some
nice idioms, but generalized from [Char] to [a]?

If I were to write

organizeBy :: ([a] -> Bool) -> [a] -> [([a], [a])]

could you think of a specification such that this function
would be a workhorse in implementing Data.List.Split.Internals
and Data.List.Split?
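
Purely as an illustration of what such a specification might
look like, and not as a claim about how Data.List.Split works,
here is one possible reading.  It assumes that each pair holds
the elements skipped over, followed by the shortest non-empty
prefix of the remaining input that satisfies the predicate, so
that a predicate over [a] can recognize multi-element
delimiters.  This reading, and the naive prefix search it uses,
are assumptions made for the sketch only.

```haskell
import Data.List (inits)

-- One assumed specification of organizeBy, for discussion only.
-- Each pair is (unmatched elements, matched prefix); appending
-- both components of every pair recovers the input.
organizeBy :: ([a] -> Bool) -> [a] -> [([a], [a])]
organizeBy p = go []
  where
    -- End of input: emit any pending unmatched elements.
    go acc [] = [(reverse acc, []) | not (null acc)]
    go acc xs@(x : rest) =
      -- Look for the shortest non-empty prefix satisfying p.
      case filter p (drop 1 (inits xs)) of
        (m : _) -> (reverse acc, m) : go [] (drop (length m) xs)
        []      -> go (x : acc) rest  -- no match here; keep scanning
```

Under this reading, organizeBy (== ", ") "a, b" gives
[("a",", "),("b","")].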

Later in this thread, Alex took the view that, now that
Data.List.Split exists, any cutoff for what we move to Data.List
will be arbitrary.  Duncan responded by advancing the idea that,
by examining what is happening in Haskell code, we may find a
few useful functions for Data.List.

My intermediate idea would be to examine Data.List.Split
and Data.List.Split.Internals and think about factoring out very
general idioms that could be placed into Data.List, would serve
as the workhorses for implementing Data.List.Split.Internals and
Data.List.Split, and would be in line with P1 to P7, which
I acknowledge is my point of view.  If a few such functions
could be found, they would merit a place in Data.List.

Finally, this whole thread raises a question in my mind
about module design.  As work is put into Data.List.Split, what
is the guiding principle that prevents it from becoming
Data.List.Extensions or, to be a bit more direct,
Data.List.TheFunctionsThatWereForgotten?

At <http://haskell.org/haskellwiki/Data.List.Split>, we have

> An important caveat: we should strive to keep things flexible yet
> SIMPLE. The more complicated things get, the closer this gets to just
> being a general parsing or regex library. So the right balance needs
> to be struck. 

I agree, and we have

> A theoretical module which contains implementations/combinators for
> implementing every possible method of list-splitting known to man.
> This way no one has to argue about what the correct interface for
> split is, we can just have them all. 

Is this not Data.List?  In other words, what idea or theme does
a new Haskell programmer use to decide whether to look first
into Data.List or into Data.List.Split?

Cheers,
- Marcus

-- 
  Marcus D. Gabriel, Ph.D.                         Saint Louis, FRANCE
  http://www.marcus.gabriel.name            mailto:marcus at gabriel.name
  Tel: +33.3.89.69.05.06                   Portable: +33.6.34.56.07.75