Proposal for Data.List.splitBy

Mon Jan 19 13:41:19 EST 2009

On Sun, Jan 18, 2009 at 12:02:43PM +0100, Marcus D. Gabriel wrote:
> > But,
> > as you noted, throwing away information like this is bad from an
> > elegance/formal properties point of view. This is exactly why I
> > designed the Data.List.Split library as I did: the core internal
> > splitting function is information-preserving, and by using various
> > combinators the user can choose to throw away whatever information
> > they are not interested in.
> 
> Perfect.  So, as you see it, are there one, two, or three functions
> in or hiding in Data.List.Split.Internals that can be factored and
> placed into Data.List that are in line with P1 to P7?  You do not
> actually need to agree to P1 to P7, it is a conceptual exercise.
> The idea is that Data.List.Split would flow more naturally from
> Data.List with these few functions added to it.

Not really.  You are welcome to look at the source of
Data.List.Split.Internals yourself; you will see that the core
functions on top of which everything else is implemented use various
internal data types, to more accurately reflect the richness of
information that is present in the most general case.  If we were to
add these core functions to Data.List, we would have to add
these new data types as well, which seems to go against the spirit of
simplicity found in Data.List.  And in any case, the core functions
are not very easy or convenient to use on their own.
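
To make the point about internal data types concrete, here is a
minimal sketch of the general idea.  It is not the code in
Data.List.Split.Internals: the names Piece, Sep, Body, splitKeep,
unPieces, withoutSeps and withoutBlanks are all made up for this
example, and it only handles single-element delimiters.  The core
function keeps everything, and small combinators throw information
away afterwards.

```haskell
-- Illustrative sketch only; every name here is hypothetical and is
-- not the API of Data.List.Split or Data.List.Split.Internals.

-- A piece of the input, tagged as either a separator or ordinary text.
data Piece a = Sep [a] | Body [a]
  deriving (Show, Eq)

-- Information-preserving split: nothing from the input is discarded.
-- Assumption: a delimiter is a single element matched by a predicate.
splitKeep :: (a -> Bool) -> [a] -> [Piece a]
splitKeep isSep = go
  where
    go xs =
      case break isSep xs of
        (txt, [])       -> [Body txt]
        (txt, d : rest) -> Body txt : Sep [d] : go rest

-- Concatenating the pieces recovers the original list.
unPieces :: [Piece a] -> [a]
unPieces = concatMap payload
  where
    payload (Sep xs)  = xs
    payload (Body xs) = xs

-- Combinator: discard the separators, keeping only the text.
withoutSeps :: [Piece a] -> [[a]]
withoutSeps ps = [t | Body t <- ps]

-- Combinator: discard empty text pieces, keeping everything else.
withoutBlanks :: [Piece a] -> [Piece a]
withoutBlanks = filter (not . isBlank)
  where
    isBlank (Body []) = True
    isBlank _         = False
```

Even this toy version needs its own data type before it can say
anything useful, which is the part that does not fit well in
Data.List.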

Let me be clear---I am not against adding some splitting functions to
Data.List, if consensus as to what these functions should be is
reached.  But literally pulling a few things out of Data.List.Split as
you propose is not the way to go, as I think you will agree if you
look at the source.

> Finally, as concrete examples or to clarify points, the words
> split, delimiter, separator and variations thereof have been
> used.  This already implies a theme.  Do you conceive
> of Data.List.Split as primarily a way to help programmers from other
> backgrounds to be able to manipulate strings, that is, supply
> some nice idioms but generalized from [Char] to [a]?

No, I see it as a useful tool for manipulating lists in general.

> If I were to write
> 
> organizeBy :: ([a] -> Bool) -> [a] -> [([a], [a])]
> 
> could you think of a specification such that this function
> would be a workhorse in implementing Data.List.Split.Internals
> and Data.List.Split?

I could, but I think the result would be rather uglier and harder to
understand than the current implementation.
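
For concreteness, here is one possible reading of such a
specification.  It is purely an assumption made for illustration, not
something taken from your proposal or from Data.List.Split: the
predicate recognizes a delimiter as a non-empty prefix of the
remaining input, the shortest matching prefix at the earliest position
wins, and each pair in the result is (text, delimiter that followed
it), with an empty delimiter in the last pair.

```haskell
import Data.List (inits)

-- Hypothetical specification of organizeBy, assumed for this sketch:
--   * the predicate is tried on non-empty prefixes of the remaining
--     input, shortest first;
--   * the first prefix it accepts is taken as a delimiter;
--   * each result pair is (text before the delimiter, the delimiter);
--   * the final pair carries an empty delimiter.
organizeBy :: ([a] -> Bool) -> [a] -> [([a], [a])]
organizeBy p = go []
  where
    -- acc holds the text seen since the last delimiter, reversed.
    go acc [] = [(reverse acc, [])]
    go acc xs@(x : rest) =
      case filter p (drop 1 (inits xs)) of
        (d : _) -> (reverse acc, d) : go [] (drop (length d) xs)
        []      -> go (x : acc) rest
```

Under this reading, concatenating every text and delimiter in order
gives back the input, so no information is lost.  It also rescans the
prefixes at every position, so it is quadratic at best, and
everything in Data.List.Split would then have to be expressed as
post-processing of a list of pairs.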

> Finally, this whole thread brings up the question in my mind
> about module design.  As work is put into Data.List.Split, what
> is the guiding principle that prevents it from becoming
> Data.List.Extensions or to be a bit more direct,
> Data.List.TheFunctionsThatWereForgotten?

Are you serious?  It seems quite clear to me that Data.List.Split will
*not* turn into that.  For one thing, it contains only list-splitting
functions; for another thing, I do not foresee putting that much more
work into it.

> > A theoretical module which contains implementations/combinators for
> > implementing every possible method of list-splitting known to man.
> > This way no one has to argue about what the correct interface for
> > split is, we can just have them all. 
> 
> Is not this Data.List?  In other words, what idea or theme does
> a new Haskell programmer use to decide to first look into Data.List
> as opposed to Data.List.Split and vice versa?

I was being sort of facetious in that quote. =)

-Brent