Haskell platform proposal: split package

Edward Z. Yang ezyang at MIT.EDU
Sat Jul 21 21:42:04 CEST 2012


I haven't reviewed the technical content of the proposal yet (beyond
having used split in prior projects), but I generally approve of
having a split-like package in the Haskell Platform.

Cheers,
Edward

Excerpts from Brent Yorgey's message of Fri Jul 20 16:35:57 -0400 2012:
> Hello everyone,
> 
> This is a proposal for the split package [1] to be included in the
> next major release of the Haskell platform.
> 
> Everyone is invited to review this proposal, following the standard
> procedure [2] for proposing and reviewing packages.
> 
> Review comments should be sent to the libraries mailing list by August
> 20 (arbitrarily chosen; there's plenty of time before the October 1
> deadline [3]). The Haskell Platform wiki will be kept up-to-date with
> the results of the review process:
> 
>   http://trac.haskell.org/haskell-platform/wiki/Proposals/split
> 
> [1] http://hackage.haskell.org/package/split
> [2] http://trac.haskell.org/haskell-platform/wiki/AddingPackages 
> [3] http://trac.haskell.org/haskell-platform/wiki/ReleaseTimetable
> 
> Credits
> =======
> 
> Proposal author and package maintainer: 
>   Brent Yorgey <byorgey at cis.upenn.edu>
> 
> Abstract
> ========
> 
> The Data.List.Split module contains a wide range of strategies for
> splitting lists with respect to some sort of delimiter, mostly
> implemented through a unified combinator interface. The goal is to be
> a flexible yet simple alternative to the standard 'split' function
> found in some other mainstream languages.
> 
> Documentation and tarball from the hackage page:
> 
>   http://hackage.haskell.org/package/split
> 
> Development repo:
> 
>   darcs get http://code.haskell.org/~byorgey/code/split
> 
> Rationale
> =========
> 
> Splitting a list into chunks based on some sort of delimiter(s) is a
> common need, and is provided in the standard libraries of several
> mainstream languages (e.g. Python [4], Ruby [5], Java [6]).  Haskell
> beginners routinely ask whether such a function exists in the standard
> libraries.  For a long time, the answer was no.  Adding such a
> function to Haskell's standard libraries has been proposed multiple
> times over the years, but consensus was never reached on the design of
> such a function. (See, e.g. [7, 8, 9].) 
> 
> [4] http://docs.python.org/py3k/library/stdtypes.html?highlight=split#str.split
> [5] http://www.ruby-doc.org/core-1.9.3/String.html#method-i-split
> [6] http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#split(java.lang.String)
> [7] http://www.haskell.org/pipermail/libraries/2006-July/005504.html
> [8] http://www.haskell.org/pipermail/libraries/2006-October/006072.html
> [9] http://www.haskell.org/pipermail/libraries/2008-January/008922.html
> 
> In December 2008 the split package was released, implementing not just
> a single split method, but a wide range of splitting strategies.
> 
> Since then the split package has gained wide acceptance, with almost
> 95 reverse dependencies [10], putting it in the top 40 for number of
> reverse dependencies on Hackage.
> 
> [10] http://packdeps.haskellers.com/reverse/split 
> 
> The package is quite stable. Since the 0.1.4 release in April 2011,
> only very minor updates have been made.  It has a large suite of
> QuickCheck properties [11]; to my recollection, no bugs have ever been
> reported.
> 
> [11] http://code.haskell.org/~byorgey/code/split/Properties.hs
> 
> API
> ===
> 
> For a detailed description of the package API and example usage, see
> the Haddock documentation:
> 
>   http://hackage.haskell.org/packages/archive/split/0.1.4.3/doc/html/Data-List-Split.html
> 
> Design decisions
> ================
> 
> Most of the library is based around a (rather simple) combinator
> interface.  Combinators are used to build up configuration records
> (recording options such as whether to keep delimiters, whether to keep
> blank segments, etc).  A configuration record is finally handed off to
> a function which performs a generic maximally-information-preserving
> splitting algorithm and then does various postprocessing steps (based
> on the configuration) to selectively throw information away.  It is
> probably not the fastest way to implement these methods, but speed is
> explicitly not a design goal: the aim is to provide a reasonably wide
> range of splitting strategies which can be used simply.  Blazing speed
> (or more complex processing), when needed, can be obtained from a
> proper parsing package.
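
To make the design described above concrete, here is a rough sketch of the idea, not the package's actual code: combinators adjust a configuration record, a generic pass splits the list without losing any information, and a postprocessing step throws away whatever the configuration says to drop. Every name in it (Chunk, Config, onElem, dropDelimiters, dropBlankChunks, splitAll, runSplit) is hypothetical and chosen only for illustration; the real API is documented in the Haddock pages linked in the proposal.

```haskell
module Main where

-- Hypothetical miniature of a combinator-style splitter.
-- None of these names are taken from the split package.

-- A chunk is either a delimiter or a run of ordinary elements.
data Chunk a = Delim [a] | Text [a]
  deriving (Eq, Show)

-- Configuration record: the options that the combinators toggle.
data Config a = Config
  { isDelim    :: a -> Bool  -- which elements count as delimiters
  , keepDelims :: Bool       -- keep delimiters in the output?
  , keepBlanks :: Bool       -- keep empty segments in the output?
  }

-- Base strategy: split on one element, keeping everything.
onElem :: Eq a => a -> Config a
onElem d = Config { isDelim = (== d), keepDelims = True, keepBlanks = True }

-- Combinators: each one tweaks a single field of the record.
dropDelimiters :: Config a -> Config a
dropDelimiters c = c { keepDelims = False }

dropBlankChunks :: Config a -> Config a
dropBlankChunks c = c { keepBlanks = False }

-- Information-preserving pass: concatenating the contents of all
-- chunks, in order, gives back the original list.
splitAll :: (a -> Bool) -> [a] -> [Chunk a]
splitAll p xs = case break p xs of
  (txt, [])       -> [Text txt]
  (txt, d : rest) -> Text txt : Delim [d] : splitAll p rest

-- Postprocessing: discard chunks according to the configuration.
runSplit :: Config a -> [a] -> [[a]]
runSplit c = map contents . filter keep . splitAll (isDelim c)
  where
    keep (Delim _) = keepDelims c
    keep (Text t)  = keepBlanks c || not (null t)
    contents (Delim d) = d
    contents (Text t)  = t

main :: IO ()
main = do
  print (runSplit (onElem ',') "a,b,,c")
  print (runSplit (dropDelimiters (onElem ',')) "a,b,,c")
  print (runSplit (dropBlankChunks (dropDelimiters (onElem ','))) "a,b,,c")
```

The point of the sketch is only that each option is a small record update and that all the splitting logic lives in one place, which is simple but does extra work compared to a specialized splitter.
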
> 
> Open issues
> ===========
> 
> Use of GHC.Exts
> ---------------
> 
> At the request of a user, the 0.1.4.3 release switched from defining
> its own version of the standard 'build' function to importing it from
> GHC.Exts.  This allows GHC to do more optimization, resulting in
> reported speedups for uses of splitEvery, splitPlaces, and
> splitPlacesBlanks.  However, this makes the library GHC-specific.  If
> any reviewers think this is an issue, I would be willing to go back to
> defining build by hand, or to use CPP macros to select between build
> implementations based on the compiler.
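
For reference, the CPP option mentioned above might look roughly like the following. This is a sketch under assumptions, not the package's source: it relies on GHC defining the __GLASGOW_HASKELL__ macro when CPP is enabled, assumes a non-GHC compiler would accept rank-2 types for the fallback definition, and uses a hypothetical function everyN as a stand-in for a chunking function such as splitEvery.

```haskell
{-# LANGUAGE CPP #-}
{-# LANGUAGE RankNTypes #-}
module Main where

#ifdef __GLASGOW_HASKELL__
-- On GHC, use the real build so list fusion rules can fire.
import GHC.Exts (build)
#else
-- Portable fallback: same meaning, no GHC-specific optimization.
build :: forall a. (forall b. (a -> b -> b) -> b -> b) -> [a]
build g = g (:) []
#endif

-- Hypothetical chunking function written in terms of build.
-- A non-positive chunk size yields no chunks rather than looping.
everyN :: Int -> [e] -> [[e]]
everyN i ls
  | i <= 0    = []
  | otherwise = map (take i) (build (splitter ls))
  where
    -- Emits each successive suffix that starts a new chunk.
    splitter :: [e] -> ([e] -> a -> a) -> a -> a
    splitter [] _ n = n
    splitter l  c n = l `c` splitter (drop i l) c n

main :: IO ()
main = print (everyN 2 [1 .. 5 :: Int])
```

Either branch gives the same results; only the GHC branch can benefit from the optimizations that motivated the change.
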
> 
> Missing strategies


