an update on the text package proposal and a call for consensus

Duncan Coutts duncan.coutts at googlemail.com
Tue Nov 16 15:18:19 EST 2010


All,

Following the recent discussions about some details of the text package,
we have revised the proposal. I am confident that we can now reach a
consensus on the last point of contention, and thus accept the package as
a whole for the next Haskell Platform (HP) release.

This email is to summarise the issue, to detail the proposed resolution
and to call for consensus on accepting the text package into the Haskell
platform.

Summary
-------

Out of the roughly 90 functions in the Data.Text module, a number of
people had concerns about the names and types of 10 of them. In
particular, the concern was whether names and types should match between
the standard List module and other "list-like" modules such as Text
(ByteString is another example).

Some functions in the Text API involve selecting a location in a Text
(e.g. to break it at that point). Some of these functions come in two
variants:

  * one based on searching for a character satisfying a predicate
  * one based on searching for a substring

In the standard List module there are only equivalents for the first
variant.
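To make the distinction concrete on ordinary lists, here is a minimal
sketch. The predicate variant is exactly the standard 'break'. The
substring variant has no List equivalent, so 'breakOnList' below is a
hypothetical helper written only for illustration; it is not part of
any library.

```haskell
import Data.List (isPrefixOf)

-- Hypothetical substring variant for lists (illustration only):
-- split just before the first occurrence of the needle. If the needle
-- never occurs, the whole input ends up in the first component.
breakOnList :: Eq a => [a] -> [a] -> ([a], [a])
breakOnList needle = go []
  where
    go acc xs
      | needle `isPrefixOf` xs = (reverse acc, xs)
      | otherwise = case xs of
          []       -> (reverse acc, [])
          (y : ys) -> go (y : acc) ys

main :: IO ()
main = do
  -- Predicate variant: the standard List 'break'.
  print (break (== ',') "key,value")   -- ("key",",value")
  -- Substring variant: the hypothetical helper above.
  print (breakOnList "::" "a::b::c")   -- ("a","::b::c")
```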

Taking the example of the functions that break a text into two, in the
original proposal the two variants were named as:

break   :: Text           -> Text -> (Text, Text)
breakBy :: (Char -> Bool) -> Text -> (Text, Text)

This gave the short name 'break' to the substring version, and the
longer name 'breakBy' to the character predicate version.

The argument for doing this is that the substring version should be the
commonly used, encouraged one, and so it should get the short name.

The argument against is that this is inconsistent with the List
library which gives the name 'break' to the element predicate version:

break :: (a -> Bool) -> [a] -> ([a], [a])


Updated proposal
----------------

The updated proposal accepts the point that the names should be
consistent with the List library. The substring versions are given new
names with a suffix to distinguish them. In the break example the two
variants are now:

breakOn :: Text           -> Text -> (Text, Text)
break   :: (Char -> Bool) -> Text -> (Text, Text)
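A short sketch of how the two variants differ under the new names,
assuming a version of the text package (bundled with GHC) that ships
the names proposed here:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Text as T

main :: IO ()
main = do
  -- Substring variant: split before the first occurrence of "::".
  print (T.breakOn "::" "a:b::c")     -- ("a:b","::c")
  -- Character predicate variant: split before the first ':'.
  print (T.break (== ':') "a:b::c")   -- ("a",":b::c")
```

On the same input the two give different results, which is why both
are kept and only their names change.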

The full set of names and types that have changed is as follows.

The substring functions are now named with the 'On' suffix:

breakOn    :: Text -> Text -> (Text, Text)
breakOnEnd :: Text -> Text -> (Text, Text)

breakOnAll :: Text -> Text -> [(Text, Text)]
splitOn    :: Text -> Text -> [Text]

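As an illustration of the substring ('On') functions, assuming a text
package version that provides them under these names:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Text as T

main :: IO ()
main = do
  -- Break after the *last* occurrence; the needle stays on the left.
  print (T.breakOnEnd "::" "a::b::c")   -- ("a::b::","c")
  -- Every possible break at an occurrence of the needle.
  print (T.breakOnAll "::" "a::b::c")   -- [("a","::b::c"),("a::b","::c")]
  -- Split at every occurrence; empty fields are kept.
  print (T.splitOn "," "a,b,,c")        -- ["a","b","","c"]
```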
The character predicate functions now match the List names:

break     :: (Char -> Bool) -> Text -> (Text, Text)
span      :: (Char -> Bool) -> Text -> (Text, Text)
partition :: (Char -> Bool) -> Text -> (Text, Text)
find      :: (Char -> Bool) -> Text -> Maybe Char
split     :: (Char -> Bool) -> Text -> [Text]
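A sketch of the character predicate functions in use, again assuming a
text package version with these names. break, span, partition and find
behave like their Data.List counterparts, specialised to Char and Text:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Char (isDigit, isSpace, isUpper)
import qualified Data.Text as T

main :: IO ()
main = do
  print (T.break isSpace "hello world")   -- ("hello"," world")
  print (T.span isDigit "123abc")         -- ("123","abc")
  -- (matching, non-matching) characters, order preserved.
  print (T.partition isUpper "HeLLo")     -- ("HLL","eo")
  print (T.find isDigit "abc4d5")         -- Just '4'
  -- Split at every character satisfying the predicate.
  print (T.split (== ',') "a,b,,c")       -- ["a","b","","c"]
```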


Minor function
--------------

One last function that some people raised minor concerns about is the
function

count :: Text -> Text -> Int

This function is not in the List library. It was added to the ByteString
library with the type:

BS.count :: Word8 -> ByteString -> Int

Some reviewers suggested that the types should be equivalent between
ByteString and Text.
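The difference in the needle type is easiest to see side by side. This
sketch uses the existing Data.Text.count and Data.ByteString.count:

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import qualified Data.Text as T

main :: IO ()
main = do
  -- Text: the needle is itself a Text (non-overlapping occurrences).
  print (T.count (T.pack "an") (T.pack "banana"))   -- 2
  -- ByteString: the needle is a single byte (97 is ASCII 'a').
  print (B.count 97 (BC.pack "banana"))             -- 3
```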

The current text proposal leaves the count function as it is. If the
consensus is that these functions should have equivalent types, then my
proposal (as a maintainer of the bytestring package) is that we should
generalise the type of the count function in the bytestring package so
that it counts occurrences of a ByteString rather than of a single Word8.
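A minimal sketch of what such a generalised count could look like,
built on the existing Data.ByteString.breakSubstring. The name
'countSubstring' is hypothetical, as is the choice to count
non-overlapping occurrences and to reject an empty needle; this is an
illustration, not the bytestring API.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC

-- Hypothetical generalised count with the same shape as Data.Text.count:
-- the number of non-overlapping occurrences of the needle.
countSubstring :: B.ByteString -> B.ByteString -> Int
countSubstring needle haystack
  | B.null needle = error "countSubstring: empty needle"
  | otherwise     = go haystack
  where
    go s = case B.breakSubstring needle s of
      (_, rest)
        | B.null rest -> 0   -- no further occurrence
        | otherwise   -> 1 + go (B.drop (B.length needle) rest)

main :: IO ()
main = do
  print (countSubstring (BC.pack "an") (BC.pack "banana"))  -- 2
  print (countSubstring (BC.pack "aa") (BC.pack "aaaa"))    -- 2
```

The old single-byte behaviour is then the special case
countSubstring (B.singleton w).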

Feel free to comment on this issue. On a procedural note, I see no
problem with including a change to the bytestring package in this
proposal for the text package.


Call for consensus
------------------

http://trac.haskell.org/haskell-platform/wiki/AddingPackages#Consensus

Are there any unresolved concerns?

If not then the text package will be accepted with the changes outlined
above.


Duncan
(with his Haskell Platform Steering Committee hat on)
