Haskell Platform Proposal: add the 'text' library
Ian Lynagh
igloo at earth.li
Wed Sep 8 10:18:41 EDT 2010
On Tue, Sep 07, 2010 at 11:21:19PM +0100, Duncan Coutts wrote:
> On 7 September 2010 22:50, Ian Lynagh <igloo at earth.li> wrote:
>
> > I compared the API of Data.Text and Data.ByteString.Char8 and found a
> > number of differences:
>
> Many of these are deliberate and sensible.
Some at least seem just gratuitously different, e.g.:
    BS:   break          :: (Char -> Bool) -> ByteString -> (ByteString, ByteString)
          breakSubstring :: ByteString -> ByteString -> (ByteString, ByteString)
    Text: break          :: Text -> Text -> (Text, Text)
          breakBy        :: (Char -> Bool) -> Text -> (Text, Text)
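
As a minimal sketch of the two kinds of split, here is the ByteString pair quoted above applied to the same input. Only the ByteString side is shown; the names `sample`, `byPredicate` and `bySubstring` and the sample string are made up for illustration.

```haskell
import qualified Data.ByteString.Char8 as B

-- Sample input; the contents are invented for this example.
sample :: B.ByteString
sample = B.pack "header END FOO trailer"

-- Element-based: split at the first Char that satisfies the predicate.
byPredicate :: (B.ByteString, B.ByteString)
byPredicate = B.break (== ' ') sample

-- Substring-based: split at the first occurrence of the whole needle.
bySubstring :: (B.ByteString, B.ByteString)
bySubstring = B.breakSubstring (B.pack "END FOO") sample
```

Loaded into GHCi, `byPredicate` should split just before the first space and `bySubstring` just before "END FOO". In the Text API as listed above, the substring operation is the one called `break`.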
> The thing with text as
> opposed to lists/arrays is that almost all operations you want to do
> are substring based and not element based. A Unicode code point (a
> Char) is sadly only roughly related to the human concept of a
> character. In particular there are combining characters. So even if
> you want to search or split on a particular "character" that may mean
> searching for a short sequence of Chars / code points.
Hmm, wouldn't you want to be able to break on
either
    <a-with-umlaut>
or
    <a> <umlaut combining character>
in that case?
Also, even if the intention is that you
    break [<a>, <umlaut combining character>]
people will still use it for other things, e.g.
    break "END FOO"
and wonder why they are not able to do likewise with bytestring.
Even if there is a case where you would want different behaviour in the
two packages, I think it would be better if the function names weren't
the same.
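
To make the either/or question above concrete, here is a rough sketch using plain `String` and only the base library. It is not part of either package's API; `breakOnAny`, `precomposed` and `decomposed` are hypothetical names introduced for illustration. It breaks at the first position where any one of several needles starts, so both spellings of a-with-umlaut can be handled in one call.

```haskell
import Data.List (isPrefixOf)

-- U+00E4, the single code point for a-with-umlaut.
precomposed :: String
precomposed = "\x00E4"

-- 'a' followed by U+0308, the combining diaeresis.
decomposed :: String
decomposed = "a\x0308"

-- Hypothetical helper: split at the first position where any needle
-- is a prefix of the remaining input. If no needle occurs, the whole
-- input is returned as the first component.
breakOnAny :: [String] -> String -> (String, String)
breakOnAny needles = go
  where
    go [] = ([], [])
    go s@(c:cs)
      | any (`isPrefixOf` s) needles = ([], s)
      | otherwise = let (pre, rest) = go cs in (c : pre, rest)

-- An element-based break only sees single Chars, so a predicate for
-- U+00E4 does not match the decomposed spelling.
missesDecomposed :: (String, String)
missesDecomposed = break (== '\x00E4') ("M" ++ decomposed ++ "rz")
```

With `breakOnAny [precomposed, decomposed]`, both `"M\x00E4rz"` and `"Ma\x0308rz"` split after the `M`, whereas `missesDecomposed` leaves the input unsplit. This is a list-based illustration only; a real implementation would work on the packed representation.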
> > I think the two APIs ought to be brought into agreement.
>
> Perhaps. If so, then it is the ByteString.Char8 that ought to be
> brought into agreement with Text, not the other way around.
I don't have an opinion on what the APIs should look like; I'd just like
them to be consistent.
> > There are a number of other differences which probably want to be tidied
> > up (mostly functions which are in one package but not the other,
>
> What are you thinking of specifically?
There are a number of them:
In Text only:
    center, chunksOf, dropAround, dropWhileEnd, justifyLeft,
    justifyRight, partitionBy, prefixed, replace, strip, stripEnd,
    stripStart, suffixed, compareLength, toCaseFold, toLower, toUpper
In BS only:
    copy, elem, elemIndex, elemIndexEnd, elemIndices, findIndices,
    findSubstring, findSubstrings, foldr', foldr1', notElem, readInt,
    readInteger, sort, unzip
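
For a sense of how small some of these gaps are, here is a hedged sketch of one function from each list written in terms of the other package's existing API. `stripBS` and `elemText` are hypothetical names, not functions from either library, and the sketch assumes versions of bytestring and text that export `spanEnd` and `any` respectively.

```haskell
import Data.Char (isSpace)
import qualified Data.ByteString.Char8 as B
import qualified Data.Text as T

-- Hypothetical ByteString counterpart of Text's 'strip': drop leading
-- whitespace, then keep everything before the trailing whitespace run.
stripBS :: B.ByteString -> B.ByteString
stripBS = fst . B.spanEnd isSpace . B.dropWhile isSpace

-- Hypothetical Text counterpart of ByteString's 'elem'.
elemText :: Char -> T.Text -> Bool
elemText c = T.any (== c)
```

For example, `stripBS (B.pack "  hello world \n")` should give `"hello world"`, and `elemText 'l' (T.pack "hello")` should be `True`. Shims like these work around the differences but do not remove them, which is the point of asking for consistency.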
> > ByteString has IO functions mixed in with the non-IO functions,
>
> Which I don't think was a good idea. I would prefer to split them up.
Agreed, but I would like us to move towards consistency.
Thanks
Ian