Haskell Platform Proposal: add the 'text' library

Tue Sep 7 22:10:27 EDT 2010

Thanks for your comments, Ian. I appreciate your time and care in looking
this over!

> Incidentally, I've just noticed some broken haddock markup for:
>    I/O libraries /do not support locale-sensitive I\O
> in
>
> http://hackage.haskell.org/packages/archive/text/0.8.0.0/doc/html/Data-Text-IO.html

Thanks for spotting that. It appears to be due to a Haddock bug,
unfortunately.

> a much larger variety of encoding functions
>
> Why not bundle these in the text package, or also put this package in
> the platform?

Either one would induce a dependency on text-icu, which is not as mature as
text, and which would imply a dependency on the rather large ICU library. I
do believe that text-icu should be submitted, but not until it's ready.

On
>
> http://hackage.haskell.org/packages/archive/text/0.8.0.0/doc/html/Data-Text.html
> a number of haddocks say
>    Subject to fusion.
> but I can't see an explanation for the new user of what this means or
> why they should care.

That's not quite true: it's actually the very first thing documented:
http://hackage.haskell.org/packages/archive/text/0.8.0.0/doc/html/Data-Text.html#1

However, that description is skimpy, and I've replaced it:

-- Most of the functions in this module are subject to /fusion/,
-- meaning that a pipeline of such functions will usually allocate at
-- most one 'Text' value.
--
-- As an example, consider the following pipeline:
--
-- > import Data.Text as T
-- > import Data.Text.Encoding as E
-- >
-- > countChars :: ByteString -> Int
-- > countChars = T.length . T.toUpper . E.decodeUtf8
--
-- From the type signatures involved, this looks like it should
-- allocate one 'ByteString' value, and two 'Text' values. However,
-- when a module is compiled with optimisation enabled under GHC, the
-- two intermediate 'Text' values will be optimised away, and the
-- function will be compiled down to a single loop over the source
-- 'ByteString'.
--
-- Functions that can be fused by the compiler are marked with the
-- phrase \"Subject to fusion\".

> In
>
> http://hackage.haskell.org/packages/archive/text/0.8.0.0/doc/html/Data-Text-Encoding-Error.html
> I would expect lenientDecode etc to use the On{En,De}codeError type
> synonyms defined above.
>

Good point. I've fixed that up.

> In
>
> http://hackage.haskell.org/packages/archive/text/0.8.0.0/doc/html/Data-Text-Lazy.html
> the choice 'B' seems odd:
>    import qualified Data.Text.Lazy as B
>

Yep. Fixed :-)

> I would have expected
>
> http://hackage.haskell.org/packages/archive/text/0.8.0.0/doc/html/Data-Text.html
> to mention the existence of .Lazy in its description, and an explanation
> of when I should use it.
>

I've expanded that discussion.

Are there cases when Data.Text is significantly faster than
> Data.Text.Lazy?

It's often about twice as fast, but that depends on the nature of the code
and data involved.

> In
>
> http://hackage.haskell.org/packages/archive/text/0.8.0.0/doc/html/Data-Text.html
> isInfixOf's docs day:
>    O(n+m) The isInfixOf function takes two Texts and returns True iff the
>    first is contained, wholly and intact, anywhere within the second.
>    In (unlikely) bad cases, this function's time complexity degrades
>    towards O(n*m).
> I think the complexity at the start, in the same place as all the other
> complexities, ought to be O(n*m), with the common case given afterwards.
>

I'd prefer to keep this as is.

> And replace's docs just say
>    O(m+n) Replace every occurrence of one substring with another.
> but should presumably be O(n*m). It's also not necessarily clear what m
> and n refer to.
>

The two parameters to the function?

> > unicode-unaware case conversion (map toUpper is an unsafe case

> conversion)
>
> Surely this is something that should be added to Data.Char, irrespective
> of whether text is added to the HP?

Yes, but that's a not-this-problem problem.

> A large testsuite, with coverage data, is provided.
>
> It would be nice if this was on the text package's page, rather than in
> ~dons.
>

I don't know how to do that.

> > RecordWildCards
>
> I'm not a fan, but I fear I may be in the minority.
>

It's just used internally, so why do you mind?

There are a number of other differences which probably want to be tidied
> up (mostly functions which are in one package but not the other, and
> ByteString has IO functions mixed in with the non-IO functions), but
> those seemed to be the most significant ones. Also,
>    prefixed :: Text -> Text -> Maybe Text
> is analogous to
>    stripPrefix :: Eq a => [a] -> [a] -> Maybe [a]
> in Data.List
>

I hadn't seen that. Hmm. For use with view patterns, I prefer the name I'm
using right now.

This also made me notice that Text haddocks tend to use 'b' as a type
> variable rather than 'a', e.g.
>    foldl :: (b -> Char -> b) -> b -> Text -> b
>

Historical artifact :-)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.haskell.org/pipermail/libraries/attachments/20100907/3632a0c7/attachment.html