Haskell Platform Proposal: add the 'text' library
Bryan O'Sullivan
bos at serpentine.com
Tue Sep 7 22:10:27 EDT 2010
Thanks for your comments, Ian. I appreciate your time and care in looking
this over!
> Incidentally, I've just noticed some broken haddock markup for:
> I/O libraries /do not support locale-sensitive I\O
> in
>
> http://hackage.haskell.org/packages/archive/text/0.8.0.0/doc/html/Data-Text-IO.html
Thanks for spotting that. It appears to be due to a Haddock bug,
unfortunately.
> a much larger variety of encoding functions
>
> Why not bundle these in the text package, or also put this package in
> the platform?
Either one would induce a dependency on text-icu, which is not as mature as
text, and which would imply a dependency on the rather large ICU library. I
do believe that text-icu should be submitted, but not until it's ready.
On
>
> http://hackage.haskell.org/packages/archive/text/0.8.0.0/doc/html/Data-Text.html
> a number of haddocks say
> Subject to fusion.
> but I can't see an explanation for the new user of what this means or
> why they should care.
That's not quite true: it's actually the very first thing documented:
http://hackage.haskell.org/packages/archive/text/0.8.0.0/doc/html/Data-Text.html#1
However, that description is skimpy, and I've replaced it:
-- Most of the functions in this module are subject to /fusion/,
-- meaning that a pipeline of such functions will usually allocate at
-- most one 'Text' value.
--
-- As an example, consider the following pipeline:
--
-- > import Data.Text as T
-- > import Data.Text.Encoding as E
-- >
-- > countChars :: ByteString -> Int
-- > countChars = T.length . T.toUpper . E.decodeUtf8
--
-- From the type signatures involved, this looks like it should
-- allocate one 'ByteString' value, and two 'Text' values. However,
-- when a module is compiled with optimisation enabled under GHC, the
-- two intermediate 'Text' values will be optimised away, and the
-- function will be compiled down to a single loop over the source
-- 'ByteString'.
--
-- Functions that can be fused by the compiler are marked with the
-- phrase \"Subject to fusion\".
> In
>
> http://hackage.haskell.org/packages/archive/text/0.8.0.0/doc/html/Data-Text-Encoding-Error.html
> I would expect lenientDecode etc to use the On{En,De}codeError type
> synonyms defined above.
>
Good point. I've fixed that up.
> In
>
> http://hackage.haskell.org/packages/archive/text/0.8.0.0/doc/html/Data-Text-Lazy.html
> the choice 'B' seems odd:
> import qualified Data.Text.Lazy as B
>
Yep. Fixed :-)
> I would have expected
>
> http://hackage.haskell.org/packages/archive/text/0.8.0.0/doc/html/Data-Text.html
> to mention the existence of .Lazy in its description, and an explanation
> of when I should use it.
>
I've expanded that discussion.
Are there cases when Data.Text is significantly faster than
> Data.Text.Lazy?
It's often about twice as fast, but that depends on the nature of the code
and data involved.
> In
>
> http://hackage.haskell.org/packages/archive/text/0.8.0.0/doc/html/Data-Text.html
> isInfixOf's docs day:
> O(n+m) The isInfixOf function takes two Texts and returns True iff the
> first is contained, wholly and intact, anywhere within the second.
> In (unlikely) bad cases, this function's time complexity degrades
> towards O(n*m).
> I think the complexity at the start, in the same place as all the other
> complexities, ought to be O(n*m), with the common case given afterwards.
>
I'd prefer to keep this as is.
> And replace's docs just say
> O(m+n) Replace every occurrence of one substring with another.
> but should presumably be O(n*m). It's also not necessarily clear what m
> and n refer to.
>
The two parameters to the function?
> > unicode-unaware case conversion (map toUpper is an unsafe case
> conversion)
>
> Surely this is something that should be added to Data.Char, irrespective
> of whether text is added to the HP?
Yes, but that's a not-this-problem problem.
> A large testsuite, with coverage data, is provided.
>
> It would be nice if this was on the text package's page, rather than in
> ~dons.
>
I don't know how to do that.
> > RecordWildCards
>
> I'm not a fan, but I fear I may be in the minority.
>
It's just used internally, so why do you mind?
There are a number of other differences which probably want to be tidied
> up (mostly functions which are in one package but not the other, and
> ByteString has IO functions mixed in with the non-IO functions), but
> those seemed to be the most significant ones. Also,
> prefixed :: Text -> Text -> Maybe Text
> is analogous to
> stripPrefix :: Eq a => [a] -> [a] -> Maybe [a]
> in Data.List
>
I hadn't seen that. Hmm. For use with view patterns, I prefer the name I'm
using right now.
This also made me notice that Text haddocks tend to use 'b' as a type
> variable rather than 'a', e.g.
> foldl :: (b -> Char -> b) -> b -> Text -> b
>
Historical artifact :-)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.haskell.org/pipermail/libraries/attachments/20100907/3632a0c7/attachment.html
More information about the Libraries
mailing list