[Haskell-cafe] Policy for taking over a package on Hackage

Thu May 26 22:55:19 CEST 2011

On Wed, May 25, 2011 at 4:02 PM, Ivan Lazar Miljenovic
<ivan.miljenovic at gmail.com> wrote:
> On 26 May 2011 08:49, wren ng thornton <wren at freegeek.org> wrote:
>> On 5/25/11 1:03 PM, Bryan O'Sullivan wrote:
>>>
>>> On Wed, May 25, 2011 at 5:59 AM, Ivan Lazar Miljenovic<
>>> ivan.miljenovic at gmail.com>  wrote:
>>>
>>>> Well, using the Char8 version.
>>>
>>> Just because you *could* do that, it doesn't mean that you *should*. It's
>>> a bad idea to use bytestrings for manipulating text, yet the only
>>> plausible reason to have wl-pprint handle bytestrings is so that they can
>>> be used as text.
>>
>> It's worth highlighting that even with the Char8 version of ByteStrings you
>> still run into encoding issues. Remember the days before Unicode came about?
>> True, 8-bit encodings are often ASCII-compatible and therefore the
>> representation of digits and whitespace are consistent regardless of
>> (ASCII-compatible) encoding, but that's still just begging for issues. What
>> are the semantics of the byte 0xA0 with respect to pretty-printing issues
>> like linewraps? Are they consistent among all extant 8-bit encodings? What
>> about bytes in 0x80..0x9F? What about 0x7F for that matter?
>>
>> I won't say that ByteStrings should never be used for text (there are plenty
>> of programs whose use of text involves only whitespace splitting and moving
>> around the resultant opaque blobs of memory). But at a bare minimum, the use
>> of ByteStrings for encoding text needs to be done via newtype wrapper(s)
>> which keep track of the encoding. Especially for typeclass instances.
>
> *shrug* this discussion on #haskell came about because lispy wanted to
> generate textual ByteStrings (using just ASCII) and would prefer not
> to have the overhead of Text.

Actually, I am okay with Text.  My application is translating GHC Core
to other languages.  I may have misrepresented my position on IRC.
Oops.  In the case of GHC Core, all the non-ASCII characters are
z-encoded [1] away, so I'm certain to have only ASCII bytes, and I
don't think performance will be an issue.
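
A minimal sketch of how that "only ASCII" guarantee could be checked and
carried in a type, in the spirit of the newtype wrapper suggested above.
The names Ascii, mkAscii and unAscii are hypothetical and do not come
from zenc or any other library; only Data.ByteString is assumed.

```haskell
import qualified Data.ByteString as B

-- Hypothetical wrapper: a ByteString known to hold only 7-bit ASCII.
newtype Ascii = Ascii B.ByteString
  deriving (Eq, Show)

-- Smart constructor: rejects any input containing a byte >= 0x80.
mkAscii :: B.ByteString -> Maybe Ascii
mkAscii bs
  | B.all (< 0x80) bs = Just (Ascii bs)
  | otherwise         = Nothing

-- Recover the raw bytes once the check has been done.
unAscii :: Ascii -> B.ByteString
unAscii (Ascii bs) = bs
```

If the data constructor is not exported, code that receives an Ascii
value can rely on the check having happened once, at the boundary.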

Being able to choose between String and Text in the same library is
nice.  Once you add support for letting the user specify their choice
of String or Text, does it really cost that much to add ByteString?
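
To make the question concrete, here is a rough sketch of what such a
choice might look like.  The class Output and the functions fromStr,
width and parens are made up for illustration; they are not the
wl-pprint API.  The ByteString instance is short, but it is only
meaningful under the assumption that the content is ASCII.

```haskell
{-# LANGUAGE FlexibleInstances #-}

import qualified Data.ByteString.Char8 as BC
import qualified Data.Text as T

-- Hypothetical class: the minimum a pretty printer needs from its
-- output type.
class Monoid s => Output s where
  fromStr :: String -> s  -- inject a literal piece of syntax
  width   :: s -> Int     -- columns used, for line-wrap decisions

instance Output String where  -- String is a synonym, hence FlexibleInstances
  fromStr = id
  width   = length

instance Output T.Text where
  fromStr = T.pack
  width   = T.length

-- Assumes ASCII content: BC.pack truncates every Char to 8 bits, and
-- BC.length counts bytes rather than characters.
instance Output BC.ByteString where
  fromStr = BC.pack
  width   = BC.length

-- A combinator written once against the class.
parens :: Output s => s -> s
parens d = fromStr "(" <> d <> fromStr ")"
```

The instance itself costs a few lines; the real cost is the encoding
assumption written in the comment, which the type does not enforce.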

I think there are some cases where ByteString might make sense.  Take
darcs patches.  Darcs does its best to be encoding-agnostic about
patch hunk data, yet the syntax surrounding the hunk data is just
ASCII-compatible bytes.  Last time I checked, ByteString was still
used to store the patch hunks as just a blob of bytes.  You end up
passing that to some pretty-printer code when serializing it, even
though you're doing your best not to dictate the encoding.
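
A small sketch of that situation, under stated assumptions: the Hunk
type, its fields and renderHunk are hypothetical and only loosely
modelled on the idea described here; this is not darcs's actual
representation or on-disk format.  The surrounding syntax is ASCII,
while the payload lines pass through as untouched bytes.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC

-- Hypothetical hunk: payload lines are opaque bytes in whatever
-- encoding the tracked file happened to use.
data Hunk = Hunk
  { hunkPath :: B.ByteString
  , hunkLine :: Int
  , removed  :: [B.ByteString]
  , added    :: [B.ByteString]
  }

-- Serialize with ASCII-only syntax around the payload; the payload
-- bytes are never decoded or re-encoded.
renderHunk :: Hunk -> B.ByteString
renderHunk h =
  B.concat (header : map (mark '-') (removed h) ++ map (mark '+') (added h))
  where
    header =
      BC.unwords [BC.pack "hunk", hunkPath h, BC.pack (show (hunkLine h))]
        <> BC.pack "\n"
    mark c payload = BC.cons c payload <> BC.pack "\n"
```

A printer over Text would have to pick an encoding for the payload
before it could emit anything, which is exactly what this design is
trying to avoid.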

I know darcs has had a lot of important UTF-8 changes since I looked
at the hunk representation code, so what I said above may no longer be
the case.

Admittedly it's a rare case.

Jason

[1] http://hackage.haskell.org/package/zenc