[Haskell-cafe] Why so many strings in Network.URI, System.Posix and similar libraries?

Jeremy Shaw jeremy at n-heptane.com
Sun Mar 11 05:51:46 CET 2012


It is mostly because those libraries are far older than Text and
ByteString, so String was the only choice at the time. Modernizing them is
good, but it would also break a lot of code. And in many core libraries,
the functions are required to have String types in order to be Haskell 98
compliant.

So, modernization is good, but it also requires significant effort, and
someone willing to make that effort.

Also, URIs are not defined in terms of octets, but in terms of characters.
If you write a URI down on a piece of paper -- what octets are you using?
None; it's some scribbles on paper. It is the characters that are
important, not the bit representation. If you render a URI in a UTF-8
encoded document versus a UTF-16 encoded document, the octets will be
different, but the meaning will be the same, because it is the characters
that are important. For a URI, Text would be a more compact representation
than String, but ByteString is a bit dodgy since it is not well defined
what those bytes represent (though if you use a newtype wrapper around
ByteString to declare that it is ASCII, then that would be fine).
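
As a rough sketch of both points (this is not taken from any existing
library; the names Ascii, mkAscii, asciiToText and the example URI are made
up for illustration, and it assumes the bytestring and text packages): the
same Text value yields different octets under UTF-8 and UTF-16, and a
newtype with a checking constructor can record that a ByteString holds only
ASCII.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- A URI as characters. The value is a made-up example.
uri :: T.Text
uri = T.pack "http://example.org/a"

-- The same characters, two different octet sequences.
utf8Octets, utf16Octets :: B.ByteString
utf8Octets  = TE.encodeUtf8 uri
utf16Octets = TE.encodeUtf16LE uri

-- Hypothetical wrapper: a ByteString known to contain only 7-bit ASCII.
newtype Ascii = Ascii B.ByteString
  deriving (Eq, Show)

-- Checking constructor: rejects any byte >= 0x80, so a value of type
-- Ascii always has a well-defined meaning as characters.
mkAscii :: B.ByteString -> Maybe Ascii
mkAscii bs
  | B.all (< 0x80) bs = Just (Ascii bs)
  | otherwise         = Nothing

-- Safe to decode: ASCII is a subset of Latin-1, so each byte maps to
-- exactly the character it is meant to represent.
asciiToText :: Ascii -> T.Text
asciiToText (Ascii bs) = TE.decodeLatin1 bs
```

With this, mkAscii applied to the single byte 0xFF gives Nothing, while the
UTF-8 octets of the URI above pass the check and decode back to the same
Text.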

- jeremy

On Sat, Mar 10, 2012 at 9:24 PM, Jason Dusek <jason.dusek at gmail.com> wrote:

> The content of URIs is defined in terms of octets in the RFC,
> and all Posix interfaces are byte streams and C strings, not
> character strings. Yet in Haskell, we find these objects exposed
> with String interfaces:
>
> > :info Network.URI.URI
> data URI
>  = URI {uriScheme :: String,
>         uriAuthority :: Maybe URIAuth,
>         uriPath :: String,
>         uriQuery :: String,
>         uriFragment :: String}
>        -- Defined in Network.URI
>
> > :info System.Posix.Env.getEnvironment
> System.Posix.Env.getEnvironment :: IO [(String, String)]
>        -- Defined in System.Posix.Env
>
> But there is no law that environment variables must be made of
> characters:
>
>  :; export x=$'\xFF' ; echo -n $x | xxd -p
>  ff
>  :; locale
>  LANG="en_US.UTF-8"
>
> That the relationship between bytes and characters can be
> confusing, both in working with UNIX and in dealing with web
> protocols, is undeniable -- but it seems unwise to limit the
> options available to Haskell programmers in dealing with these
> systems.
>
> --
> Jason Dusek
> pgp // solidsnack // C1EBC57DC55144F35460C8DF1FD4C6C1FED18A2B