[Haskell-cafe] Batteries included (Was: GHC is a monopoly compiler)

Joachim Durchholz jo at durchholz.org
Fri Sep 30 07:54:43 UTC 2016


On 30.09.2016 at 08:44, Tobias Dammers wrote:
> FWIW, C++ has:
>
> - char* and const char*, inherited from C
> - wchar_t*, const wchar_t*
> - the above, but with an explicit length passed along as a separate argument
> - std::string
> - std::wstring (is that what it's called?)
> - various string implementations, provided by platform APIs and
> frameworks (QString, LPTCHAR, and other nonsense)
>
> And they all suck - most are really just byte arrays, some try to
> implement Unicode but fall short, and the ones that do it mostly right
> are specific to a sub-ecosystem. It's a mess.

Yep - the preferred type was the zero-terminated byte array, and after
that things diverged.

> And do I need to mention PHP? That one doesn't have a useful string type
> at all,

It's still a string type :-)

> And finally: while Haskell makes you choose between "byte array",
> "string", and "list of code points", this isn't really awfully different
> from languages like Java or C#, where you make a similar choice (string?
> StringBuilder? byte[]?),

At least in Java, you don't really choose; circumstances dictate.

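Strings are immutable. Nice semantics, but building a string through N
successive concatenations is O(N^2), since each one copies the result
so far.
The vast majority of APIs use String; it is by far the most strongly
preferred.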

StringBuilder is mutable.
In practice, people use it as a scratchpad to construct Strings if they
need a loop. In the majority of cases it is a local variable. Libraries
whose purpose is constructing a large output string tend to have a
collect-the-output buffer that they pass around internally but don't
expose to callers (maybe to callbacks, though I haven't seen that done).
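
To make the String vs. StringBuilder difference concrete, here is a
minimal sketch. The class and method names (ConcatSketch,
joinWithString, joinWithBuilder) are made up for illustration; only
String and StringBuilder themselves are standard Java.

```java
class ConcatSketch {
    // Repeated String concatenation: each += allocates a new String and
    // copies everything built so far, so N appends copy O(N^2) characters
    // in total.
    static String joinWithString(String[] parts) {
        String result = "";
        for (String part : parts) {
            result += part;
        }
        return result;
    }

    // StringBuilder as a local scratchpad: appends go into a growable
    // buffer, and only the final toString() produces an immutable String.
    static String joinWithBuilder(String[] parts) {
        StringBuilder sb = new StringBuilder();
        for (String part : parts) {
            sb.append(part);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] parts = {"a", "b", "c"};
        System.out.println(joinWithString(parts));
        System.out.println(joinWithBuilder(parts));
    }
}
```

Both methods return the same String; the builder never leaves the
method, which is the usual pattern.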

byte[] for string manipulation is a really itchy hair shirt; you don't
do that unless very strong reasons compel you to.
I am aware of exactly two use cases: password storage (to be able to
wipe the data ASAP), and converting from and to external byte streams
that carry text.
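
A rough sketch of those two byte[] use cases, assuming UTF-8 as the
external encoding (an assumption for the example, not something the
use case dictates). ByteTextSketch and its method names are
hypothetical; the calls on String, StandardCharsets and Arrays are
standard library.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ByteTextSketch {
    // Bytes from an external stream become a String only once a charset
    // is named explicitly; UTF-8 is assumed here.
    static String decodeUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    // The reverse direction, for writing text to an external byte stream.
    static byte[] encodeUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // A byte[] can be overwritten in place once the secret is no longer
    // needed; an immutable String cannot.
    static void wipe(byte[] secret) {
        Arrays.fill(secret, (byte) 0);
    }

    public static void main(String[] args) {
        byte[] secret = encodeUtf8("hunter2");
        System.out.println(decodeUtf8(secret));
        wipe(secret);
        System.out.println(Arrays.toString(secret));
    }
}
```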

So it's all straightforward, and String is really the preferred choice.
There are a lot of things that Java doesn't get quite right, but string
handling is not one of them :-)

