[Haskell-cafe] Is the #functional programming paradigm antithetical to efficient strings? #Haskell

Tue Jul 12 02:24:51 UTC 2016

Please do NOT confuse "the functional programming paradigm"
with Haskell.  "The imperative programming paradigm" is not
identical to C, is it?

The following measurements were made on a 3.2 GHz core i5 Mac
running OSX 10.11 with clang (Apple LLVM version 7.3.0).

Making an array with 10,000,000 pointers to "Hello world"
in optimised C took 32 milliseconds and concatenating them
to make a new 110,000,000 character string took 134 milliseconds.

Using the built-in 'string' type in C++, no array, just hr += hw,
took 260 milliseconds optimised, 460 milliseconds unoptimised.

The maximum String size in SML/NJ is 2^24 characters, which
is unfortunately too small for this problem.  Taking 1,000,000
strings and concatenating them took 61.4 milliseconds, so
presumably if String.maxSize were large enough the ML
version would take 614 milliseconds.

Making a list of "Hello world" 10,000,000 times and then
concatenating that list to produce a single String took
0.87 seconds (start program to end program) in Haskell.

Having made an array of 10,000,000 'Hello world' strings,
concatenating the strings into a single String (a *mutable*
packed array of character codes) took 0.50 seconds in Smalltalk
(an object-oriented programming language).

Concatenating the ten million strings in Java 1.8 using
a StringBuilder took 0.84 seconds, very close to the Haskell
time.

Concatenating 10,000,000 copies of 'Hello world' in Python3
(using StringIO) took 1.312 seconds.

Frankly, the Haskell time just doesn't stand out here.
In a real program, I'd be trying to avoid string concatenation
in *any* language.

By the way, one toy microbenchmark by itself tells us NOTHING
worth knowing about string efficiency, because there are many
many other things we want to do to strings.  A colleague of
mine who uses C++ for text-heavy applications wouldn't be caught
dead using C++ strings.

I note that Clean, which in many ways resembles Haskell, has
always had packed byte strings.  Then of course there are
languages like OCaml and F#, where F# uses the same string
processing as C#, so F# strings should have the *same*
efficiency (whatever that means) as C#.