[Haskell-cafe] Re: Interesting critique of OCaml
Donnie Jones
donnie at darthik.com
Thu May 8 14:59:17 EDT 2008
Hello,
I pasted a copy of the article below for those that cannot access the site.
I would be interested to see an article on Haskell in the same light as this
Ocaml article, aka a constructive criticism of Haskell.
Enjoy!
__
Donnie
### Begin Article ###
Why Ocaml Sucks <http://enfranchisedmind.com/blog/2008/05/07/why-ocaml-sucks/>
Published by Brian <http://enfranchisedmind.com/blog/author/bhurt-aw/> at
6:49 pm under Functional Languages: Ocaml, Haskell
<http://enfranchisedmind.com/blog/categories/programming/functional-programming/>
One of the ways to not fall into the blub fallacy is to regularly consider
those ways in which your favorite language is inferior, and could be
improved- preferably radically improved. Now, it should come as a surprise
to no one that my favorite language is (currently) Ocaml. So as an
intellectual exercise I want to list at least some of the ways that Ocaml
falls short as a language.
I will note that if you use this post as a reason to *not* use Ocaml, you
are a fool and are missing the point of this post. With the *possible*
exception of Haskell, no other language I know of comes close to getting
all the things right that Ocaml gets right.
So, with that in mind, let's begin.
1. Lack of Parallelism
In case you haven't guessed it from reading this blog, I think
parallelism is going to be the big problem for the next decade. And while
Ocaml does have threads, it also has a big ol' global interpreter lock, so
that only one thread can be executing Ocaml code at a time. Which severely
limits the utility of threads, and severely limits the ability of Ocaml
programmers to take advantage of multiple cores. Now, to give the INRIA team
credit, the reason they have for this is a good one- we (programmers) still
don't know for sure how to write multithreaded code safely (I have my
suspicions, but I have no proof), and we had even less of an idea a decade
and a half ago when the fundamentals of Ocaml were being decided. And
supporting multithreaded code slows down the garbage collector. And, a
decade and a half ago, multithreaded code was a lot less important, and
multicore systems were rare and expensive. So this is mainly just Ocaml
showing its age. But still, if I were to pick the biggest shortcoming of
Ocaml, this would be it.
2. Printf
You know what? Printf was a bad idea for C- having this special
domain-specific language in what looks like strings to control formatting.
It's even worse in Ocaml, where you have to add special kludges to the
compiler to support it (at least the way Ocaml did it- there are smarter
ways <http://www.brics.dk/RS/98/12/> that don't require compiler
kludges). You might think that "%3d" is a string in Ocaml, and sometimes
you'd be right. But sometimes it's a (int -> '_a, '_b, '_c, '_a) format4.
This horrid type (which is *not* a string) is necessary to encode the
types of the arguments printf needs to be passed.
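For illustration, here is a minimal sketch of the split personality of a format literal (the variable names are made up for the example; the behavior is standard OCaml):

```ocaml
(* In string context, "%3d" is an ordinary string. *)
let s : string = "%3d"
let () = print_endline s

(* In format context, the compiler gives the same literal a format type
   that encodes the arguments printf expects. *)
let padded : string = Printf.sprintf "%3d" 42   (* " 42": padded to width 3 *)
let () = print_endline padded

(* Passing the plain string s where a format is expected, e.g.
   Printf.printf s 42, is rejected by the type checker: s is a string,
   not a format. *)
```

Note that the literal only becomes a format because the compiler special-cases it at the use site- exactly the kludge being complained about.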
The one thing I know of that C++ did better than Ocaml was ditching
printf. And the idea of iostreams isn't even that bad- except for the fact
that they encouraged generations of C++ programmers to abuse operator
overloading (hint to Bjarne Stroustrup: << and >> are not the I/O operators,
they're the bit shift operators!), and they made the iostream classes
exceptionally difficult to inherit from or extend. But a combination of C++
style iostream operators and Java style iostream classes would have worked
so much better, and not required hacking the compiler. An even better idea
might be some variant of functional unparsing<http://www.brics.dk/RS/98/12/>
.
3. Lack of multi-file modules
Ocaml has an exceptionally powerful and flexible module and functor
system, which stops rather annoyingly at the file level. Files are modules,
and files (and modules) can contain other modules within them, but you can't
collect several files into one big multi-file module. You especially can't
say that certain files, while they may be visible to other modules within
the multi-file module, aren't visible outside the multi-file module. This is
especially useful if you want to factor out common base functionality from a
library of modules into a shared module, but not productize it for export to
the outside world. This limits the ability of the programmers to correctly
structure large projects.
The -for-pack arguments help fix a lot of this, sorta- but they require
smarts in the build system and generally special .mli files to control
visibility. It can be done, but it's not clean- it makes the build process a
lot more complex, and important information about which files are externally
visible or not is contained somewhere that is not in the source code. This
could certainly be done a hell of a lot better than it is.
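A sketch of the -pack workflow being criticized (the file and module names here are hypothetical; the flags are the real ones):

```ocaml
(* Pack impl.ml and api.ml into a single top-level module Mylib:
 *
 *   ocamlopt -for-pack Mylib -c impl.ml
 *   ocamlopt -for-pack Mylib -c api.ml
 *   ocamlopt -pack -o mylib.cmx impl.cmx api.cmx
 *
 * Clients now see Mylib.Api- but they also see Mylib.Impl, unless a
 * hand-written mylib.mli hides it. The visibility decision lives in the
 * build system and the .mli, not in the source of impl.ml itself. *)
```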
4. Mutable data
Mutable data is both a blessing and a curse. It is a blessing when
learning the language, especially when you're coming from conventional
imperative programming languages. Rather than having to learn what a lazy
real time catenable dequeue is, and why you want one, you can just bang out
a quick mutable doubly linked list and get the same behavior. I would argue
that if you're coming from the imperative/OO world, it's easier to learn
Ocaml than Haskell for precisely this reason.
But this implies that mutable data is training wheels of a sort. And,
like the fact that training wheels prevent you from going fast, mutable data
prevents Ocaml from doing a lot of interesting and useful things. Many of
Haskell's most interesting features- such as STM, easy multithreading,
deforestation, and Scrap Your Boilerplate- arise out of Haskell being
purely applicative. Mutable data is holding Ocaml back.
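The "quick mutable doubly linked list" mentioned above can be banged out in a few lines- a minimal sketch, not a production deque:

```ocaml
(* A minimal mutable doubly linked list node, as a newcomer might write it. *)
type 'a node = {
  value : 'a;
  mutable prev : 'a node option;
  mutable next : 'a node option;
}

let singleton v = { value = v; prev = None; next = None }

(* Splice b in after a, in place- the kind of O(1) update that pure
   structures need cleverness (e.g. Okasaki-style queues) to match. *)
let link a b =
  a.next <- Some b;
  b.prev <- Some a

let () =
  let a = singleton 1 and b = singleton 2 in
  link a b;
  match a.next with
  | Some n -> assert (n.value = 2)
  | None -> assert false
```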
5. Strings
This is more of a minor nit, but the representation of strings as a
(compact) array of chars is stupid. It made sense for pointer-arithmetic C,
not so much for Ocaml. The definition of chars as 8 bits prevents the
internationalization of Ocaml beyond the European languages, and the
representation is inefficient. The most important operations on strings are
either iterating over all the characters in a string, or concatenating two
strings. Random access and mutation is rare, *unless* you're using the
string as a bit or byte array, which is encouraged by the nature of strings.
Bit arrays should be first-class types in themselves, with specialized
internal representations, allowing strings to be optimized as strings. An
unbalanced tree-based structure would allow O(1) (and exceptionally fast)
string concatenation while preserving the O(N) cost to iterate over all
characters in a string. Also, strings are currently mutable data, and as
described above, mutable data is teh sucketh. This is doubly so for strings,
which can be profitably interned (and thus should be).
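To make the mutability concrete: in the OCaml of this era, `s.[0] <- 'z'` on a plain string compiled and mutated it in place (later releases moved mutation to a separate Bytes type, which illustrates the same representation cost and is used here so the sketch runs anywhere):

```ocaml
(* Strings-as-mutable-char-arrays: an in-place write silently changes
   the "string", which rules out interning it. *)
let () =
  let b = Bytes.of_string "abc" in
  Bytes.set b 0 'z';                     (* in-place mutation *)
  assert (Bytes.to_string b = "zbc")
```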
6. Shift-reduce conflicts
Shift-reduce conflicts happen in parser generators, especially LALR(1)
parser generators like yacc and ocamlyacc, where there are situations where
the parser doesn't know if the current expression is complete, or can still
go on. The golden example of this is the "dangling else" problem of C and
other languages, where you might see code like this:
if (test1) then
    if (test2) then
        expr1
else
    expr2
The problem is that the parser, after it has finished parsing the
expr1 expression and is looking at the else, doesn't know whether to
shift it (making the else and expr2 part of the inner if) or reduce it
(making the else part of the outer if).
I strongly recommend everyone who is developing a language to write the
initial syntax in an LALR(1) parser, and eliminate all shift-reduce (and
especially reduce-reduce) conflicts while you still can, because every
shift-reduce conflict is a bug waiting to happen. In the above example,
based on the indenting, the else is very likely meant to belong to the outer
if, but the parser is going to bind it to the wrong if, causing a bug. This
is slightly less bad in Ocaml, where generally the bug will be a type error,
but that doesn't mean it's good. And Ocaml has literally dozens of these
suckers lying in wait to bite an unwary programmer. Even I get bitten by
shift-reduce conflict generated bugs on a regular basis.
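The standard defensive fix is to bracket the inner if so the else can only bind where you intend- a sketch (classify and result are hypothetical names):

```ocaml
let result = ref ""

(* begin/end forces the else to bind to the outer if; without the
   brackets, the parser would attach it to the inner one. *)
let classify a b =
  if a then begin
    if b then result := "a and b"       (* inner if: deliberately no else *)
  end else
    result := "not a"

let () =
  classify false true;
  assert (!result = "not a")            (* else correctly ran for outer if *)
```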
A quick side digression for those who started yelling that the
indentation should be significant to the compiler. Consider the output that
would be caused by printing the following 3 C/Ocaml strings: "\n \tx\n",
"\n \t x\n", and "\n\t x\n". Each one starts with a newline, ends with an x and
a newline, and contains exactly two spaces and a tab in between. Now, what
columns do the three x's end up in? The answer is "it depends upon which
editor you're using to view the output with, and what configuration that
editor has". I just spent a large chunk of this afternoon cleaning up a
bunch of code written by a programmer who merrily mixed spaces and tabs, and
used a different editor than I do. As a consequence, what looked neatly
formatted to him looked like gibberish to me. And if two humans can't agree
on whether a particular piece of code is well formatted or not, what hope
does the computer have to figure out the situation? The solution is to
eliminate the root cause of the problem (the shift-reduce conflicts), not
add a new way to introduce subtle and hard to debug errors.
7. Not_found
One of the worst design decisions Ocaml ever made is Not_found-
this is the exception that's thrown whenever something isn't found. That's
helpful, isn't it? It's thrown by the Set and Map modules (and all their
functor applications), the List module, various other data structure
libraries, various regular expression libraries, and, as a rule, just about
any random piece of code. And it contains absolutely no information as to
what is not found, or what was being looked for. Or even what was doing the
looking. Failure and Invalid_argument at least take a string
argument, which is generally the name of the function that failed. So if you
see the exception Invalid_argument("hd"), you know at least to be looking at
calls to List.hd (or to other functions named "hd"). But Not_found? This is
the sound of your hd hitting the desk, repeatedly.
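The usual coping pattern is to wrap the offending lookup so the bare Not_found is converted to an option at the call site- a sketch (find_opt here is our own wrapper, not a library function):

```ocaml
(* Wrap Hashtbl.find so callers see an option instead of a bare,
   uninformative Not_found escaping from who-knows-where. *)
let find_opt tbl key =
  try Some (Hashtbl.find tbl key)
  with Not_found -> None

let () =
  let tbl = Hashtbl.create 8 in
  Hashtbl.add tbl "http" 80;
  assert (find_opt tbl "http" = Some 80);
  assert (find_opt tbl "gopher" = None)   (* absence is now a value *)
```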
8. Exceptions
And while I'm ragging on Not_found, let me rag on exceptions a bit.
Exceptions are used way too commonly in Ocaml- OK, I'll give Failure,
Invalid_argument, and Assert_failure a pass, but basically Not_found and
End_of_file should never, *ever*, be thrown. Return 'a option instead.
This has non-trivial implications. For example, take the classic newbie
ocaml "mistake"- write a function that reads in a file and returns the file
as a list of strings, one string per line. Should be easy. We use input_line
to read a line, and write our code like:
let read_file desc =
  let rec loop accum =
    try
      let line = input_line desc in
      loop (line :: accum)
    with
    | End_of_file -> List.rev accum
  in loop []
;;
And they can't understand why this code blows up when they try to read a
file with more than 30,000 lines in it. There are standard solutions to
this- but they make the code uglier, and by far the best is to wrap the
exception throwing code in an expression that turns the exception into an
option, like:
let read_file desc =
  let rec loop accum =
    let line =
      try
        Some (input_line desc)
      with
      | End_of_file -> None
    in
    match line with
    | Some s -> loop (s :: accum)
    | None -> List.rev accum
  in loop []
;;
At which point you really have to raise the question of why input_line
didn't just return a string option from the get-go.
The other popular alternative is to use mutable data. Which is just
trading off one problem set for another.
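For completeness, the mutable-data alternative looks like this sketch- no deep recursion, but now we depend on a ref cell and an exception-terminated infinite loop:

```ocaml
(* read_file via mutable state: the accumulator is a ref, the loop is a
   while true, and End_of_file is what breaks us out of it. *)
let read_file desc =
  let lines = ref [] in
  (try
     while true do
       lines := input_line desc :: !lines
     done
   with End_of_file -> ());
  List.rev !lines
```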
There is, by the way, an interesting idea for exception handling that's
been raised recently (see here <http://dx.doi.org/10.1017/S0956796801004099>
and here <http://portal.acm.org/citation.cfm?id=1022471.1022475>). The basic
idea is that the current Ocaml syntax for declaring an exception handler is:
try
  expr1
with
| Exception -> expr2
where if expr1 doesn't throw an exception, the value of the whole
expression is the value returned by expr1- otherwise, if it throws
Exception, then the whole expression has the value of expr2. This syntax is
replaced by:
try var = expr1 in
  expr2
with
| Exception -> expr3
Here, if expr1 does not throw an exception, then the value of the whole
expression is that of expr2 with variable var having the value of expr1,
while if it throws Exception, then the value of the whole expression is
then expr3. Note that exceptions are only caught in expr1- it's just that
if an exception is caught, then expr2 is skipped as well. But this means
that calls from within expr2 can be tail calls. Let's take a look at our
read_file program, the newbie way, a third time, this time with our new
exception syntax:
let read_file desc =
  let rec loop accum =
    try line = input_line desc in
      loop (line :: accum)
    with
    | End_of_file -> List.rev accum
  in
  loop []
;;
Since the tail call to loop is outside where we catch exceptions, it's a
true tail call, and the function works "as expected". And the fact that
we're using exceptions instead of options isn't that big of a deal anymore
(Not_found still sucks, due to its ubiquity and taciturnity). If
someone with more time and/or camlp4 knowledge than I have wants to write
this up as an extension, I'd be eternally grateful.
While I'm at it, would it be possible to have exceptions in the type
system? I know Java tried this and everyone hated it, but Ocaml has two
things that Java didn't- type inference and type variables. The type
variables are nice because they allow you to deal with "unknown" exceptions-
for example you could have a function of type (int -> int throws 'a) ->
int throws 'a, meaning that you don't know (or care) what exceptions the
passed-in function might throw, but that you throw the same exceptions
(presumably by calling the passed-in function). Type inference also means
you don't have to declare most of these cases, the compiler can infer what
exceptions are thrown. This level of change to the type system is highly
unlikely, but a boy can dream, can't he?
9. Deforestation
    This really should be under the "mutable data is teh sucketh" section,
but oh well. Haskell has introduced a very interesting and (to my knowledge)
unique layer of optimization, called deforestation. Basically, what happens
is that the GHC compiler recognizes and can optimize certain common
sub-optimal data structure transformations. For example, it can convert map
f (map g lst) into map (f . g) lst, where the period (.) is the function
composition operator. There are a couple of obvious advantages to being able
to do this- for example, by doing so we've eliminated an intermediate data
structure, and created new opportunities for optimizing the combined f and g
functions.
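The transformation, written out by hand in Ocaml- i.e. what deforestation would do automatically, and what is only safe to do automatically when f and g are pure:

```ocaml
(* Two passes: List.map g builds an intermediate list that List.map f
   then immediately consumes. *)
let two_pass f g lst = List.map f (List.map g lst)

(* One fused pass: no intermediate list is ever allocated. *)
let fused f g lst = List.map (fun x -> f (g x)) lst

let () =
  let f x = x + 1 and g x = x * 2 in
  assert (two_pass f g [1; 2; 3] = fused f g [1; 2; 3])
  (* both yield [3; 5; 7] *)
```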
But there are two things of serious interest to me about this. First of
all, this is the first time I've seen optimizations at this level being this
easily performed. Companies like Intel and IBM have thrown literally
man-millennia at this problem, and the solutions they've come up with were
limited and fragile (slight changes to the code would enable or disable the
optimization). And yet the Haskell people implemented it in one Simon Peyton
Jones long weekend (also known as a couple of man-months for mere mortals
like you and me). The reason for this is that the real difficulty is not
actually implementing the transformation, or even detecting that the
transformation might be applicable, it's proving that the transformation is
correct, that the code after the transformation is applied behaves
identically to the code before the transformation is applied. In languages
with mutable data, like Fortran and C++ (and Ocaml), this is decidedly
non-trivial. In Haskell, you can dash off the proof that it's always correct
on the back of a cocktail napkin- proving that it's correct in the general
case for all lists and all functions. And thus the compiler doesn't
even need to check- once it detects that the pattern can be applied, it can
just go ahead and apply it. Haskell is doing data structure level
optimizations with the ease that most other compilers do peephole instruction
optimization. This is a non-trivial result.
The second important aspect of this is that it changes the concept of
what optimization is, or should be. I forget which paper it was I was
reading that said that optimization should really be called depessimization.
That the programmer wants to introduce pessimizations- the programmer could
do the above deforestation himself, except that to do so would make
maintenance more difficult, as it'd break module boundaries, or require
knowledge of how a particular module is implemented, or simply require
knowledge of widely separated pieces of code. The goal of the compiler,
rather than striving to produce some "optimal" (and thus impossible to
obtain) implementation, should simply be to undo the pessimizations that the
programmer has introduced in the name of maintainability, modularity,
readability, and/or simplicity. The programmer shouldn't have to worry about
creating intermediate data structures, or about corrupting his
code in the name of performance- that's the compiler's job (don't break your
code, let the computer do that for you! :-).
Needless to say, between the mutable data, the side effects, and the
handling of exceptions, Ocaml isn't going to get deforestation any time soon.
10. Limited standard library
If writing monad tutorials is the cottage industry of Haskell
programmers, then rewriting the standard library is Ocaml's cottage
industry. You almost can't call yourself an Ocaml programmer if you haven't
rewritten a goodly chunk of the standard library at least once. This is
because every Ocaml programmer has a long list of functions that should
(they think) be in the standard library but just aren't. The big ones for me
are the lack of Int and Float modules ready for use by the Set and Map
functors, the lack of already-instantiated Map and Set modules for
Strings, Ints, and Floats, and the lack of a list-to-set/map function. None
of these are exactly hard to do, just annoying to have to constantly redo.
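The boilerplate in question is short but endlessly re-typed- a sketch of the instantiation every project ends up carrying around (IntOrd is a name made up for the example):

```ocaml
(* The ordered-type module the standard library could have shipped: *)
module IntOrd = struct
  type t = int
  let compare = compare
end

(* ...and the functor applications everyone redoes by hand: *)
module IntMap = Map.Make (IntOrd)
module IntSet = Set.Make (IntOrd)

let () =
  let m = IntMap.add 1 "one" (IntMap.add 2 "two" IntMap.empty) in
  assert (IntMap.find 1 m = "one");
  let s = IntSet.add 3 (IntSet.singleton 4) in
  assert (IntSet.mem 3 s)
```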
Of course, the problem with this is the range of design patterns among
Ocaml programmers- especially in comparison to languages like Java and
C++, or even Ruby. The design patterns of a top-5% Java programmer are not
all that different from the design philosophy of a bottom-30%'er. There may
be differences in precisely what features are supported, and what names
things are given may change, but the broad stroke designs will be similar.
The design patterns of Brian Hurt circa 2008 are radically different from
the design patterns of Brian Hurt circa 2003. For example, were I to redo
extlib now, it'd be purely applicative, heavily functorized, lots of monads,
and a fair bit of lazy evaluation, as opposed to the code I wrote in 2003.
11. Slow lazy
Speaking of lazy evaluation- Ocaml's built-in lazy evaluation is wicked
slow. I've actually timed it, and it's like 130+ clock cycles of overhead to
force a lazy value the first time, and over 70 clock cycles of overhead to
force it the second and subsequent times (when all it has to do is return
the cached value). Given that most of the good uses of lazy evaluation have
it wrapping computations that are like 10 clock cycles long or so, the
overhead of the lazy thunk dominates. In this era of Ruby programmers, this
may not seem like much- but this overhead discourages Ocaml programmers from
using lazy evaluation. Which is a shame, as anyone who has read Okasaki
knows how frimping useful it is.
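The pattern being timed above is just this- a thunk guarding a computation of a few cycles, so Lazy.force's bookkeeping is essentially all of the cost:

```ocaml
(* A thunk around a ~1-instruction computation: the expense is in the
   Lazy machinery, not in the multiply it guards. *)
let thunk = lazy (2 * 21)

let () =
  assert (Lazy.force thunk = 42);   (* first force: evaluate and cache *)
  assert (Lazy.force thunk = 42)    (* later forces: return cached value *)
```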
Well, that's my list so far- subject to revision without notice. The vast
majority of them qualify as picking nits- and they are. If you can damn with
faint praise, then you should also be able to praise with faint damning- and
this certainly qualifies as that. Most of the problems here only became
apparent (or their solutions became apparent) long after Ocaml was mature
and set. For example, it's only been recently that multi-threading has been a
big deal. And the usefulness of monads and lazy evaluation for functional
programming wasn't understood until the late nineties/early 2000s, a decade
after Ocaml's formative years. So this whole post mostly amounts to whining
that the Ocaml developers weren't *even more insightful and foresightful*
than they were. And, compared to the major revisions Ocaml's "age-mates"
Java and Perl have gone through, it's stood up well with the passing of
time.
No language is perfect, including Ocaml. And to continue to improve we must
understand what has gone before- both what worked, and what didn't.
### End Article ###
On Thu, May 8, 2008 at 2:27 PM, Andrew Coppin <andrewcoppin at btinternet.com>
wrote:
> Chad Scherrer wrote:
>
>> PS - the link now gives a "500 server error" - did our traffic overwhelm
>> it? Is
>> the cafe the next slashdot? ;)
>>
>>
>
> Hell, I can't even get a TCP SYN-ACK packet out of it... :-/
>
>
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
>