[Haskell-cafe] Re: Interesting critique of OCaml

Donnie Jones donnie at darthik.com
Thu May 8 14:59:17 EDT 2008

I pasted a copy of the article below for those that cannot access the site.

I would be interested to see an article on Haskell in the same light as this
Ocaml article, aka a constructive criticism of Haskell.


### Begin Article ###
Why Ocaml Sucks<http://enfranchisedmind.com/blog/2008/05/07/why-ocaml-sucks/>

Published by Brian <http://enfranchisedmind.com/blog/author/bhurt-aw/> at
6:49 pm under Functional Languages: Ocaml,

One of the ways to not fall into the blub fallacy is to regularly consider
those ways in which your favorite language is inferior, and could be
improved- preferrably radically improved. Now, it should come as a surprise
to no one that my favorite language is (currently) Ocaml. So as an
intellectual exercise I want to list at least some of the ways that Ocaml
falls short as a language.

I will note that if you use this post as a reason to *not* use Ocaml, you
are a fool and are missing the point of this post. With the
*possible*exception of Haskell, no other language I know of comes
close to getting all
the things right that Ocaml gets right.

So, with that in mind, let's begin.

   1. Lack of Parallelism

   In case you haven't guessed it from reading this blog, I think
   parallelism is going to be the big problem for the next decade. And while
   Ocaml does have threads, it also has a big ol' global interpreter lock, so
   that only one thread can be executing Ocaml code at a time. Which severely
   limits the utility of threads, and severely limits the ability of Ocaml
   programmers to take advantage of multiple cores. Now, to give the INRIA team
   credit, the reason they have for this is a good one- we (programmers) still
   don't know for sure how to write multithreaded code safely (I have my
   suspicions, but I have no proof), and we had even less of an idea a decade
   and a half ago when the fundamentals of Ocaml were being decided. And
   supporting multithreaded code slows down the garbage collector. And, a
   decade and a half ago, multithreaded code was a lot less important, and
   multicore systems were rare and expensive. So this is mainly just Ocaml
   showing it's age. But still, if I were to pick the biggest shortcomming of
   Ocaml, this would be it.

   2. Printf

   You know what? Printf was a bad idea for C- having this special
   domain-specific language in what looks like strings to control formatting.
   It's even worse in Ocaml, where you have to add special kludges to the
   compiler to support it (at least the way Ocaml did it- there are smarter
   ways <http://www.brics.dk/RS/98/12/> that don't require compiler
   kludges). You might think that "%3d" is a string in Ocaml, and sometimes
   you'd be right. But sometimes it's a (int -> '_a, '_b, '_c, '_a) format4.
   This horrid type (which is *not* a string) is necessary to encode the
   types of the arguments printf needs to be passed it.

   The one thing I know of that C++ did better than Ocaml was ditching
   printf. And the idea of iostreams even isn't that bad- except for the fact
   that they encouraged generations of C++ programmers to abuse operator
   overloading (hint to Bjarne Stroustrup: << and >> are not the I/O operators,
   they're the bit shift operators!), and they made the iostream classes
   exceptionally difficult to inherit from or extend. But a combination of C++
   style iostream operators and Java style iostream classes would have worked
   so much better, and not required hacking the compiler. An even better idea
   might be some variant of functional unparsing<http://www.brics.dk/RS/98/12/>

   3. Lack of multi-file modules

   Ocaml has an exceptionally powerfull and flexible module and functor
   system, which stops rather annoyingly at the file level. Files are modules,
   and files (and modules) can contain other modules within them, but you can't
   collect several files into one big multi-file module. You especially can't
   say that certain files, while they may be visible to other modules within
   the multi-file module, aren't visible outside the multi-file module. This is
   especially useful if you want to factor out common base functionality from a
   library of modules into a shared module, but not productize it for export to
   the outside world. This limits the ability of the programmers to correctly
   structure large projects.

   The -for-pack arguments help fix a lot of this, sorta- but they require
   smarts in the build system and generally special .mli files to control
   visibility. It can be done, but it's not clean- it makes the build process a
   lot more complex, and important information about which files are externally
   visible or not is contained somewhere that is not in the source code. This
   could certainly be done a hell of a lot better than it is.

   4. Mutable data

   Mutable data is both a blessing and a curse. It is a blessing when
   learning the language, especially when you're coming from conventional
   imperative programming languages. Rather than having to learn what a lazy
   real time catenable dequeue is, and why you want one, you can just bang out
   a quick mutable doubly linked list and get the same behavior. I would argue
   that if you're coming from the imperative/OO world, it's easier to learn
   Ocaml than Haskell for precisely this reason.

   But this implies that mutable data is training wheels of a sort. And,
   like the fact that training wheels prevent you from going fast, mutable data
   prevents Ocaml from doing a lot of interesting and useful things. Many of
   Haskell's most interesting features- such as STM, easy multithreading, and
   deforrestation, and scrap your boilerplate, arise out of Haskell being
   purely applicative. Mutable data is holding Ocaml back.

   5. Strings

   This is more of a minor nit, but the representation of strings as a
   (compact) array of chars is stupid. It made sense of pointer-arithmetic C,
   not so much for Ocaml. The definition of chars as 8 bits prevents the
   internationalization of Ocaml beyond the European languages, and the
   representation is inefficient. The most important operations on strings are
   either iterating over all the characters in a string, or concatenating two
   strings. Random access and mutation is rare, *unless* you're using the
   string as a bit or byte array, which is encouraged by the nature of strings.
   Bit arrays should be first class types of themselves, with specialized
   internal representations, allowing strings to be optimized as strings. An
   unbalanced tree based structure would allow O(1) (and exceptionally fast)
   string concatenation while preserving the O(N) cost to iterate over all
   characters in a string. Also, strings are currently mutable data, and as
   described above, mutable data is teh sucketh. This is doubly so for strings,
   which can be profitably interned (and thus should be).

   6. Shift-reduce conflicts

   Shift-reduce conflicts happen in parser generators, especially LALR(1)
   parser generators like yacc and ocamlyacc, where there are situations where
   the parser doesn't know if the current expression is complete, or can still
   go on. The golden example of this is the "dangling else" problem of C and
   other languages, where you might see code like this:

       if (test1) then
           if (test2) then

   The problem is that the parser, after it has finished parsing
expr1expression and looking at the
   else, doesn't know wether to shift it (making the else and expr2 part of
   the inner if) or reduce it (making the else part of the outer if).

   I strongly recommend everyone who is developing a language to write the
   initial syntax in a LALR(1) parser, and eliminate all shift-reduce (and
   especially reduce-reduce) conflicts while you still can, because every
   shift-reduce conflict is a bug waiting to happen. In the above example,
   based on indenting, it is very likely to be meant to be part of the outer
   if, but the parser is going to bind it to the wrong if, causing a bug. This
   is slightly less bad in Ocaml, where generally the bug will be a type error,
   but that doesn't mean it's good. And Ocaml has literally dozens of these
   suckers lying in wait to bite and unwary programmer. Even I get bitten by
   shift-reduce conflict generated bugs on a regular basis.

   A quick side digression for those who started yelling that the
   indentation should be significant to the compiler. Consider the output that
   would be caused by printing the following 3 C/Ocaml strings: "\n \tx\n", "\n
   \t x\n", and "\n\t x\n". Each one starts with a newline, ends with an x and
   a newline, and contains exactly two spaces and a tab in between. Now, what
   columns do the three x's end up in? The answer is "it depends upon which
   editor you're using to view the output with, and what configuration that
   editor has". I just spent a large chunk of this afternoon cleaning up a
   bunch of code written by a programmer who merrily mixed spaces and tabs, and
   used a different editor than I do. As a consequence, what looked neatly
   formatted to him looked like gibberish to me. And if two humans can't agree
   on whether a particular piece of code is well formatted or not, what hope
   does the computer have to figure out the situation? The solution is to
   eliminate the root cause of the problem (the shift-reduce conflicts), not
   add a new way to introduce subtle and hard to debug errors.

   7. Not_found

   One of the worst design decisions Ocaml has to have made is Not_found-
   this is the exception that's thrown whenever something isn't found. Tha'ts
   helpfull, isn't it? It's thrown by the Set and Map modules (and all their
   functor applications) the List module, various other data structure
   libraries, various regular expression libraries, and, as a rule, just about
   any random peice of code. And it contains absolutely no information as to
   what is not found, or what was being looked for. Or even what was doing the
   looking. At least Failure and Invalid_argument at least take a string
   argument, which is generally is the name the function that failed. So if you
   see the exception Invalid_argument("hd"), you know at least to be looking at
   calls to List.hd (or to other functions named "hd"). But Not_found? This is
   the sound of your hd hitting the desk, repeatedly.
   8. Exceptions

   And while I'm ragging on Not_found, let me rag on exceptions a bit.
   Exceptions are used way too commonly in Ocaml- OK, I'll give Failure,
   Invalid_argument, and Assertion_failure a pass, but basically Not_found and
   End_of_file should never, *ever*, be thrown. Return 'a option instead.

   This has non-trivial implications. For example, take the classic newbie
   ocaml "mistake"- write a function that reads in a file and returns the file
   as a list of strings, one string per line. Should be easy. We use input_line
   as to read a line, and write our code like:

   let read_file desc =
       let rec loop accum =
                let line = input desc in
                loop (line :: accum)
            | End_of_file -> List.rev accum
       in loop []

   And they can't understand why this code blows up when they try to read a
   file with more than 30,000 lines in it. There are standard solutions to
   this- but they make the code uglier, and by far the best is to wrap the
   exception throwing code in an expression that turns the exception into an
   option, like:

   let read_file desc =
       let rec loop accum =
            let line =
                   Some(input desc)
               | End_of_file -> None
           match line with
           | Some s -> loop (s :: accum)
           | None -> List.rev accum
       in loop []

   At which point you really have to raise the question of why input_string
   didn't just return string_option from the get-go.

   The other popular alternative is to use mutable data. Which is just
   trading off one problem set for another.

   There is, by the way, an interesting idea for expetion handling that's
   been raised recently (see
   here <http://portal.acm.org/citation.cfm?id=1022471.1022475>). The basic
   idea is that the current Ocaml syntax for declaring an exception handler is:

       | Exception -> expr2

   where if expr1 doesn't throw an exception, the value of the hole
   expression is the value returned by expr1- otherwise, if it throws
   Exception, then the whole expression has the value expr2. This syntax is
   replaced by:

       try var = expr1 in
       | Exception -> expr3

   Here, if expr1 does not throw an exception, then the value of the whole
   expression is that of expr2 with variable var having the value of expr1,
   while if it throws Exception, then the value of the whole expression is
   then expr3. Note that exceptions are only caught in expr1- it's just that
   if an exception is caught, then expr2 is skipped as well. But this means
   that calls from within expr2 can be tail calls. Let's take a look at our
   read_file program, the newbie way, a third time, this time with our new
   exception syntax:

   let read_file desc =
       let rec loop accum =
           try line = input_line desc in
           loop (line :: accum)
           with | End_of_file -> List.rev accum
       loop []

   Since the tail call to loop is outside where we catch exceptions, it's a
   true tail call, and the function works "as expected". And the fact that
   we're using exceptions instead of options isn't that big of a deal anymore
   (Not_found still sucks, due to it's simple ubiquity and taciturnity). If
   someone with more time and/or camlp4 knowledge than I have wants to write
   this up as an extension, I'd be eternally gratefull.

   While I'm at it, would it be possible to have exceptions in the type
   system? I know Java tried this and everyone hated it, but Ocaml has two
   things that Java didn't- type inference and type variables. The type
   variables are nice because they allow you to deal with "unknown" exceptions-
   for example you could have a function of type (int -> int throws 'a) ->
   int throws 'a, meaning that you don't know (or care) what exceptions the
   passed-in funciton might throw, but that you throw the same exceptions
   (presumably by calling the passed-in function). Type inference also means
   you don't have to declare most of these cases, the compiler can infer what
   exceptions are thrown. This level of changes to the type system are highly
   unlikely, but a boy can dream, can't he?
   9. Deforrestation

   This really should be under the "mutable data is the sucketh" section,
   but oh well. Haskell has introduced a very interesting and (to my knowledge)
   unique layer of optimization, called deforrestation. Basically, what happens
   if that the ghc compiler recognizes and can optimize certain common
   sub-optimal data structure transformations. For example, it can convert map
   f (map g lst) into map (f . g) lst, where the peroid (.) is the function
   composition operator. There are a couple of obvious advantages to being able
   to do this- for example, by doing so we've eliminated an intermediate data
   structure, and created new opportunities for optimizing the combined f and g

   But there are two things of serious interest to me about this. First of
   all, this is the first time I've seen optimizations at this level being this
   easily performed. Companies like Intel and IBM have thrown literally
   man-millenia at this problem, and the solutions they've come up with were
   limited and fragile (slight changes to the code would enable or disable the
   optimization). And yet the Haskell people implemented in one Simon Peyton
   Jones long weekend (also known as a couple of man months for mere mortals
   like you and I). The reason for this is that the real difficult is not
   actually implementing the transformation, or even detecting that the
   transformation might be applicable, it's proving that the transformation is
   correct, that the code after the transformation is applied behaves
   identically to the code before the transformation is applied. In langauges
   with mutable data, like Fortran and C++ (and Ocaml), this is decidedly
   non-trivial. In Haskell, you can dash off the proof that it's always correct
   on the back of a cocktail napkin- proving that it's correct in the general
   case for all lists and all functions. And that thus the compiler doesn't
   even need to check- once it detects that the pattern can be applied, it can
   just go ahead and apply it. Haskell is doing data structure level
   optimizations with the ease that most other compiler do peephole instruction
   optimization. This is a non-trivial result.

   The second important aspect of this is that it changes the concept of
   what optimization is, or should be. I forget which paper it was I was
   reading that said that optimization should really be called depessimization.
   That the programmer wants to introduce pessimizations- the programmer could
   do the above deforrestation himself, except that to do so would make
   maintainance more difficult, as it'd break module boundaries, or requires
   knowledge of how a particular module is implemented, or simply requires
   knowledge of widely seperated peices of code. The goal of the compiler,
   rather than striving to produce some "optimal" (and thus impossible to
   obtain) implementation, but simple to undo the pessimizations that the
   programmer has introduced in the name of maintainablity, modularity,
   readability, and/or simplicity. The programmer shouldn't have to worry about
   creating intermediate data structures, and should worry about corrupting his
   code in the name of performance- that's the compilers job (don't break your
   code, let the computer do that for you! :-).

   Needless to say, between the mutable data, the side effects, and the
   handling of exceptions, Ocaml isn't going get deforrestation any time soon.

   10. Limited standard library

   If writing monad tutorials is the cottage industry of Haskell
   programmers, than rewriting the standard library is Ocaml's cottage
   industry. You almost can't call yourself an Ocaml programmer if you haven't
   rewritten a goodly chunk of the standard library at least once. This is
   because every Ocaml programmer has a long list of functions that should
   (they think) be in the standard library but just aren't. The big ones for me
   are the lack of Int and Float modules ready for use by the Set and Map
   functors, the lack of an already-instantiated Map and Set modules for
   Strings, Ints, and Floats, and the lack of a list to set/map function. None
   of these are exactly hard to do, just annoying to have to constantly redo.

   Of course, the problem with this is the range and design patterns of
   Ocaml programmers- especially with comparison to languages like Java and
   C++, or even Ruby. The design patterns of a top-5% Java programmer is not
   all that different from the design philosophy of a bottom-30%'er. There may
   be differences in precisely what features are supported, and what names
   things are given may change, but the broad stroke designs will be similiar.
   The design patterns of Brian Hurt circa 2008 are radically different than
   the design patterns of Brian Hurt circa 2003. For example, were I to redo
   extlib now, it's be purely applicative, heavily functorized, lots of monads,
   and a fair bit of lazy evaluation, as opposed to the code I wrote in 2003.

   11. Slow lazy

   Speaking of lazy evaluation- Ocaml's built-in lazy evaluation is wicked
   slow. I've actually timed it, and it's like 130+ clock cycles of overhead to
   force a lazy value the first time, and over 70 clock cycles of overhead to
   force it the second and subsequent times (when all it has to do is return
   the cached value). Given that most of the good uses of lazy evaluation has
   it wrapping computations that are like 10 clock cycles long or so, the
   overhead of the lazy thunk dominates. In this era of Ruby programmers, this
   may not seem like much- but this overhead discourages Ocaml programmers from
   using lazy evaluation. Which is a shame, as anyone who has read Okasaki
   knows how frimping usefull it is.

Well, that's my list so far- subject to revision without notice. The vast
majority of them qualify as picking nits- and they are. If you can damn with
faint praise, then you should also be able to praise with faint damning- and
this certainly qualifies as that. Most of the problems here only became
apparent (or their solutions became apparent) long after Ocaml was mature
and set. For example, it's only be recently that multi-threading has been a
big deal. And the usefullness of monads and lazy evaluation for functional
programming weren't understood until the late nineties/early 21st, a decade
after Ocaml's formative years. So this whole post mostly amounts to whining
that the Ocaml developers weren't *even more insightfull and
foresightfull*than they were. And, compared to the major revisions
Ocaml's "age-mates"
Java and Perl have gone through, it's stood up well with the passing of

No lanuage is perfect, including Ocaml. And to continue to improve we must
understand what has gone before- both what worked, and what didn't.

Popularity: 6% [?<http://alexking.org/projects/wordpress/popularity-contest>
### End Article ###

On Thu, May 8, 2008 at 2:27 PM, Andrew Coppin <andrewcoppin at btinternet.com>

> Chad Scherrer wrote:
>> PS - the link now gives a "500 server error" - did our traffic overwhelm
>> it? Is
>> the cafe the next slashdot? ;)
> Hell, I can't even get a TCP SYN-ACK packet out of it... :-/
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.haskell.org/pipermail/haskell-cafe/attachments/20080508/b50b6f50/attachment.htm

More information about the Haskell-Cafe mailing list