[Haskell-cafe] Proposal: Non-recursive let

Thu Jul 25 08:18:52 CEST 2013

ivan.chollet wrote:
> let's consider the following:
>
> let fd = Unix.open ...
> let fd = Unix.open ...
>
> At this point one file descriptor cannot be closed. Static analysis will
> have trouble catching these bugs, so do humans.

Both sentences express false propositions.

The given code, if Haskell, does not open any file descriptors, so
there is nothing to close. In the following OCaml code

        let fd = open_in "/tmp/a" in
        let fd = open_in "/tmp/v" in
        ...

the first open channel becomes unreachable. When GC collects it (which
will happen fairly soon, on a minor collection, because the channel
died young), GC will finalize the channel and close its file
descriptor.

The corresponding Haskell code

        do
          h <- openFile ...
          h <- openFile ...

works similarly to OCaml. Closing file handles upon GC is codified in
the Haskell report because Lazy IO crucially depends on such behavior.

If one is interested in statically tracking open file descriptors and
making sure they are closed promptly, one could read the large literature
on this topic. A Google search for `monadic regions' should be a good
start. Some of the approaches are implemented and used in Haskell.
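
Short of full monadic regions, the standard withFile from System.IO
already ties a handle's lifetime to a lexical scope. The following is
only a minimal sketch of that simpler, bracket-style idea, not of the
region technique itself; bothFirstLines and its arguments are
hypothetical names chosen for illustration:

```haskell
import System.IO

-- Hypothetical helper: read the first line of each of two files.
-- The inner lambda deliberately shadows the outer handle name h.
bothFirstLines :: FilePath -> FilePath -> IO (String, String)
bothFirstLines pathA pathB =
  withFile pathA ReadMode $ \h -> do
    x <- hGetLine h
    -- From here on, h refers to the second handle only.
    withFile pathB ReadMode $ \h -> do
      y <- hGetLine h
      return (x, y)
```

Each handle is closed when its own withFile returns or throws, so the
shadowed outer h does not leak, and nothing waits for GC.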

Now about static analysis. Liveness analysis has no problem whatsoever
determining that the variable fd in our examples has been shadowed and
that the corresponding value is dead. We are all familiar with liveness
analysis -- it's the one responsible for `unused variable'
warnings. The analysis is useful for many other things (e.g., if it
determines that a created value dies within the function activation,
the value could be allocated on the stack rather than on the heap). Here
is an example from C:

#include <stdio.h>

void foo(void) {
  char x[4]  = "abc"; /* Intentional copying! */
  {
  char x[4]  = "cde"; /* Intentional copying and shadowing */
  x[0] = 'x';
  printf("result %s\n",x);
  }
}

A pretty old GCC (4.2.1) had no trouble detecting the shadowing. With
the optimization flag -O4, GCC acted on this knowledge. The generated
assembly code reveals no traces of the string "abc", not even in the
.rodata section of the code. The compiler determined that the string is
really unused and did not even bother compiling it in.
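
The same experiment can be approximated in Haskell. The sketch below is
a hypothetical transliteration of the C example (the name shadowed is
mine, not from the original code). Compiled with -Wall, GHC is expected
to warn both that the inner x shadows the outer one and that the outer x
is defined but not used; the exact wording depends on the GHC version.

```haskell
{-# OPTIONS_GHC -Wall #-}

-- The outer x is dead: the inner binding shadows it and nothing
-- refers to it. The inner right-hand side does not mention x, so
-- the recursive nature of Haskell's let plays no role here.
shadowed :: String
shadowed =
  let x = "abc"                      -- expected: unused-binding warning
  in let x = 'x' : drop 1 "cde"      -- expected: name-shadowing warning
     in x
```

Evaluating shadowed yields "xde", the same string the C program prints
after `result'.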

> Disallowing variable shadowing prevents this.
> The two "fd" occur in different contexts and should have different names.
> Usage of shadowing is generally bad practice. It is error-prone. Hides
> obnoxious bugs like file descriptors leaks.
> The correct way is to give different variables that appear in different
> contexts a different name, although this is arguably less convenient and
> more verbose.

CS would be better as a science if we refrained from passing off our
personal opinions and prejudices as ``the correct way''.

I can't say better than the user Kranar in a recent discussion on a
similar `hot topic':

    The issue is that much of what we do as developers is simply based on
    anecdotal evidence, or statements made by so called "evangelicals" who
    blog about best practices and people believe them because of how
    articulate they are or the cache and prestige that the person carries.
    ...
    It's unfortunate that computer science is still advancing the same way
    medicine advanced with witch doctors, by simply trusting the wisest
    and oldest of the witch doctors without any actual empirical data,
    without any evidence, just based on the reputation and overall
    charisma or influence of certain bloggers or "big names" in the field.

http://www.reddit.com/r/programming/comments/1iyp6v/is_there_a_really_an_empirical_difference_between/cb9mf6f