newbie conceptual question [from haskell list]

Frank Atanassow franka@cs.uu.nl
Thu, 26 Jul 2001 10:06:06 +0200


David Tweed wrote:
> I'd like to respectfully disagree with some of this :-)

I figured someone would. Though I thought it would be Fergus. :)

> > The most important thing about functional languages is that we know
> > _what they are_, and that they can be characterized in a regular
> > fashion which is amenable to analysis; in other words, we know how to
> > reason about them, and what sorts of properties functional programs
> >  satisfy.
>
> Umm, I wouldn't agree with that exactly. If you ignore stupidities like
> specifying the mod operator % to be implementation defined, I would claim
> that C has/can be given semantics which are virtually as simple as
> Haskell. Equally I'd argue that we know as much about how to reason about
> imperative languages as we do functional ones. The point is that those
> semantics don't split the problem of reasoning about what's going on into
> nice independent subproblems like the semantics of Haskell do. Equally,
> I'd agree we don't know how to perform _effective_ reasoning on them,
> probably because it's pretty much impossible without restricting the set
> of allowable programs.

Effectively reason, OK. This is what I meant when I wrote "amenable to
analysis."

> Again, as a C++ programmer I have some grasp of what program
> rearrangements are valid (E.g.,I can't move an assignment involving an
> expression in another variable, say v, from before an assignment to v to
> after an assignment to v), and I use it implicitly every time I think
> about structuring or rearranging my programs. The problem is that that
> doesn't really help because determining whether there's an assignment to
> variable v between where an assignment is now and where I want to move it
> to is `decidedly non-trivial'.

I think doing source-level reasoning on C++ programs is virtually
impossible. I always have to think about the implementation in terms of
memory blocks and vtables and pointers.

One reason must be that assignment is used all over the place when local
let-binding would suffice. If C++ actually enforced encapsulation of state,
that would not be so bad, but in practice you never know what sort of
aliasing and sharing is going on.

> > Of course, this argument holds for any "simple" language, not just
> > functional ones, where by "simple" I mean "easy to reason about". For
>
> As I say `easy to _effectively_ reason about' =/= simple semantics.
> I'm making this point because the natural rejoinder from a C programmer
> being told that C (as an imperative language) has more complex semantics
> than Haskell is to say `no it doesn't', which I agree with, the point
> being that Haskell has a semantics which makes it easier to reason about
> effectively. (I'm glossing over the distinction between the various
> kinds of semantics for a language such as denotational and operational
> here.)

Again, this is what I meant by "simple".

Actually, I have to admit that when I think about Haskell programs, I don't
really reason using Haskell's semantics, but rather the equational theory of
a polymorphic lambda-calculus, and I hardly ever think about the operational
side. If you idealized C in the same way, you would probably ignore things
like pointers and aliasing which cause problems.

But the thing is that in Haskell I can pretty much get away with this
idealization and still write interesting programs, but in C you really need
these complications to do useful things. So I think FP languages have a
better separation of concerns.

> > example, there are concurrent calculi of processes, object calculi and
> > first-order algebraic languages which also share this property,
> and I prefer
> > any of them to warty conventional languages. (And this is why I
> also like
> > SML: even though it's not referentially transparent like Haskell, I know
> > what sorts of properties it satisfies.)
>
> Now this to me is a really interesting statement. IIRC from many years ago
> theres not necessarily an indication in the type system when a function
> involves references. Do you actually reason (albeit subconciously) using a
> full semantics which includes possible effects due to aliasing of
> references or changes to their contents somewhere deep in the functions,
> or do you use a rule `well this is 99% likely to be referentially
> transparent so I'll assume it is, but I'll keep in the back of my mind the
> possibility this assumption may be wrong'? (That's what I'd end up doing I
> think.)

Frankly, I do my best to avoid side-effecting code in ML. The fact that you
can do this and still write useful programs is, again, an indication that
the separation of concerns is better than in C, where it is pretty much
impossible to go without side-effects and aliasing.

But when I have to use effects, I guess I make conservative judgements about
substitutability and try to localize them as much as possible. I also have
started thinking about it "monadically" to some extent, so, yes, I guess I
am doing a bit of informal extra type inference in my head.

Finally, ML is not completely oblivious to effects; for example, to preserve
soundness, it disallows generalizing type variables in certain places which
fail a syntactic condition, and that is something you can use yourself to
figure out which expressions are likely to be impure.

> > Haskell (and ML) really is different. First, it's not
> > implementation-defined.
>
> Umm, again I'd say this is debatable: the volume of e-mail to/from Simon
> PJ about the haskell 98 report even though there are various Haskell 98
> interpreters/compilers out there suggests this isn't true. Likewise, C and
> C++ have specifications which are met to a reasonable degree by many
> compilers out there, which prompts the question: when was the last time a
> bug in your code in an imperative language was due to an
> implmentation-defined part of the language? In 5 years of intensive C++
> programming I'd guess I've made maybe 25 bugs due to this, but the number
> of bugs I've fixed in my code must be in the tens of thousands.

First, this is not what I mean by "implementation-defined" (my fault). What
I mean is the model of computation: in C, you are always thinking about
blocks on the heap, pointers between them, whether something is on the stack
or not, what order things are happening in and what order this other code
you wrote last year assumes them to be happening in, etc. So you are still
thinking, essentially, at the level of the von Neumann architecture. Sure,
some features which affect portability are hidden away, e.g., how long a
word of memory is, but those are details, and you can name things and refer
to them more nicely, but it's still just assembly programming in disguise.
In an FP language, and more generally languages which support an abstract
semantics, I don't have to think about those things. Instead, I can think
about a problem at a whole different level of abstraction which is closer to
the problem domain.

Second, I am aware that this is an oversimplification and that C/C++ are not
completely implementation-defined, and you can do some reasoning about
sequence points and local stores and so on. But does that happen in
practice? They just aren't taught that way, and that's also part of the
problem.

Third, I don't think it is true that there are many (any?) compilers which
meet the C++ specification to a reasonable degree. C++ must be the single
worst implemented language in computer history.

---
Frank Atanassow, Information & Computing Sciences, Utrecht University
Padualaan 14, PO Box 80.089, 3508TB Utrecht, The Netherlands
Tel +31 (0)30 253-3261 Fax +31 (0)30 251-3791