newbie conceptual question [from haskell list]

D. Tweed tweed@compsci.bristol.ac.uk
Thu, 26 Jul 2001 11:39:36 +0100 (BST)


On Thu, 26 Jul 2001, Frank Atanassow wrote:

> > Again, as a C++ programmer I have some grasp of what program
> > rearrangements are valid (E.g.,I can't move an assignment involving an
> > expression in another variable, say v, from before an assignment to v to
> > after an assignment to v), and I use it implicitly every time I think
> > about structuring or rearranging my programs. The problem is that that
> > doesn't really help because determining whether there's an assignment to
> > variable v between where an assignment is now and where I want to move it
> > to is `decidedly non-trivial'.
> 
> I think doing source-level reasoning on C++ programs is virtually
> impossible. I always have to think about the implementation in terms of
> memory blocks and vtables and pointers.
> 
> One reason must be that assignment is used all over the place when local
> let-binding would suffice. If C++ actually enforced encapsulation of state,
> that would not be so bad, but in practice you never know what sort of
> aliasing and sharing is going on.
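
To pin down the rearrangement constraint I described above, here's a
minimal C++ sketch (variable names mine, purely illustrative):

    int v = 1;
    int w = v + 1;  // reads v: w == 2
    v = 42;         // an intervening assignment to v

    // Moving `w = v + 1;' after `v = 42;' silently changes w from 2
    // to 43, so the rearrangement is invalid -- but spotting this in
    // real code means tracking every intervening write to v (and,
    // once pointers are involved, every possible alias of v).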

Note: I'm trying to compare like-with-like, so when I talk about reasoning
below it's in the context of the same sort of semi-formal, in-my-head
reasoning that I do in my everyday development of C++ or Haskell.

Yes, I guess it's time for a confession: I'm making a rather sweeping
assumption that the patterns in which I do and don't program are in some
way `average' or `typical', even though they probably aren't. For
instance, I don't even use patterns like `a[b++]=c;' just because it
requires too many `mental cycles' for me to figure out what's going on
when I reread the code; I expect most other C programmers would happily
use that construct. Having said that, within this sort of style I find
that I don't really (have to) go beyond the idea of a machine with a store
of named variables, some of which are actually arrays or have other
structure, and tend to use the simple rule `if you assign/read outside
array bounds, or through a pointer which does not map to a defined
variable or array in the store, the program may (in a way which is so
dependent on history as to be effectively non-deterministic) place
gibberish in any other stored variable cell, return gibberish, return
sensible values with no observable ill effects, or seg-fault'. Whilst
you're certainly right to say that a proper semantic model (even an
informal one) ought to incorporate the differences between stack-allocated
variables and heap-allocated variables, I genuinely don't think I use that
distinction whilst thinking about my programs because it's too
complicated for me. Likewise I've had one bug in 5 years due to a vtable
problem and that was when I knew I was pushing my luck trying to do
a really low-level optimisation for efficiency. The bugs that I have tend
to fall into three big categories: incorrect algorithms (i.e., I'd
misunderstood the problem even at a high level), writing outside array
bounds, and mis-accounting for assignments and their ordering.
Class 1 is a problem even when I write in Haskell; classes 2 and 3 tend to
be analysed by sticking print outputs everywhere and then reasoning, using
my very simple mental model, to figure out why the observed behaviour is
happening and how to fix it.
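
As a concrete (purely illustrative) sketch of that simple rule at work:

    #include <cstdio>

    int main()
    {
        int before = 1;
        int a[4]   = {0, 0, 0, 0};
        int after  = 2;

        a[4] = 99;  /* out of bounds: undefined behaviour, so any of
                       the outcomes above is possible -- trampling on
                       before/after, apparent harmlessness, or a
                       seg-fault, depending on layout and history */

        std::printf("%d %d\n", before, after);
        return 0;
    }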

The reason that I'm belabouring this point is that I don't think that
imperative programmers are going to be convinced by an argument that
`functional programming is more effective because there are a huge number
of complexities, and even some downright ill-definedness/inconsistency,
in imperative semantics compared to functional semantics' because
even my natural response is `that's entirely true about the semantics but,
_given that those problems all tend to lie in areas of the "program state
space" where I don't write programs_, why should that motivate me to use a
functional language?' I still think the compelling argument is not that 
there are all these nasty tar-pits of complexity in the semantics of an
imperative language, say C++, that aren't there in functional languages
but that, in the code which makes up 99.99% of imperative programs
and where most of the bugs occur, the semantics are simple but you just
can't use them in any effective way. In contrast, you _can_ do effective
reasoning about the same sorts of things in functional languages, because
their roughly equally simple semantics actually support that reasoning.
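
A tiny (again, illustrative) C++ fragment shows what I mean: each
statement's semantics are trivial, but using them to justify even a
two-line rearrangement needs a global fact about aliasing that the
local code cannot give you:

    void update(int *p, int *q)
    {
        *p = 1;  /* simple semantics: store 1 through p */
        *q = 2;  /* simple semantics: store 2 through q */
        /* Swapping these two lines changes the result exactly when
           p == q, and whether any caller ever passes aliased
           pointers is a whole-program question. */
    }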

> Actually, I have to admit that when I think about Haskell programs, I don't
> really reason using Haskell's semantics, but rather the equational theory of
> a polymorphic lambda-calculus, and I hardly ever think about the operational
> side. If you idealized C in the same way, you would probably ignore things
> like pointers and aliasing which cause problems.

Umm, as I stated above, I think I use a very simple model that lets me
think about aliasing and pointers. I guess the key point is that I don't
try and reason about what goes wrong in detail when `undefined pointer
access occurs', merely that all sorts of nasty things can happen
unpredictably.

> First, this is not what I mean by "implementation-defined" (my fault). What
> I mean is the model of computation: in C, you are always thinking about
> blocks on the heap, pointers between them, whether something is on the stack
> or not, what order things are happening in and what order this other code
> you wrote last year assumes them to be happening in, etc. So you are still
> thinking, essentially, at the level of the von Neumann architecture.

Again I agree in principle, but I don't see that reasoning using a
denotational semantics for Haskell is less `implementation defined' than
thinking about a C program in terms of local stores and sequence
points; arguably when you're using your model of Haskell you're thinking
at the level of a non-deterministic-rewriting-machine architecture :-) .
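
For instance (my sketch, in C++98-era terms), it's exactly the
sequence-point part of that model which tells me the first statement
below is off-limits while the second is fine:

    int sequence_point_demo()
    {
        int i = 0;
        i = i++ + 1;  /* undefined: i is modified twice between
                         consecutive sequence points */
        i = i + 1;    /* well-defined: one modification, with a
                         sequence point (the `;') before it */
        return i;
    }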

>  Sure,
> some features which affect portability are hidden away, e.g., how long a
> word of memory is, but those are details, and you can name things and refer
> to them more nicely, but it's still just assembly programming in disguise.
> In an FP language, and more generally languages which support an abstract
> semantics, I don't have to think about those things. Instead, I can think
> about a problem at a whole different level of abstraction which is closer to
> the problem domain.

Again: agree in principle; in practice, as with the points above, the
gap seems smaller for the kinds of programs I actually write.


> Third, I don't think it is true that there are many (any?) compilers which
> meet the C++ specification to a reasonable degree. C++ must be the single
> worst implemented language in computer history.
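
(As a hedged aside, a concrete instance of this: standard-conforming
partial template specialisation like the sketch below was famously
rejected outright by some widely used compilers of this era, MSVC 6
among them, and it's hardly an exotic corner of the language.)

    // Standard C++, but not accepted by every compiler circa 2001
    // that claims to implement the language.
    template <typename T> struct Traits     { enum { is_ptr = 0 }; };
    template <typename T> struct Traits<T*> { enum { is_ptr = 1 }; };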

If you assume that, in any given language, all programs in that language
are equally likely to be written then I'd agree that C++ would easily give
you the highest ratio of rejected/implementation defined programs of any
language when you consider a random sample of programs over this
distribution. What I think is a more interesting question is this: I'm a
(cautious) human being who doesn't produce programs from all over the
program state space, but ones clustered in certain areas that are both
useful and more understandable. That being the case, is the above ratio
still significantly higher than for other languages? In my (very limited)
experience it doesn't seem that much higher.

I still think it's a mistake to try to `sell' FP on the basis of huge
problems with imperative languages that largely don't affect programmers,
because they don't write programs that run into those problems. The
argument that makes more sense is that even in the nice, well-defined
area of imperative programs where most programmers live, things are much
more complicated than in a functional language.

___cheers,_dave________________________________________________________
www.cs.bris.ac.uk/~tweed/pi.htm |tweed's law:  however many computers
email: tweed@cs.bris.ac.uk      |   you have, half your time is spent
work tel: (0117) 954-5250       |   waiting for compilations to finish.