[Haskell-cafe] Typing in haskell and mathematics

Mon Jan 31 11:48:00 EST 2005

On Fri, 28 Jan 2005, Jacques Carette wrote:

> Henning Thielemann <lemming at henning-thielemann.de> wrote:
> 
>> This seems to be related to what I wrote yesterday
>>  http://www.haskell.org/pipermail/haskell-cafe/2005-January/008893.html
> 
> Yes, very much.  Except that rather than trying to tell mathematicians 
> that they are *wrong*, I am trying to see which of their notations I can 
> explain (in a typed way).  There will be some 'leftovers', where the 
> notation is simply bad.

I know that's problematically to say that a notation is "wrong" since
no-one is authorized to say, what's "right". Though, if notation is
contradictory, I don't know what's wrong with calling it wrong.

 Let me motivate a bit more, why I'm interested in mathematical notation.  
I (and certainly every mathematician) learnt that for instance equations
are a powerful tool to solve problems. You can model many problems with a
mathematical equation, you can transform the equation using some accepted
rules and eventually you get some simplified equation like 'x=2'.  The
problem is abstracted, you do the transformations without thinking about
an interpretation in the real world. You know that you can trust the
rules. If someone shows you that the simplified equation is wrong you know
that you have applied some rule in the wrong way or you invented an
improper rule.
 There are many tools in the past which have been introduced to make
mathematics more convenient, more safe, somehow better. Arabian numbers
almost replaced Roman numbers because they are more convenient for
computations. Variables and formulas were introduced as a more precise
alternative to phrases in natural languages. Axioms replaced intuitive
definitions. But there are still issues which are unsatisfying. E.g. on
the one hand calculus and functional analysis "degrades" functions to
ordinary objects of mathematics, thus reaching a new level of complexity,
but on the other hand the notation is not as properly developed as the
theory itself, in my opinion.

Let me give an example:
 The teacher says, solving a simple linear equation is rather easy by
using separation of variables. Then he transforms a differential equation
formally by some rules he don't explain in detail and I'm curios whether
there _is_ a good explanation:

y' = y
y' / y = 1      |  integrate
C + ln y = x
ln y = x-C
y = exp (x-C)

The student thinks: That's nice, let's try it with some problem I couldn't 
solve recently:

 cos x = x^2        | differentiate
-sin x = 2*x

So, x=0 is an obvious solution to the second equation but not to the first
one.  The student concludes: If the teacher applies the method, it's
right, if I apply it, it's wrong. :-( Then he asks the teacher why the
method fails for cos x = x^2 . The teacher says, of course what you did
cannot work because y' = y is a differential equation and cos x = x^2 is
not. Does this answer help the student? Why doesn't the formula show this
difference? What means the difference between 'differential equation' and
'normal equation' essentially?

 The technique of differentation led to lots of notations. I think this is 
so because many of the existing notations don't model the idea of 
differentation very well, thus new notations are introduced when the known 
notations fail.
  d x, \partial x, ', \dot{x}, f_x, D
 I wonder if these notations can be unified, thus simplified. I wonder
what formalism I can trust. E.g. I'm sceptic about notations using
variables. What is the value of d x for certain values of x? What is the
difference between d x and d y? Is f_x(x,y) and f_x(y,x) the same?
 My proposal: Consider the derivation as higher order function which maps 
a function to a function. So in the Haskell notation it would be

derive :: (a -> a) -> (a -> a)

More generally if we have functions (b -> a) as vectors and want to derive
a function mapping from vector arguments to scalar values ((b -> a) -> a)  
(also called functionals), it would have signature

deriveVector :: ((b -> a) -> a) -> ((b -> a) -> (b -> a))

E.g.  deriveVector f x 1 evaluates the partial derivative of f with
respect to the first component at the point x. deriveVector f x is the
total derivative or gradient at point x. I know that after introducing
higher dimensional differentation using \partial x, \partial y, d x and d
y students are uncertain about what the total derivative actually is.

Let's turn back to the example and view it through the functional glasses. 
The differential equation

y' = y

now means, that y is a function, (') is the differentation operator.  
With liftF2 I lift (/) to function values and I divide both sides with the
lifted (/) by y, noting that this is only possible, if y is nowhere zero.

            derive y    = y
liftF2 (/) (derive y) y = liftF2 (/) y y
liftF2 (/) (derive y) y = const 1              |  integrate c
   (integrate fulfills
       fint = integrate c f  =>  fint 0 = c,
    though it is problematic if the result has a singularity at 0)
 (c1+) . ln . y = (c+) . id      (with c1 such that ln (y 0) = c)
         ln . y = ((c-c1)+)            |  (exp .)
   exp . ln . y = exp . ((c-c1)+)
              y = exp . ((c-c1)+)

This is certainly not intuitive if you are familar with the common 
notation but it is a method where I can explain every step, it is a 
formalism I can trust. And it is important to have such formalism at hand 
if you encounter a problem with the common abbreviated mathematical style.

Now it is clear, why   cos x = x^2   can't be solved using differentation: 
This is an equation of scalar values, whereas differential equations are 
function equations.

>> I've collected some examples of abuse and bad mathematical notation:
>>  http://www.math.uni-bremen.de/~thielema/Research/notation.pdf
> 
> Some of what you point out there is bad notation.  Other bits point to 
> some misunderstanding of the issues.
> 
> Starting from your original post:
> 
>> f(x) \in L(\R)
>>    where f \in L(\R) is meant
>>
>> F(x) = \int f(x) \dif x
>>    where x shouldn't be visible outside the integral
> 
> 
> First, mathematicians like to write f(x) to indicate clearly that they 
> are denoting a function.

f(x) \in L(\R)  clearly denotes that f(x) is a function, that is f(x)(y)  
is a scalar value. There are situations where it is sensible to claim f(x)
\in L(\R)  so why making them more complicated by sticking to that
notation in the wrong cases?
  f \in L(\R)  is one of the examples where the _right_ notation is
shorter, thus easier to read, than the _wrong_ one. Yes, I call f(x) \in
L(\R)  _wrong_, because it is commonly agreed that L(\R) is a set of
functions and that each element of such a set is a function, thus if you
write f(x) \in L(\R) then you cannot treat f(x) as a scalar value later
on.

> You also write
> 
>> O(n)
>>    which should be O(\n -> n) (a remark by Simon Thompson in
>>                                The Craft of Functional Programming)
> 
> but the only reason for this is that computer scientists don't like open 
> terms.  Since the argument to O must be a univariate function with range 
> the Reals, then whatever is written there must *denote* such a 
> function.  The term ``n'' can only denote only one such function, \n -> 
> n.

What about (\m -> n) ?

What means O(n^m) ?   O(\n -> n^m) or O(\m -> n^m) ?

> Russell's work in the 
> early 1900, "sense and denotation" are worth reading if you want to 
> learn more about this.

Nice to know.

> On your ``notation.pdf'' (link above), you have some other 
> mis-interpretations.  On p.10 you seem to think that Mathematica is a 
> lazy language, when it is in fact an eager language.

I know that Mathematica is both eager and a mix of functional and 
imperative programming.

> However,
> 
>> subs(x=2,'diff'(ln(x),x)) ;
> 
>       diff(ln(2),2)
> where the '' quotes mean to delay evaluation of the underlying 
> function.  On the other hand
> 
>> eval('diff'(ln(x),x),x=2) ;
> 
>       eval('diff'(ln(x),x),x=2)
> because it makes no sense to evaluate an open term which introduces a 
> (temporary) binding for one of its variables.

That's what I dislike: To overcome particular problems, patches like
quotes and eval are introduced. They are rather hard to combine, hard to
nest. I spent hours to explain Mathematica expressions that I had clearly
in my mind. We can have a very clean solution without quotes and subs and
eval by just considering functions as regular mathematical objects.

> Note that without access to terms, it is not possible to write a 
> function like diff (or derive as you phrase it, or D as Mathematica 
> calls it).  Mathematician's diff looks like it is has signature diff: 
> function -> function, but it in fact is treated more often as having 
> signature diff: function_denotation -> function_denotation.

 Fine, let's consider this example in detail: If I write + in Mathematica,
I know that the value of the resulting expression will be the sum of the
values of the expressions of the operands. This is true independent from
the particular expressions of the operands. This constraint makes working
with algebraic formulas very handy. If I write f(4) or f(2+2) in
mathematics doesn't matter, both denote the same value independent of f
because 4 and 2+2 denote the same value. If x=4 then f(x) denotes the same
value as f(4). In Haskell this constraint is fulfilled and it is essential
for the simplification rules that GHC applies.
 I like to have this behaviour for derivation, too. So of what type must
be the parameter of 'derive'? A scalar expression with a free variable or
a function expression? The designers of Mathematica and Maple decided for
the first alternative. But the derivation function D breaks the constraint
of free combineability. If x=4 then D[x,x] is 0 or an error (I can't check
it currently), if x is not bound to a value it is 1. What I suggest is
that derivation should be limitted to functions. If two expressions denote
the same function, then D will derive two expressions, that may not be
equal but denote the same function.  Indeed, the implementation of D must
know the expression of the function to give the result as a formula, but
if it only knows the set of argument-value-pairs it would work, too. 
This is the view on functions which Fourier brought to us. Before Fourier 
only functions given as analytical terms were considered as functions.
 The definition of the differential quotient don't need the particular
expression for a function, it doesn't even require that there is an
explicit expression. This means, the derived function will depend only on
the function to be derived! Following this paradigm, Simplify must
implement the identity function (identity with respect to the value, not
identity with respect to the expression) and ReplaceAll should be replaced
by function application.

> You also seem to assume that set theory is the only axiomatization of 
> mathematics that counts (on p.31).

I'm also aware of Russell's type theory, but my impression is that set 
theory is the most popular one.

> I do not see functions A -> B as 
> somehow being subsets of powerset(A x B).  That to me is one possible 
> 'implementation' of functions.  This identification is just as faulty as 
> the one you point out on p.14 of the naturals not ``really'' being a 
> subset of the rationals.  In both cases, there is a natural embedding 
> taking place, but it is not the identity.

I agree, though I'm not aware of some symbols to express that. :-(

> You also have the signature of a number of functions not-quite-right.  
> ``indefinite'' integration does not map functions to functions, but 
> functions to equivalence classes of functions.

Yes, I should add a parameter for the integration constant. An explicit 
integration constant seems to be the better choice instead of a set value, 
since then it is clear that for every real number there is one function.

> Fourier transforms (and other integral transforms) map into functionals, 
> not functions.

The Fourier transform I had in mind is defined as

F(w) = \int_{\R} e^{-i w x} f(x) d x 

So F is clearly a function, though I agree that the Fourier transform 
defined this way is not a total function.

> I hope your audience (for things like slide 35) was made of computer 
> scientists - it is so amazingly condescending to thousands of 
> mathematicians, it is amazing you would not get immediately booted out!

The audience were exclusively mathematicians - my colleagues. Being booted
out by people who never thought intensively about what they write but who
only defend their habit, wouldn't be a problem for me. The same colleagues
ran into troubles before by trusting formalisms which are odd, though I
believe they did not often enough, so far. But I can wait ... :-)

> On p.37, you have polynomials backwards.  Polynomials are formal 
> objects, but they (uniquely) denote a function.  So polynomials CANNOT 
> be evaluated, but they denote a unique function which can.

Yes, I should better stick to the application homomorphism \varphi which 
maps a polynomial to a function. That is p is a polynomial, \varphi(p) is 
a function and \varphi(p)(x) is a scalar value.

> On p.51 where you speak of ``hidden'' quantifiers, you yourself omit the 
> thereexists quantifiers that are implicit on the last 2 lines -- why?

What do you mean exactly? In the mean-time I found that is better not to
speak about equations but about the set of solutions. That is writing, or
at least thinking,

{x : 0 = x^2 + 2*x + 1}

and

{y : forall x . y'(x) = x + sin x}

makes clear, what we search for and it explains, what care has to be taken
for every equation transformation.

> The above was meant to be constructive

Thanks for the comments!