[Haskell-cafe] data analysis question

Richard A. O'Keefe ok at cs.otago.ac.nz
Thu Nov 13 06:05:00 UTC 2014


On 13/11/2014, at 3:52 pm, Brandon Allbery <allbery.b at gmail.com> wrote:
> 
> It is an open source implementation of S ( http://en.wikipedia.org/wiki/S_(programming_language) ) which was developed specifically for statistical applications. I would wonder how much of *that* was shaped by Fortran statistical packages….

The prehistoric version of S *was* a Fortran statistical package.
While the inventors of S were familiar with GLIM, GENSTAT, SPSS, SAS, BMDP, MINITAB, &c.,
they _were_ at Bell Labs, and so the language looks a lot like C.
Indeed, several aspects of S were shaped by UNIX, in particular the way S (but not R)
treats the current directory as an “outer block”.
Many (even new) R packages are wrappers around Fortran code.

However, that has had almost no influence on the language itself.
In particular:

 - arrays are immutable
   > (v <- 1:5)
   [1] 1 2 3 4 5
   > w <- v
   > w[3] <- 33
   > w
   [1]  1  2 33  4  5
   > v
   [1] 1 2 3 4 5

 - functions are first-class values and higher-order
   functions are commonplace (see the first sketch
   after this list)

 - function arguments are evaluated lazily
   (see the second sketch after this list)

 - good style does *NOT* “traverse arrays by indexes”
   but operates on whole arrays in APL/Fortran 90 style.
   For example, you do not do
       for (i in 1:m) for (j in 1:n) r[i,j] <- f(v[i], w[j])
   but
       r <- outer(v, w, f)
   (a runnable sketch of this appears after the list.)
   If you _do_ “express data transformations and queries
   functionally in R” — which I repeat is native good style —
   it will perform well; if you “traverse arrays by indexes”
   you will wish you hadn’t.  This is not something that
   Fortran 66 or Fortran 77 would have taught anyone.
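
To make the first-class-functions point concrete, here is a
minimal session sketch (compose and h are throwaway names I am
introducing; the values are made up):

   > compose <- function(f, g) function(x) f(g(x))
   > h <- compose(sqrt, abs)
   > h(-16)
   [1] 4
   > Reduce(`+`, 1:5)
   [1] 15
   > Filter(function(x) x %% 2 == 0, 1:10)
   [1]  2  4  6  8 10

Reduce and Filter are ordinary base-R higher-order functions;
nothing here needs a package.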
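
A similarly minimal sketch of lazy argument evaluation (again,
f and g are throwaway names):

   > f <- function(x, y) x
   > f(1, stop("boom"))
   [1] 1
   > g <- function(a, b = a * 2) b
   > g(21)
   [1] 42

The stop("boom") is never raised because y is never used, and a
default can refer to another argument because defaults, too, are
evaluated lazily.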
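
And a runnable version of the outer() example, with made-up
inputs; note that outer() hands f whole vectors, so f must
itself be vectorised (as arithmetic is):

   > v <- 1:3
   > w <- 1:4
   > f <- function(x, y) 10 * x + y
   > outer(v, w, f)
        [,1] [,2] [,3] [,4]
   [1,]   11   12   13   14
   [2,]   21   22   23   24
   [3,]   31   32   33   34

One call builds the whole matrix; no explicit index loop appears
anywhere.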

Let me put it this way: R is about as close to a functional
language as you can get without actually being one.
(The implementors of R consciously adopted implementation
techniques from Scheme.)


