[Haskell-cafe] data analysis question

Thu Nov 13 05:44:54 UTC 2014

On 13/11/2014, at 3:21 am, Peter Simons <simons at cryp.to> wrote:

> Hi Roman,
> 
>> With Haskell you don't have to load the whole data set into memory,
>> as Michael shows. With R, on the other hand, you do.
> 
> Can you please point me to a reference to back that claim up?
> 
> I'll offer [1] and [2] as pretty good indications that you may not be
> entirely right about this.

It is *possible* to handle large data sets with R,
but it is *usual* to deal with things in memory.

> 
>> Besides, if you're not an R expert, and if the analysis you want to do
>> is not readily available, it may be quite a pain to implement in R.

A heck of a lot of code in R has been developed by people who think
of themselves as statisticians/financial analysts/whatever rather than
programmers or “R experts”.  There is much to dislike about R (C-like
syntax, the ‘interactive if’ trap, the clash of naming styles) but it
has to be said that R is a very good fit for the data analysis problems
S was designed for, and I personally would find it *far* easier to
develop such a solution in R than Haskell.  (For other problems, of
course, it would be the other way around.)

Not only does R already have a stupefying number of packages offering
all sorts of analyses, so that it’s quite hard to find something that
you *have* to implement, there is an extremely active mailing list
with searchable archives and full of wizards keen to help.  If you
*did* have to implement something, you wouldn’t be on your own.

The specific case of ‘zipWith f (tail vec) vec’ is easy:
(1) vec[-1] is vec without its first element
    vec[-length(vec)] is vec without its last element
(2) cbind(vec[-1], vec[-length(vec)])
    is a matrix with 2 columns.
(3) apply(cbind(vec[-1], vec[-length(vec)]), 1, f)
    applies f to the rows of that matrix.  If f returns one
    number, the answer is a vector; if f returns a vector of
    length n > 1, the answer is a matrix with n rows and one
    column per row of the input (use t() to transpose it).
Example:
    > vec <- c(1,2,3,4,5)
    > mat <- cbind(vec[-1], vec[-length(vec)])
    > apply(mat, 1, sum)
    [1] 3 5 7 9
In this case, you could just do
    > vec[-1] + vec[-length(vec)]
and get the same answer.
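
For an f that takes its two elements as separate arguments, as the
Haskell zipWith does, base R's mapply is probably the closest match.
The following is only a minimal sketch: zip_with is a hypothetical
helper name, not something in base R, and the truncation to the
shorter input is an assumption made to mimic zipWith.

```r
# zip_with: hypothetical helper (not part of base R) mimicking
# Haskell's zipWith f xs ys for two vectors.
zip_with <- function(f, xs, ys) {
  # Assumption: truncate to the shorter input, as zipWith does,
  # instead of relying on R's recycling rule.
  n <- min(length(xs), length(ys))
  mapply(f, xs[seq_len(n)], ys[seq_len(n)])
}

vec <- c(1, 2, 3, 4, 5)

# Counterpart of zipWith (+) (tail vec) vec
pair_sums <- zip_with(function(x, y) x + y, vec[-1], vec[-length(vec)])
print(pair_sums)
```

Unlike the apply version, f here receives two scalars rather than
one row vector, so an existing two-argument function can be passed
unchanged.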

Oddly enough, one of the tricks for success in R is, as in Haskell,
to learn your way around the higher-order functions in the library.