[Haskell-cafe] hmatrix-labeled: request for comments

Ian Ross ian at skybluetrades.net
Wed Feb 19 19:47:29 UTC 2014


Hi Nikita,

I've been thinking about data import issues recently as well (in this case,
NetCDF files: https://github.com/ian-ross/hnetcdf is kind of nasty and
unfinished, but it will get there in the end).  The easy access to data
files of all different kinds in R is one of its really big advantages.
I've done tasks in the past where I've needed to read CSV files, NetCDF
files, GeoTIFF data and ESRI shape files, all for the same job.  R handles
them all seamlessly, giving you a very open data analysis platform.  I'd
like Haskell eventually to be a similarly open platform with as easy a
workflow as R for data analysis, but it's going to take some work to get
there.

As you kind of say, one thing that's tricky is deciding on what matrix
libraries to use.  I don't have any good ideas about that.  With the NetCDF
stuff, I've been experimenting with making all of the get and put functions
polymorphic in a "store" type, allowing you to read and write
Storable.Vectors, Repa arrays and hmatrix arrays.  I'm not convinced I'm
doing it right, but the alternative is to support only a single array type,
which, until the community settles on a single canonical array type (if
that's even possible), seems restrictive.
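
To make the "polymorphic in a store type" idea concrete, here is a minimal
sketch.  It is not the hnetcdf API: the class name Store, its methods, the
Rows stand-in type and readVar are all hypothetical, and the "file" is just
an in-memory vector.  It assumes only the vector package.

```haskell
module Main where

import qualified Data.Vector.Storable as SV
import Foreign.Storable (Storable)

-- | Hypothetical class: any container that can be built from, and
-- flattened back to, a shape plus a flat row-major storable vector.
class Store s where
  fromFlat :: Storable a => [Int] -> SV.Vector a -> s a
  toFlat   :: Storable a => s a -> ([Int], SV.Vector a)

-- A storable vector is its own store; the shape is simply dropped.
instance Store SV.Vector where
  fromFlat _ v = v
  toFlat v     = ([SV.length v], v)

-- | Stand-in for a "real" matrix type (hmatrix, Repa, ...): a list of rows.
newtype Rows a = Rows [[a]] deriving (Eq, Show)

instance Store Rows where
  fromFlat [_, c] v = Rows (chunks c (SV.toList v))
  fromFlat _      v = Rows [SV.toList v]
  toFlat (Rows rs)  = ([length rs, ncols], SV.fromList (concat rs))
    where
      ncols = case rs of
        []      -> 0
        (r : _) -> length r

-- | Split a list into pieces of length n (the last piece may be shorter).
chunks :: Int -> [a] -> [[a]]
chunks n xs
  | n <= 0 || null xs = []
  | otherwise         = take n xs : chunks n (drop n xs)

-- | Hypothetical reader: pretends to read a 2x3 variable from a file.
-- The caller chooses the result container through the return type.
readVar :: Store s => s Double
readVar = fromFlat [2, 3] (SV.fromList [1, 2, 3, 4, 5, 6])

main :: IO ()
main = do
  print (readVar :: SV.Vector Double)
  print (readVar :: Rows Double)
```

The caller picks the container with a type annotation, which is convenient
interactively, but every supported array type needs its own instance.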

I guess from the perspective of what you could do with your hmatrix-labeled
code, aiming for something as flexible as R's read.table (of which read.csv
and its variants are specialisations) that supports at least a few of the
common Haskell array types would be nice.  However, there's a danger of
duplicating some of the good work that's already been done on fast CSV
parsers (cassava, csv-conduit and pipes-csv).  Cassava, in particular, has
a nice lightweight API that's very suitable for interactive work.  If you
extended the Cassava parser to support a wider range of file formats (like
read.table) and added some helpers for converting to array types where it
makes sense, that might be enough.
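
As a rough illustration of the "Cassava plus conversion helpers" route, here
is a minimal sketch that parses delimiter-separated numbers with Cassava's
decodeWith and flattens the result into a row-major Storable vector.  The
helper name readTable and its return shape are hypothetical; decodeWith,
defaultDecodeOptions and decDelimiter are Cassava's own.  It assumes the
cassava, vector and bytestring packages, and it only handles a
single-character delimiter, so whitespace-aligned input like test.txt would
still need a different parser.

```haskell
module Main where

import qualified Data.ByteString.Lazy.Char8 as BL
import Data.Char (ord)
import Data.Csv (DecodeOptions (..), HasHeader (..), decodeWith, defaultDecodeOptions)
import qualified Data.Vector as V
import qualified Data.Vector.Storable as SV

-- | Hypothetical helper: parse a header-less numeric table with the given
-- single-character separator.  Returns (rows, columns, row-major data),
-- or an error message if parsing fails or the rows have unequal length.
readTable :: Char -> BL.ByteString -> Either String (Int, Int, SV.Vector Double)
readTable sep input = do
  rows <- decodeWith opts NoHeader input :: Either String (V.Vector (V.Vector Double))
  let nrows = V.length rows
      ncols = if nrows == 0 then 0 else V.length (V.head rows)
  if V.all ((== ncols) . V.length) rows
    then Right (nrows, ncols, SV.fromList (concatMap V.toList (V.toList rows)))
    else Left "ragged rows"
  where
    opts = defaultDecodeOptions { decDelimiter = fromIntegral (ord sep) }

main :: IO ()
main = print (readTable ',' (BL.pack "1,-2,3.0\n5.0,6.0,-72.0\n"))
```

From the flat vector and the two dimensions it is a short step to whichever
matrix type you prefer; row and column labels would need a header-aware
variant on top of this.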

Cheers,

Ian.



On 19 February 2014 20:06, Nikita Karetnikov <nikita at karetnikov.org> wrote:

> I like how easy it is to import data in R and Octave.  (See [1] for a
> typical workflow.)  Since I couldn’t find any matching library on
> Hackage, I cooked up my own [2] in a couple of days.
>
> Here’s an example.  Let’s start by creating a poorly formatted dataset:
>
> $ cat > test.txt
>                          One         Two Longish   Four
>      Foo  1       -2            3.0     4.0
> Looooooong  5.0   6.0   -72.0                 8.0
>         Baz 41.0 4.2324234e7  43.0 1.111111144E-12
>
> Then we parse it with ‘readFile’, mangle the data a bit, and display in
> GHCi:
>
> λ> import qualified Data.Packed.LMatrix.IO as L
> λ> import qualified Data.Packed.LMatrix as L
> λ> do m <- L.readFile "test.txt"; return . L.trans . L.reverseRows $ L.map (+1) m
> (4><3)
>                        Baz Looooooong  Foo
>     One               42.0        6.0  2.0
>     Two        4.2324235e7        7.0 -1.0
> Longish               44.0      -71.0  4.0
>    Four 1.0000000000011111        9.0  5.0
>
> Now I’m wondering how to make it better.  I’m planning to add the
> documentation, augment the parser to accept CSV, and maybe support other
> matrix libraries.  What’s missing?  Would you like to see it on Hackage?
> And if not, why?
>
> [1] http://astrostatistics.psu.edu/su09/lecturenotes/pca.html
> [2] https://gitorious.org/hmatrix-labeled


-- 
Ian Ross   Tel: +43(0)6804451378   ian at skybluetrades.net
www.skybluetrades.net