[Haskell-cafe] Data analysis with Haskell

Daniel Kraft d at domob.eu
Mon Jan 12 16:16:14 EST 2009


Hi all,

I'm going to start a project where I'll have to do some data analysis 
(statistics about product orders) based on database entries. It will 
mostly be very basic stuff such as grouping by certain rules, finding 
averages, and summing up. It will, however, be more than what can be 
done directly in the database using SQL, so there will be some 
processing in my program.
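
To make the kind of processing concrete, here is a minimal sketch of 
grouping orders by product and computing a sum and an average per group. 
The Order type, its field names, and the sample data are made up for 
illustration (the real records would come from the database), and the 
choice of Data.Map.Strict with foldl' is just one assumed way to keep the 
accumulation strict, not a claim about the best approach:

```haskell
import Data.List (foldl')
import qualified Data.Map.Strict as Map

-- Hypothetical order record; real fields would come from the database.
data Order = Order
  { orderProduct :: String
  , orderAmount  :: Double
  }

-- Running total and count. The strict fields force both numbers as
-- soon as a Stats value is evaluated, so no chain of thunks builds up.
data Stats = Stats !Double !Int

-- Fold one order into the per-product statistics.
addOrder :: Map.Map String Stats -> Order -> Map.Map String Stats
addOrder m (Order p a) = Map.insertWith merge p (Stats a 1) m
  where
    merge (Stats s1 n1) (Stats s2 n2) = Stats (s1 + s2) (n1 + n2)

-- Group orders by product and compute (sum, average) per group.
summarize :: [Order] -> Map.Map String (Double, Double)
summarize = Map.map finish . foldl' addOrder Map.empty
  where
    finish (Stats s n) = (s, s / fromIntegral n)

main :: IO ()
main = print (Map.toList (summarize sampleOrders))
  where
    -- Made-up sample data for illustration only.
    sampleOrders =
      [ Order "apple" 2.0, Order "pear" 5.0, Order "apple" 4.0 ]
-- prints [("apple",(6.0,3.0)),("pear",(5.0,5.0))]
```

The strict left fold visits each order once and keeps only one small 
Stats value per product, so memory use should depend on the number of 
groups rather than on the number of orders.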

I'm thinking about trying to do this in Haskell (because I like this 
language a lot); however, it is certainly not the language I am most 
proficient in. I once tried to do some number crunching (real number 
crunching that time) in Haskell, where I had to deal with some 4 million 
integer lists, and this failed: the program took a lot more memory than 
should have been necessary and ran for several minutes (swapping all the 
time, too). A rewrite in Fortran gave the result in 6 s and didn't run 
out of space.

This was probably my fault at the time, because I surely did something 
completely wrong for the Haskell style. However, I fear I could run 
into problems like that in the new project, too. So I want to ask for 
your opinions: do you think Haskell is the right language for data 
analysis of this kind? Do you think it is hard for someone who is still 
a beginner in Haskell to get this right, so that the program does not 
use up a lot of memory for thunks, list overhead, or things like that? 
And finally, are there database bindings for Haskell I could use for the 
queries?

Thanks a lot!

Daniel


