[Haskell-cafe] Data analysis with Haskell
Don Stewart
dons at galois.com
Mon Jan 12 16:18:02 EST 2009
d:
> Hi all,
>
> I'm going to start a project where I'll have to do some data analysis
> (statistics about product orders) based on database entries; it will
> mostly be some very basic stuff like grouping by certain rules and
> finding averages as well as summing up and such. It will however be
> more than what can be done directly in the database using SQL, so there
> will be some processing in my program.
>
> I'm thinking about trying to do this in Haskell (because I like this
> language a lot); however, it is surely not my most proficient language
> and I tried to do some number crunching (real one that time) before in
> Haskell where I had to deal with some 4 million integer lists, and this
> failed; the program took a lot more memory than would have been
> necessary and ran for several minutes (kept swapping all the time, too).
> A rewrite in Fortran did give the result in 6s and didn't run out of
> space.
**Don't use lists when you mean to use arrays**
E.g. multiply two 4M-element arrays, map over the result and sum that.
import Data.Array.Vector

main = print . sumU . mapU (+7) $ zipWithU (*)
            (enumFromToU 1 (4000000 :: Int))
            (enumFromToU 2 (4000001 :: Int))
Compile it:
$ ghc -O2 -fvia-C -optc-O3 -funbox-strict-fields --make A.hs
$ time ./A
2886605259654448384
./A 0.03s user 0.00s system 97% cpu 0.034 total
Not the end of the world at all.
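
For anyone trying this on a current toolchain: the uvector package that provided Data.Array.Vector has since been deprecated in favour of the vector package. A minimal sketch of the same computation, assuming vector is installed, might look like the following. The name sumProducts is only illustrative and is not from the code above.

```haskell
import qualified Data.Vector.Unboxed as U

-- Multiply two n-element unboxed vectors pointwise, add 7 to each
-- product, and sum the result. The function name is illustrative.
sumProducts :: Int -> Int
sumProducts n = U.sum . U.map (+7) $ U.zipWith (*)
    (U.enumFromTo 1 n)
    (U.enumFromTo 2 (n + 1))

main :: IO ()
main = print (sumProducts 4000000)
```

Newer GHC releases no longer have the -fvia-C backend, so plain "ghc -O2 A.hs" should be enough. Note that the exact total for 4M elements is larger than 2^63, so with a 64-bit Int the printed value is the sum wrapped modulo 2^64; switch to Integer or Double if the true total matters.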
> This was probably my fault at that time, because I surely did something
> completely wrong for the Haskell style. However, I fear I could run
> into problems like that in the new project, too. So I want to ask for
> your opinions, do you think Haskell is the right language to do data
You want to compile Haskell DB queries into SQL?
> analysis of this kind? And do you think it is hard for still beginner
> Haskell programmer to get this right so Haskell does not use up a lot of
> memory for thunks or list-overhead or things like that? And finally,
> are there database bindings for Haskell I could use for the queries?
There are lots of database bindings. Two very popular ones are HDBC and
Takusen; check hackage.haskell.org for the full list.
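
As a rough sketch of how the "group and average" part could look with HDBC: the following assumes the HDBC and HDBC-sqlite3 packages, and the database file orders.db, the table orders and its columns product and amount are all hypothetical names chosen for illustration. The accumulator has strict fields and the fold is strict, so no chain of thunks builds up per group.

```haskell
import           Data.List             (foldl')
import qualified Data.Map.Strict       as M
import           Database.HDBC         (disconnect, fromSql, quickQuery')
import           Database.HDBC.Sqlite3 (connectSqlite3)

-- Running total and count for one group; strict fields avoid thunks.
data Acc = Acc !Double !Int

-- Mean value per key, computed in a single strict pass over the input.
averageBy :: Ord k => [(k, Double)] -> M.Map k Double
averageBy = M.map mean . foldl' step M.empty
  where
    step m (k, x)                 = M.insertWith merge k (Acc x 1) m
    merge (Acc s1 n1) (Acc s2 n2) = Acc (s1 + s2) (n1 + n2)
    mean (Acc s n)                = s / fromIntegral n

main :: IO ()
main = do
    -- Hypothetical database file, table and column names.
    conn <- connectSqlite3 "orders.db"
    rows <- quickQuery' conn "SELECT product, amount FROM orders" []
    -- Rows that do not have exactly two columns are silently skipped.
    let pairs = [ (fromSql p :: String, fromSql a) | [p, a] <- rows ]
    mapM_ print (M.toList (averageBy pairs))
    disconnect conn
```

The pure averageBy function can be tried on its own in GHCi, e.g. on [("a",1),("a",3),("b",5)], without any database at hand.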
-- Don