[Haskell-cafe] Haskell and Big Data

Carter Schonwald carter.schonwald at gmail.com
Fri Dec 20 07:24:46 UTC 2013


Cloud Haskell is a substrate that could be used to build such a layer.  I'm
sure the Cloud Haskell people would love such experimentation.

On Friday, December 20, 2013, He-chien Tsai wrote:

> What I meant is splitting the data into several parts, sending each split
> part to a different computer, training them separately, and finally sending
> the results back and combining them. I didn't mean to use Cloud Haskell.
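A minimal sketch of that split/train/combine pattern, assuming a hypothetical MeanModel type whose training results form a monoid (the property HLearn relies on); on a real cluster each train call would run on a different node:

```haskell
-- Hypothetical model: a running count and sum, enough to recover a mean.
-- Any training result that forms a monoid can be combined this way.
data MeanModel = MeanModel { count :: !Int, total :: !Double }

instance Semigroup MeanModel where
  MeanModel c1 t1 <> MeanModel c2 t2 = MeanModel (c1 + c2) (t1 + t2)

instance Monoid MeanModel where
  mempty = MeanModel 0 0

-- "Train" on one chunk of data locally.
train :: [Double] -> MeanModel
train xs = MeanModel (length xs) (sum xs)

-- Split the data, train each part (sequentially here; on a cluster each
-- chunk would be trained on a different machine), then combine the results.
trainDistributed :: [[Double]] -> MeanModel
trainDistributed = mconcat . map train

mean :: MeanModel -> Double
mean (MeanModel c t) = t / fromIntegral c

main :: IO ()
main = print (mean (trainDistributed [[1,2,3],[4,5],[6]]))  -- prints 3.5
```

Because (<>) is associative, the final model is the same no matter how the data was partitioned, which is what makes this pattern safe to distribute.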
>
> On 2013/12/20 at 5:40 AM, "jean-christophe mincke"
> <jeanchristophe.mincke at gmail.com> wrote:
> >
> > He-Chien Tsai,
> >
> > > its training results are designed to be composable
> >
> > Yes, it is indeed composable (the parallel function of that lib), but
> parallelizing it on a cluster changes all the types, because running on a
> cluster implies IO.
> > Moreover, using Cloud Haskell (for instance) implies that:
> > 1. training functions must be (serializable) closures, which can only
> be defined at module level (not as local let/where bindings);
> > 2. train is a typeclass function and is not serializable.
> >
> > So the ideas behind HLearn are interesting, but I do not see how it
> could be run on a cluster... But, unfortunately, I am not a Haskell expert.
> >
> > What do you think?
> >
> > Regards
> >
> > J-C
> >
> >
> >
> > On Thu, Dec 19, 2013 at 6:15 PM, He-chien Tsai <depot051 at gmail.com>
> wrote:
> >>
> >> Have you taken a look at the hlearn and statistics packages? hlearn is
> even easy to parallelize on a cluster because its training results are
> designed to be composable, which means you can create two models, train
> them separately, and finally combine them. You can also use another
> database such as Redis or Cassandra, both of which have Haskell bindings,
> as a backend. For parallelizing on clusters, hdph is also good.
> >>
> >> I personally prefer Python for data science because it has much more
> mature packages and is more interactive and more effective than Haskell
> and Scala (not kidding: you can generate compiled C for core data
> structures and algorithms with the Python-like Cython and call it from
> Python, and exploit GPUs for acceleration via Theano). Spark also has an
> unfinished Python binding.
> >>
> >> 2013/12/18 下午3:41 於 "jean-christophe mincke" <
> jeanchristophe.mincke at gmail.com <javascript:_e({}, 'cvml',
> 'jeanchristophe.mincke at gmail.com');>> 寫道:
> >>
> >>
> >> >
> >> > Hello Cafe,
> >> >
> >> > Big Data is a bit trendy these days.
> >> >
> >> > Does anybody know about plans to develop a Haskell ecosystem in
> that domain?
> >> > I.e. tools such as Storm or Spark (possibly on top of Cloud Haskell)
> or, at least, bindings to tools which exist in other languages.
> >> >
> >> > Thank you
> >> >
> >> > Regards
> >> >
> >> > J-C
> >> >
> >> > _______________________________________________
> >> > Haskell-Cafe mailing list
> >> > Haskell-Cafe at haskell.org
> >> > http://www.haskell.org/mailman/listinfo/haskell-cafe
> >> >
> >
> >
>

