[Haskell-cafe] Haskell and Big Data

He-chien Tsai depot051 at gmail.com
Fri Dec 20 07:19:41 UTC 2013


What I meant is to split the data into several parts, send each part to a
different computer, train them separately, and finally send the results back
and combine them. I didn't mean to use Cloud Haskell.
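
Concretely, the only property this workflow needs is an associative way to
merge partial results. A minimal sketch, with a made-up Model type rather
than any particular library's API (untested):

    import Data.List (foldl')
    import Data.List.Split (chunksOf)   -- from the 'split' package

    -- Made-up composable model: a count and a sum, enough to recover a mean.
    data Model = Model { count :: !Int, total :: !Double } deriving Show

    train :: [Double] -> Model
    train xs = Model (length xs) (sum xs)

    -- Merging partial models must agree with training on the whole data set.
    merge :: Model -> Model -> Model
    merge (Model c1 s1) (Model c2 s2) = Model (c1 + c2) (s1 + s2)

    -- Split, train each part (each 'train' call could run on a different
    -- machine), then combine the partial results as they come back.
    trainSplit :: [Double] -> Model
    trainSplit = foldl' merge (Model 0 0) . map train . chunksOf 100000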

On 2013/12/20 at 5:40 AM, "jean-christophe mincke" <
jeanchristophe.mincke at gmail.com> wrote:
>
> He-Chien Tsai,
>
> > its training result is designed to be composable
>
> Yes, it is indeed composable (the parallel function of that lib), but
parallelizing it on a cluster changes all the types, because running on a
cluster implies IO.
> Moreover, using Cloud Haskell (for instance) implies that:
> 1. training functions should be (serializable) closures, which can only
be defined at module level (not as local let/where bindings).
> 2. train is a typeclass function and is not serializable.
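
(For reference, with distributed-process that top-level restriction looks
roughly like the sketch below. It is untested; trainChunk and its payload
are invented for illustration, and the payload type needs Binary/Typeable
instances to be Serializable.)

    {-# LANGUAGE TemplateHaskell #-}
    module Trainer where

    import Control.Distributed.Process
    import Control.Distributed.Process.Closure (mkClosure, remotable)

    -- The worker must be a module-level binding so that 'remotable' can
    -- register it; a local let/where binding would not work here.
    trainChunk :: ([Double], ProcessId) -> Process ()
    trainChunk (xs, master) = send master (sum xs, length xs)

    remotable ['trainChunk]
    -- 'remotable' also generates __remoteTable, which has to be added to
    -- the node's RemoteTable at startup.

    -- Spawn one trainer per node; the master later collects the partial
    -- results (with 'expect') and combines them.
    spawnTrainers :: [(NodeId, [Double])] -> Process [ProcessId]
    spawnTrainers work = do
      me <- getSelfPid
      mapM (\(node, xs) -> spawn node ($(mkClosure 'trainChunk) (xs, me)))
           work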
>
> So the ideas behind HLearn are interesting, but I do not see how it could
be run on a cluster... But, unfortunately, I am not a Haskell expert.
>
> What do you think?
>
> Regards
>
> J-C
>
>
>
> On Thu, Dec 19, 2013 at 6:15 PM, He-chien Tsai <depot051 at gmail.com> wrote:
>>
>> Have you taken a look at the hlearn and statistics packages? It is even
easy to parallelize hlearn on a cluster, because its training result is
designed to be composable, which means you can create two models, train them
separately, and finally combine them. You can also use other databases such
as Redis or Cassandra, both of which have Haskell bindings, as a backend.
For parallelizing on clusters, hdph is also good.
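
(In the same spirit, composability already buys in-process parallelism via
the 'parallel' package. An untested sketch with a made-up partial-result
type, not hlearn's actual API:)

    import Control.Parallel.Strategies (parMap, rseq)
    import Data.List (foldl')
    import Data.List.Split (chunksOf)   -- from the 'split' package

    -- Made-up partial result: a count and a sum. The only requirement is
    -- that partial results combine associatively.
    data Model = Model !Int !Double

    merge :: Model -> Model -> Model
    merge (Model c1 s1) (Model c2 s2) = Model (c1 + c2) (s1 + s2)

    trainChunk :: [Double] -> Model
    trainChunk xs = Model (length xs) (sum xs)

    -- Train every chunk in its own spark, then fold the partial models.
    trainPar :: [Double] -> Model
    trainPar = foldl' merge (Model 0 0) . parMap rseq trainChunk
             . chunksOf 100000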
>>
>> I personally prefer Python for data science because it has much more
mature packages and is more interactive and more effective than Haskell and
Scala (not kidding: you can generate compiled C for the core data structures
and algorithms with the Python-like Cython and call it from Python, and
exploit GPUs for acceleration with Theano). Spark also has an unfinished
Python binding.
>>
>> On 2013/12/18 at 3:41 PM, "jean-christophe mincke" <
jeanchristophe.mincke at gmail.com> wrote:
>>
>>
>> >
>> > Hello Cafe,
>> >
>> > Big Data is a bit trendy these days.
>> >
>> > Does anybody know about plans to develop a Haskell ecosystem in that
domain?
>> > I.e. tools such as Storm or Spark (possibly on top of Cloud Haskell)
or, at least, bindings to tools that exist in other languages.
>> >
>> > Thank you
>> >
>> > Regards
>> >
>> > J-C
>> >
>> > _______________________________________________
>> > Haskell-Cafe mailing list
>> > Haskell-Cafe at haskell.org
>> > http://www.haskell.org/mailman/listinfo/haskell-cafe
>> >
>
>