[Haskell-cafe] Haskell and Big Data

jean-christophe mincke jeanchristophe.mincke at gmail.com
Sat Dec 21 17:52:11 UTC 2013


Hi,

There are the Holumbus <http://holumbus.fh-wedel.de/trac> packages. They seem
to have a DFS and support for analytics (map/reduce at least).
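To illustrate the map/reduce model mentioned above, here is a minimal single-machine sketch. It is not Holumbus's API, which is not shown in this thread. `mapReduce` and `wordCount` are hypothetical names: the mapper emits key/value pairs, `Data.Map.Strict` groups the pairs by key, and a reducer folds each group.

```haskell
import Data.Char (isAlpha, toLower)
import qualified Data.Map.Strict as Map

-- Generic map/reduce skeleton (hypothetical, single machine): run the mapper
-- over every input record, group the emitted pairs by key, reduce each group.
mapReduce :: Ord k
          => (a -> [(k, v)])   -- mapper: one record -> key/value pairs
          -> (k -> [v] -> r)   -- reducer: a key and all its values -> result
          -> [a]               -- input records
          -> Map.Map k r
mapReduce mapper reducer inputs = Map.mapWithKey reducer grouped
  where
    -- fromListWith (++) collects every value emitted for the same key
    grouped = Map.fromListWith (++) [ (k, [v]) | x <- inputs, (k, v) <- mapper x ]

-- Classic word count: emit (word, 1) for each word, then sum per word.
wordCount :: [String] -> Map.Map String Int
wordCount = mapReduce mapper reducer
  where
    mapper line = [ (w, 1) | w <- words (map normalise line) ]
    -- lower-case letters, turn everything else into word separators
    normalise c = if isAlpha c then toLower c else ' '
    reducer _ counts = sum counts

main :: IO ()
main = print (Map.toList (wordCount ["Big data, big plans", "data in Haskell"]))
```

In a distributed framework the mapper and reducer would run on different nodes, and the grouping step would become a network shuffle. This sketch keeps everything in one process.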

Regards

J-C


On Sat, Dec 21, 2013 at 2:50 PM, Alexander Kjeldaas <
alexander.kjeldaas at gmail.com> wrote:

>
> In the HPCC documentation it is hard to cut through the buzzword jungle.
> Is there an efficient storage solution lurking there?
>
> I searched for Haskell packages related to the big data storage layer, and
> the only thing I've found that could support efficient erasure-code-based
> storage is this 3-year-old binding to libhdfs. There is only one commit
> on GitHub:
>
> https://github.com/kim/hdfs-haskell
>
> Somewhat related are these bindings to zfec, from 2008, which are part of
> the Tahoe-LAFS project.
>
> http://hackage.haskell.org/package/fec
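An erasure code stores redundant blocks so that lost blocks can be rebuilt. The simplest instance is a single XOR parity block, as in RAID 5, and it is sketched below. zfec implements a more general k-of-m code, so this sketch illustrates the idea, not zfec's algorithm. `Block`, `parity`, `encode` and `recover` are hypothetical names. The sketch assumes at least one data block and that all blocks have the same length.

```haskell
import Data.Bits (xor)
import Data.Maybe (catMaybes, fromMaybe, isNothing)
import Data.Word (Word8)

-- A storage block, modelled as a list of bytes (hypothetical representation).
type Block = [Word8]

-- Parity block: the byte-wise XOR of all given blocks.
-- Assumes a non-empty list of equal-length blocks.
parity :: [Block] -> Block
parity = foldr1 (zipWith xor)

-- Encode k data blocks as k + 1 stored blocks: the data followed by its parity.
encode :: [Block] -> [Block]
encode blocks = blocks ++ [parity blocks]

-- Rebuild the k data blocks from the k + 1 stored blocks (in 'encode' order).
-- Nothing marks a lost block. The XOR of all k + 1 blocks is zero, so a single
-- missing block equals the XOR of the surviving ones.
recover :: [Maybe Block] -> Maybe [Block]
recover stored =
  case length (filter isNothing stored) of
    0 -> Just (init (catMaybes stored))            -- drop the parity block
    1 -> let rebuilt = parity (catMaybes stored)
         in Just (init (map (fromMaybe rebuilt) stored))
    _ -> Nothing  -- one parity block cannot repair two or more losses

main :: IO ()
main = do
  let stored  = map Just (encode [[1, 2], [3, 4], [5, 6]])
      damaged = take 1 stored ++ [Nothing] ++ drop 2 stored  -- lose block 1
  print (recover damaged)  -- prints Just [[1,2],[3,4],[5,6]]
```

This scheme costs only one extra block, but it survives just one loss. Codes like zfec's let you choose how many losses to tolerate.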
>
>
> Alexander
>
>
>
> On Fri, Dec 20, 2013 at 8:24 AM, Carter Schonwald <
> carter.schonwald at gmail.com> wrote:
>
>> Cloud Haskell is a substrate that could be used to build such a layer.
>> I'm sure the Cloud Haskell people would love such experimentation.
>>
>>
>> On Friday, December 20, 2013, He-chien Tsai wrote:
>>
>>> What I meant is: split the data into several parts, send each part to a
>>> different computer, train on them separately, and finally send the results
>>> back and combine them. I didn't mean to use Cloud Haskell.
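The split/train/combine scheme described above can be sketched on one machine, using lightweight threads as stand-ins for the separate computers. `Partial`, `trainChunk`, `combine`, `splitInto` and `distributedTrain` are hypothetical names, and the "model" is only a sample count and sum. A real deployment would replace `forkIO`/`MVar` with network transport, which this sketch does not cover. For actual parallelism, compile with `-threaded` and run with `+RTS -N`.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)
import Control.Exception (evaluate)

-- Hypothetical partial model: sample count and sum (enough for a mean).
data Partial = Partial !Int !Double
  deriving (Eq, Show)

-- "Train" on one chunk; stand-in for the real per-node training step.
trainChunk :: [Double] -> Partial
trainChunk xs = Partial (length xs) (sum xs)

-- Merge two partial results, as the coordinating node would.
combine :: Partial -> Partial -> Partial
combine (Partial n1 s1) (Partial n2 s2) = Partial (n1 + n2) (s1 + s2)

-- Split the data into at most n chunks of roughly equal size (assumes n >= 1).
splitInto :: Int -> [a] -> [[a]]
splitInto n xs = go xs
  where
    size = max 1 ((length xs + n - 1) `div` n)
    go [] = []
    go ys = let (chunk, rest) = splitAt size ys in chunk : go rest

-- Scatter: one worker thread per chunk. Gather: wait for every result and
-- combine them.
distributedTrain :: Int -> [Double] -> IO Partial
distributedTrain n xs = do
  boxes   <- mapM spawn (splitInto n xs)
  results <- mapM takeMVar boxes
  return (foldr combine (Partial 0 0) results)
  where
    spawn :: [Double] -> IO (MVar Partial)
    spawn chunk = do
      box <- newEmptyMVar
      -- evaluate forces the strict Partial inside the worker thread
      _ <- forkIO (evaluate (trainChunk chunk) >>= putMVar box)
      return box

main :: IO ()
main = do
  Partial count total <- distributedTrain 4 [1 .. 100]
  putStrLn ("mean = " ++ show (total / fromIntegral count))
```

The scheme only works because `combine` is associative and has an identity (`Partial 0 0`). That guarantees the order in which results come back does not matter.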
>>>
>>> On 2013/12/20 5:40 AM, "jean-christophe mincke" <
>>> jeanchristophe.mincke at gmail.com> wrote:
>>> >
>>> > He-Chien Tsai,
>>> >
>>> > >  its training result is designed to be composable
>>> >
>>> > Yes, it is indeed composable (the parallel function of that lib), but
>>> > parallelizing it on a cluster changes all the types, because running on
>>> > a cluster implies IO.
>>> > Moreover, using Cloud Haskell (for instance) implies that:
>>> > 1. training functions should be (serializable) closures, which can only
>>> > be defined at module level (not as local let/where bindings).
>>> > 2. train is a typeclass function and is not serializable.
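Both points above come from the same fact: a function value cannot be serialized. What can travel between nodes is a reference to code that both nodes already have, plus serialized data. The sketch below shows that idea in library-free form using defunctionalization. It is not Cloud Haskell's API. `Task`, `runTask`, `encodeTask` and `decodeTask` are hypothetical names, and `show`/`read` stand in for a real binary wire format. Each constructor also fixes one concrete, monomorphic computation, so no type-class dictionary needs to be shipped.

```haskell
-- A serializable description of work: which top-level function to run,
-- plus its arguments, including any "captured" variables, as plain data.
data Task
  = SumChunk [Double]
  | ScaleAndSum Double [Double]
  deriving (Show, Read, Eq)

-- The receiving node maps each constructor back to top-level code that it
-- already has (both nodes are assumed to run the same binary).
runTask :: Task -> Double
runTask (SumChunk xs)      = sum xs
runTask (ScaleAndSum k xs) = k * sum xs

-- Stand-ins for the wire format; a real system would use a binary encoding.
encodeTask :: Task -> String
encodeTask = show

decodeTask :: String -> Task
decodeTask = read

main :: IO ()
main = do
  let factor = 2.0
      -- A local lambda such as (\xs -> factor * sum xs) captures `factor`
      -- and cannot be sent; the defunctionalized form carries it as data.
      wire = encodeTask (ScaleAndSum factor [1, 2, 3])
  putStrLn wire
  print (runTask (decodeTask wire))
```

The price is that every remotely callable computation must be listed in `Task` and handled in `runTask`. This is the same restriction as "closures can only be defined at module level", written out by hand.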
>>> >
>>> > So the ideas behind HLearn are interesting, but I do not see how it
>>> > could be run on a cluster... But unfortunately, I am not a Haskell expert.
>>> >
>>> > What do you think?
>>> >
>>> > Regards
>>> >
>>> > J-C
>>> >
>>> >
>>> >
>>> > On Thu, Dec 19, 2013 at 6:15 PM, He-chien Tsai <depot051 at gmail.com>
>>> > wrote:
>>> >>
>>> >> Have you taken a look at the hlearn and statistics packages? It's even
>>> >> easy to parallelize hlearn on a cluster, because its training result is
>>> >> designed to be composable: you can create two models, train them
>>> >> separately, and finally combine them. You can also use another database,
>>> >> such as Redis or Cassandra (both have Haskell bindings), as a backend.
>>> >> For parallelizing on clusters, hdph is also good.
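The composability claimed above is the property `train (xs ++ ys) == train xs <> train ys`, which means training is a monoid homomorphism. Below is a minimal sketch of that idea for a one-dimensional Gaussian. It is not HLearn's API. `Gaussian`, `single`, `train` and `variance` are hypothetical names. The merge step uses the standard pairwise update for count, mean and sum of squared deviations (M2).

```haskell
-- Sufficient statistics for a 1-D Gaussian: count, mean, and M2
-- (the sum of squared deviations from the mean).
data Gaussian = Gaussian
  { count :: !Int
  , mean  :: !Double
  , m2    :: !Double
  } deriving (Show)

-- Merging two models trained on disjoint data (pairwise update).
instance Semigroup Gaussian where
  Gaussian 0 _ _ <> g = g
  g <> Gaussian 0 _ _ = g
  Gaussian na ma m2a <> Gaussian nb mb m2b =
    let n     = na + nb
        delta = mb - ma
        nA    = fromIntegral na
        nB    = fromIntegral nb
        nT    = fromIntegral n
    in Gaussian n
                (ma + delta * nB / nT)
                (m2a + m2b + delta * delta * nA * nB / nT)

-- The empty model is the identity for (<>).
instance Monoid Gaussian where
  mempty = Gaussian 0 0 0

-- A model trained on a single sample.
single :: Double -> Gaussian
single x = Gaussian 1 x 0

-- Training is foldMap, so train (xs ++ ys) == train xs <> train ys
-- (up to floating-point rounding).
train :: [Double] -> Gaussian
train = foldMap single

-- Sample variance (0 for fewer than two samples).
variance :: Gaussian -> Double
variance g
  | count g < 2 = 0
  | otherwise   = m2 g / fromIntegral (count g - 1)

main :: IO ()
main = do
  let whole    = train [1 .. 10]
      combined = train [1 .. 4] <> train [5 .. 10]  -- trained separately
  print (mean whole, variance whole)
  print (mean combined, variance combined)
```

Because `(<>)` is associative, the chunks can be trained on any machines and merged in any grouping. Only the small `Gaussian` values need to cross the network, not the data.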
>>> >>
>>> >> I personally prefer Python for data science because it has much more
>>> >> mature packages and is more interactive and more effective than Haskell
>>> >> and Scala (not kidding: you can write compiled C for core data and
>>> >> algorithms in the Python-like Cython and call it from Python, and
>>> >> exploit GPUs for acceleration with Theano). Spark also has an unfinished
>>> >> Python binding.
>>> >>
>>> >> On 2013/12/18 3:41 PM, "jean-christophe mincke" <
>>> >> jeanchristophe.mincke at gmail.com> wrote:
>>> >>
>>> >>
>>> >> >
>>> >> > Hello Cafe,
>>> >> >
>>> >> > Big Data is a bit trendy these days.
>>> >> >
>>> >> > Does anybody know about plans to develop a Haskell ecosystem in
>>> >> > that domain? I.e. tools such as Storm or Spark (possibly on top of
>>> >> > Cloud Haskell) or, at least, bindings to tools which exist in other
>>> >> > languages.
>>> >> >
>>> >> > Thank you
>>> >> >
>>> >> > Regards
>>> >> >
>>> >> > J-C
>>> >> >
>>> >> > _______________________________________________
>>> >> > Haskell-Cafe mailing list
>>> >> > Haskell-Cafe at haskell.org
>>> >> > http://www.haskell.org/mailman/listinfo/haskell-cafe
>>> >> >
>>> >
>>> >
>>>
>>
>>
>>
>
>
>

