[Haskell-cafe] [ANN] sparkle: native Apache Spark applications in Haskell

Alp Mestanogullari alpmestan at gmail.com
Sun Feb 28 10:14:23 UTC 2016

Hello Lyndon,

Glad to hear this is of interest to you. Let us know if you have any
feedback -- just keep in mind that we only cover a ridiculously small
fraction of the Spark API at the moment, but this can easily be expanded.
The implementations of the Spark classes/methods that we already have can
serve as a guide for binding the ones that are not there yet.

Regarding data frames: as a Haskeller, I find Spark's data frame
implementation a bit unsafe, since the type (which is just 'DataFrame')
doesn't indicate how many columns there are or what types the values stored
in those columns have. But Spark provides a bunch of algorithms that operate
on data frames, so if you happen to need one of them, you can quickly expose
it to Haskell and then wrap it all in a type-safe, Haskell-y way once you've
made sure everything works. In short, at the moment sparkle doesn't do
anything smart there. If you have any ideas or suggestions, we're all ears
though!
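
To make the "wrap it in a type-safe way" idea a bit more concrete, here is
a minimal sketch in plain Haskell. It does not use sparkle or Spark at all:
Value, DataFrame, CellType, Schema, Frame, typed and untyped are all made-up
names for illustration. The DataFrame below is just a stand-in for an untyped
frame, and Frame is a phantom-typed wrapper whose type-level list records the
column types. The schema is checked once, at the boundary, which is only one
of several possible designs.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TypeOperators #-}

import Data.Kind (Type)
import Data.Proxy (Proxy (..))

-- Hypothetical stand-in for an untyped data frame: nothing in the type
-- says how many columns there are or what they contain.
data Value = VInt Int | VDouble Double
  deriving (Eq, Show)

data DataFrame = DataFrame
  { dfColumns :: [String]
  , dfRows    :: [[Value]]
  } deriving (Eq, Show)

-- Haskell types that may appear as a column type, with a runtime check
-- that a dynamically typed cell really holds a value of that type.
class CellType a where
  cellMatches :: Proxy a -> Value -> Bool

instance CellType Int where
  cellMatches _ (VInt _) = True
  cellMatches _ _        = False

instance CellType Double where
  cellMatches _ (VDouble _) = True
  cellMatches _ _           = False

-- A schema is a type-level list of column types. A row matches when it
-- has exactly as many cells as the list and each cell has the right type.
class Schema (s :: [Type]) where
  rowMatches :: Proxy s -> [Value] -> Bool

instance Schema '[] where
  rowMatches _ = null

instance (CellType a, Schema rest) => Schema (a ': rest) where
  rowMatches _ []       = False
  rowMatches _ (v : vs) =
    cellMatches (Proxy :: Proxy a) v && rowMatches (Proxy :: Proxy rest) vs

-- Phantom-typed wrapper: the schema lives only in the type.
newtype Frame (s :: [Type]) = Frame DataFrame

-- Smart constructor: validate every row against the requested schema.
typed :: forall s. Schema s => DataFrame -> Maybe (Frame s)
typed df
  | all (rowMatches (Proxy :: Proxy s)) (dfRows df) = Just (Frame df)
  | otherwise                                       = Nothing

-- Forget the schema again, e.g. before handing the frame to untyped code.
untyped :: Frame s -> DataFrame
untyped (Frame df) = df

-- A small two-column example frame.
example :: DataFrame
example = DataFrame ["age", "score"]
  [ [VInt 30, VDouble 1.5]
  , [VInt 41, VDouble 2.0]
  ]
```

With this loaded in GHCi (with -XDataKinds turned on), asking for
typed example :: Maybe (Frame '[Int, Double]) gives a Just, while asking for
Frame '[Double, Int] or Frame '[Int] gives Nothing. Functions that take a
Frame '[Int, Double] can then rely on the column count and types without
re-checking them.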

On Fri, Feb 26, 2016 at 12:56 AM, Lyndon Maydwell <maydwell at gmail.com>
wrote:

> Hi Alp,
> Just wanted to say that there's interest here in Melbourne in
> Spark+Haskell too and we'll definitely be trying this out to see what it's
> like.
> One of the problems that some of the more exotic language bindings to
> Spark have is that while they include RDD support, they lack a
> language-idiomatic interpretation of DataFrames. Does sparkle attempt to
> tackle this?
> Many thanks to Tweag I/O for doing this. It must have been a lot of work!
> Regards,
>  - Lyndon
> On Fri, Feb 26, 2016 at 4:50 AM, Alp Mestanogullari <alpmestan at gmail.com>
> wrote:
>> Hello -cafe!
>> Recently at Tweag I/O we've been working on sparkle, a library for
>> writing (distributed) Apache Spark applications directly in Haskell!
>> We have published a blog post introducing the project (and some of its
>> challenges) here:
>> http://www.tweag.io/blog/haskell-meets-large-scale-distributed-analytics
>> The corresponding repository lives at https://github.com/tweag/sparkle
>> While this is still early stage work, we can already write non-trivial
>> Spark applications in Haskell and have them run across an entire cluster.
>> We obviously do not cover the whole Spark API yet (very, very far from
>> that) but would be glad to already get some feedback.
>> Cheers
>> --
>> Alp Mestanogullari

Alp Mestanogullari