[Haskell-cafe] Distributing Haskell on a cluster

Ozgun Ataman ozgun.ataman at soostone.com
Mon Mar 16 00:31:31 UTC 2015


Anecdotal support for this idea: This is exactly how we distribute hadron[1]-based Hadoop MapReduce programs to cluster nodes at work. The compiled executable essentially ships itself to the nodes and recognizes the different environment when executed in that context. 

[1] hadron is a Haskell Hadoop streaming framework that came out of our work. It's on GitHub and close to being released on Hackage once the current dev branch is finalized/merged. In case it's helpful: https://github.com/soostone/hadron
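To sketch the pattern (the environment-variable check, the paths, and the names below are hypothetical, not hadron's actual API): a single binary acts as the launcher on the driver machine and as a streaming task on the nodes, depending on where it finds itself running.

  import System.Environment (getExecutablePath, lookupEnv)
  import System.Process (callProcess)

  main :: IO ()
  main = do
    -- Hadoop streaming exports task-specific environment variables on
    -- the nodes; their absence means we are the launcher.
    task <- lookupEnv "mapred_task_id"
    case task of
      Just _  -> mapperLoop          -- on a node: do the real work
      Nothing -> do
        self <- getExecutablePath    -- path to this very binary
        -- Ship ourselves with the job and name ourselves as the mapper.
        callProcess "hadoop"
          [ "jar", "/usr/lib/hadoop/hadoop-streaming.jar"
          , "-files", self
          , "-mapper", "./myprog"    -- hypothetical invocation on the node
          , "-input", "input", "-output", "output"
          ]

  -- Trivial identity mapper, just to keep the sketch self-contained.
  mapperLoop :: IO ()
  mapperLoop = interact id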

Oz

> On Mar 15, 2015, at 8:06 PM, Andrew Cowie <andrew at operationaldynamics.com> wrote:
> 
> Bit of a curveball from left field, but rather than deploying a Main script and then using GHCi, have you considered compiling the program and shipping that?
> 
> Before you veto the idea out of hand: statically compiled binaries are almost self-contained, and (depending on what you changed) they rsync well. If that doesn't appeal, consider building the Haskell program dynamically instead; a Hello World is only a couple of kB, and a serious program only a hundred or so.
> 
> Anyway, I know you're just looking to send a closure over a code fragment, but if you're dealing with the input and output of the program through a stable interface, then the program is the closure.
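> To make that concrete, a minimal sketch (the per-line computation below is made up): if the whole program talks through stdin and stdout, then rsyncing the binary and running it remotely is sending the closure.
> 
>   -- Sketch: the program is the closure. All I/O goes through the
>   -- stable stdin/stdout interface, so the shipped binary is all the
>   -- remote node needs.
>   main :: IO ()
>   main = interact (unlines . map process . lines)
>     where
>       process = reverse   -- stand-in for the real per-line work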
> 
> Just a thought.
> 
> AfC
> 
>> On Mon, Mar 16, 2015 at 9:53 AM felipe zapata <tifonzafel at gmail.com> wrote:
>> Hi all,
>> I have posted the following question on Stack Overflow, but so far I have not received an answer.
>> http://stackoverflow.com/questions/29039815/distributing-haskell-on-a-cluster
>> 
>> 
>> I have a piece of code that processes files:
>> 
>> processFiles ::  [FilePath] -> (FilePath -> IO ()) -> IO ()
>> 
>> This function spawns an async process that executes an IO action. The IO action must be submitted to a cluster through a job scheduling system (e.g. Slurm).
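>> As a sketch of the shape I mean (assuming the async package; the real action body is up to the caller):
>> 
>>   import Control.Concurrent.Async (forConcurrently_)
>> 
>>   -- Run the given IO action on every file, one async worker per file.
>>   processFiles :: [FilePath] -> (FilePath -> IO ()) -> IO ()
>>   processFiles files action = forConcurrently_ files action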
>> 
>> Because I must use the job scheduling system, it's not possible to use Cloud Haskell to distribute the closure. Instead, the program writes a new Main.hs containing the desired computations, which is copied to the cluster node together with all the modules that Main depends on, and is then executed remotely with "runhaskell Main.hs [opts]". The async process then polls the job scheduling system periodically (using threadDelay) to check whether the job is done.
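>> The polling step would look roughly like this (the squeue call is illustrative, assuming Slurm):
>> 
>>   import Control.Concurrent (threadDelay)
>>   import System.Exit (ExitCode (..))
>>   import System.Process (readProcessWithExitCode)
>> 
>>   -- Block until the job has left the Slurm queue, checking every
>>   -- 30 seconds. An empty "squeue -h -j <id>" listing (or a non-zero
>>   -- exit) means the job is no longer queued or running.
>>   waitForJob :: String -> IO ()
>>   waitForJob jobId = do
>>     (code, out, _) <- readProcessWithExitCode "squeue" ["-h", "-j", jobId] ""
>>     if code /= ExitSuccess || null out
>>       then return ()
>>       else threadDelay (30 * 1000000) >> waitForJob jobId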
>> 
>> Is there a way to avoid creating a new Main? Can I serialize the IO action and execute it somehow in the node?
>> 
>> Best,
>> 
>> Felipe
>> 