<div><div>Hello,</div><div> </div><div>I would like to announce the release of Karps, an experimental Haskell frontend to Spark DataFrames and Datasets. Apache Spark [1] is a popular framework for distributed programming that offers several APIs. The excellent Sparkle project [2] from Tweag integrates well with Spark's low-level ("RDD") API, while Karps focuses only on the more recent DataFrame and Dataset API. In that sense, the two projects are complementary in their goals and scope.</div><div> </div><div>What can you do with it? So far, simple queries: manipulating numbers, importing lists of data, reading JSON files, etc. To facilitate debugging, it integrates with Google's TensorBoard [3] to provide rich visualizations of the dataflow. In addition, thanks to Haskell, it includes a full-program analyzer and optimizer that can automate common tasks such as cache management and query optimization. A few IHaskell notebooks give a flavor of what is possible; see the links on the GitHub page:</div><div> </div><div><a href="https://github.com/krapsh/kraps-haskell">https://github.com/krapsh/kraps-haskell</a></div><div> </div><div><a href="https://hackage.haskell.org/package/karps-0.2.0.0">https://hackage.haskell.org/package/karps-0.2.0.0</a></div><div> </div><div>My main motivation, as a Spark developer, is that writing Spark frontends for new programming languages is very hard. Karps explores a language-agnostic API that makes it easy to build simple frontends (e.g., JavaScript, Julia), yet still lets Spark perform rich optimizations under the hood. If you want to know more, a talk on this topic will take place at the San Francisco Spark Users meetup.</div><div> </div><div>Since this is my first Haskell project (I wrote my first line of Haskell nine months ago), I would appreciate any feedback on both form and substance. 
For example, some questions still puzzle me:</div><div>- How do I integrate a style checker? (I use Atom + ghc-mod.)</div><div>- What are the best practices for integration testing?</div><div>- Can I have tests that depend on internal modules, yet hide those internal modules from the Haddock documentation?</div><div> </div><div>Thank you for your feedback!</div><div> </div><div>[1] http://spark.apache.org/</div><div>[2] https://github.com/tweag/sparkle</div><div>[3] https://www.tensorflow.org/get_started/summaries_and_tensorboard</div><div> </div></div>