From nikitatchayka at gmail.com Tue Mar 7 23:23:56 2017
From: nikitatchayka at gmail.com (Nikita Tchayka)
Date: Tue, 07 Mar 2017 23:23:56 +0000
Subject: [Data-haskell] Exploring the Titanic dataset with Haskell
Message-ID:

Hello dataHaskell!

I'd like to inaugurate our mailing list by showing something that I'm
working on for our documentation site.

While browsing notebooks on Kaggle, one by Megan Risdal really caught my
eye. It is very complete and covers a lot of tasks, ranging from easy ones,
such as simple feature engineering by extracting titles from the passenger
names, to more complex ones, like applying a Random Forest. It also
includes visualization, which is cool too.

I'm currently porting it to Haskell, so we can see what's missing and
what's already there.

I'm using Eric Conlon's (@ejconlon) Analyze library, and even though it's
still young, I absolutely love it. It makes CSV loading easy, there are no
name clashes like in Frames, and you don't have to define your datatypes
first as in Cassava.

The notebook itself is in a repo on my GitHub, and it can be loaded in
HaskellDO, although right now I'm using Vim + Stack REPL until error
highlighting is implemented.

Cheers

--
nikita tchayka . software craftsman
{ nickseagull.github.io }