[Haskell-cafe] Top-down inserts in Persistent

Mon Dec 30 21:57:33 UTC 2013

On Mon, Dec 30, 2013 at 3:11 PM, Michael Orlitzky <michael at orlitzky.com> wrote:
> On 12/29/2013 12:33 PM, Adam Bergmark wrote:
> I have 650 XML documents -- all with different schemas -- to import.
> Assuming some of them are outdated or unused, I might wind up doing 100
> before I declare victory. Still an offensive amount of XML =)
>
> To parse the XML I already need to create 100 Haskell data types; that
> part is unavoidable. But since XML is XML, all of those data types are
> trees.

Are you sure a relational schema with the structure of each type of
XML document is the best approach for your dataset?  It sounds like
you could benefit from a less structured approach, since your data
doesn’t sound very regular.

> Michael Snoyman suggested,
>
>   forM_ people $ \(PersonXML name cars) -> do
>     personId <- insert $ Person name
>     forM_ cars $ \car -> insert_ $ Car personId car
>
> which works for one tree, Person { [Car] }. But it doesn't work for
> Person { [Car], [Shoes] }, or anything else. The essence of the problem
> is that I don't want to write 100 functions like the forM above that all
> do the same thing but to trees with slightly different shapes. They
> should all follow the same pattern: insert the big thing, then insert
> the little things with automatic foreign keys to the big thing.

Boris Lykah’s [Groundhog] library sounds like a good fit for your situation:

    {-# LANGUAGE FlexibleInstances, GADTs, QuasiQuotes,
TemplateHaskell, TypeFamilies #-}

    import Control.Monad.IO.Class (liftIO)
    import Database.Groundhog.Core (insert, select)
    import Database.Groundhog.Generic.Sql.Functions (like)
    import Database.Groundhog.Generic (defaultMigrationLogger,
runDbConn, runMigration)
    import Database.Groundhog.Postgresql (withPostgresqlConn)
    import Database.Groundhog.TH (defaultCodegenConfig, groundhog,
migrationFunction, mkPersist)

    data Car
      = Car { carName :: String }
      deriving Show

    data Driver
      = Driver { driverName :: String , driverCars :: [Car] }
      deriving Show

    penelope, anthills :: Driver
    penelope = Driver "Penelope Pitstop" [Car "The Compact Pussycat"]
    anthills = Driver "The Ant Hill Mob" [Car "The Bulletproof Bomb",
Car "Chugga-Boom"]

    mkPersist
      defaultCodegenConfig { migrationFunction = Just "migrateAll" }
      [groundhog|
        - entity: Car
        - entity: Driver
      |]

    main :: IO ()
    main = withPostgresqlConn "host=/tmp" $ runDbConn $ do
      runMigration defaultMigrationLogger migrateAll
      mapM_ insert [penelope, anthills]
      drivers <- select $ DriverNameField `like` "The%"
      liftIO $ mapM_ print drivers

This code will create a few tables: one for the `Driver` constructor,
another for the `Car` constructor, and a couple of tables to keep
track of what’s in the list in the `Driver` constructor.  It will even
create triggers to help maintain the list-related tables clean,
although I venture it’d be uncomfortable manipulating this specific
generated schema by hand.

Groundhog is very flexible with the sort of data types and schemas it
can work with.  That example was getting a bit long so I didn’t
include anything related to constraints, but specifying uniqueness
constraints and the like is relatively painless.

Boris wrote a very nice [tutorial] for Groundhog in FP Complete’s
School of Haskell, and the Hackage documentation for the
[`groundhog-th`] package describes the `groundhog` quasiquoter pretty
well.

[Groundhog]: <http://hackage.haskell.org/package/groundhog>
[tutorial]: <https://www.fpcomplete.com/school/to-infinity-and-beyond/competition-winners/groundhog>
[`groundhog-th`]:
<http://hackage.haskell.org/package/groundhog-th-0.4.0.3/docs/Database-Groundhog-TH.html>