[Haskell-beginners] using record in aeson

David McBride toad3k at gmail.com
Wed Oct 19 09:32:32 CEST 2011


There are a few things I can offer.

1.  Those overlap warnings are because toSubject takes a Map.  Your
first function deals with all cases of that map.  The second
declaration of toSubject servers no purpose, so just get rid of it.

2.  When you see the string "<- return $" you can replace those with
let statements, ie:

    p <- return $ M.assocs o'
    k <- return $ fst $ head p
    v <- return $ snd $ head p

replaced with
    let p = head $ M.assocs o'
         (k,v) = (fst p, snd p)

I believe using return when you don't need it could cause overhead
depending on what the monad does to a value when it returns it
(usually nothing, but still).

3. You use head quite a bit in a few situations, and it is dangerous.
If you try to head an empty list, you crash.  If those maps were empty
(maybe not possible in json), toList would give an empty list, so try
this:

       toSubject :: (M.Map T.Text Value) -> J.Parser (T.Text,
[Property])
       toSubject o' | M.null o' = mzero
                            | otherwise = do
        s <- return $ M.assocs o'
        k <- return $ fst $ head s
        v <- return $ snd $ head s

That way, you are guaranteed to have at least one element.  Better
safe than sorry.

4. In your instance for RDFType, you compare v to a a text string, and
then return s from inside the constructor.  You could have just gotten
rid of the v@ entirely and just used s, and your code actually speeds
up due to that.  The reason why is when you compare a string to v, ghc
looks to see if a type of Value can be made into a string and then
compared.  It can, but the problem is what it does is it takes its
internal bytestring, runs Data.ByteString.pack on it so that it is is
the same type as the string you compared it to, then does the
comparison.  The problem is that Data.ByteString.pack is super ultra
slow.  If you just did direct comparisons with s, you are comparing
Data.Text to Data.Text instead of String to String, which requires no
conversion of one of the arguments, and Data.Text probably has super
fast equality testing as well.

So, replace it with this:

instance FromJSON RDFType where
 parseJSON (String s)
    | s == "literal" = return $ Literal s
    | s == "bnode" = return $ BNode s
    | otherwise = return $ URI s

And I ended up just rewriting the code because I hadn't used aeson
before, and I really should learn it.  Here is what I ended up with:

{-# LANGUAGE OverloadedStrings, FlexibleInstances #-}
module Main where

import Control.Monad (mzero)

import qualified Data.ByteString as B (readFile)
import qualified Data.Map as M (toList)
import qualified Data.Text as T (Text)
import qualified Data.Vector as V (toList)

import Data.Aeson
import qualified Data.Aeson.Types as J (Parser)
import Data.Attoparsec (parse, Result(..))

type RDFValue = T.Text
type Lang = T.Text
type DTyp = T.Text

data RDFObject = Literal RDFValue Lang (Maybe DTyp) | URI RDFValue |
BNode RDFValue  deriving (Show)
data Property  = Property T.Text [RDFObject] deriving (Show)
data Subject   = Subject T.Text [Property] deriving (Show)
newtype RDF    = RDF [Subject] deriving Show

instance FromJSON RDFObject where
  parseJSON (Object o) = do
    typ <- o .: "type"
    val <- o .: "value"
    lang <- o .:? "lang"
    dtype <- o .:? "datatype"

    return $ case (typ :: T.Text) of
      "literal" -> Literal val (maybe "en" id lang) dtype
      "bnode"   -> BNode val
      _         -> URI val
  parseJSON _ = mzero


instance FromJSON [Property] where
  parseJSON (Object o) = mapM getRdfObject (M.toList o)
    where
      getRdfObject :: (T.Text, Value) -> J.Parser Property
      getRdfObject (propname, rdfobjs) = do
        rdfobjs' <- parseJSON rdfobjs -- :: J.Parser [RDFObject]
        return $ Property propname rdfobjs'
  parseJSON _ = mzero


instance FromJSON RDF where
  parseJSON (Object o) = fmap RDF $ mapM getSubject (M.toList o)
    where
      getSubject :: (T.Text, Value) -> J.Parser Subject
      getSubject (subjname, props) = do
        props' <- parseJSON props -- :: J.Parser [Property]
        return $ Subject subjname props'
  parseJSON _ = mzero

main :: IO ()
main = do
  str <- B.readFile "6.json"
  case parse (fmap fromJSON json) str of
    Done _ (Success rdf) -> print (rdf :: RDF)
    _ -> error "failed parse"


I had to rethink some things.  I removed the tuples from your data
types.  They weren't really serving a purpose, but it is a level of
indirection from your types, so it does slow things down a little.

lang and datatype tags were optional, so the parser has to use .:?
instead or else when they are missing it will stop parsing.  I moved
options that are only applicable to literals into the Literal type,
which is the functional way to do things.  It also defaults to english
if you get a literal with no lang tag.

RDF is a newtype because it only has one field, and newtypes are
faster than data, so if you aren't using the extra fields, might as
well use newtype.

After looking at the spec, I decided to modify your types a bit.  They
should be self explanatory, but if you have any questions feel free to
ask.  I think there is more to add to be officially correct, but this
does parse the example in that page you mentioned with the anna wilder
stuff in it.

One thing that I changed is that modeling the individual types doesn't
help you model this data well.  IE, you don't really need a Property
from an Object, what you really need a list of Properties from an
Object.  So the instances relfect that.  I had to add
FlexibleInstances in order to do it with lists of objects.  You should
be aware that it implicitly uses the "instance FromJSON (FromJSON a)
=> [a]" from aeson to get an array of RDFObjects in the [Propery]
instance.  Keep in mind that this is sort of in the code without
actually having to actually be there, and that's how it deals with the
json arrays in that example json code, despite having no code that
actually deals with arrays.  This is the code that you could slip in
and it would do exactly the same thing.

instance FromJSON [RDFObject] where
  parseJSON (Array a) = mapM parseJSON (V.toList a)
  parseJSON _ = mzero

So yeah, that was fun.


On Tue, Oct 18, 2011 at 8:23 PM, Rick Murphy <rick at rickmurphy.org> wrote:
> Thanks, David. That was great advice.
>
> I wonder whether you and others might provide more advice on how to
> improve the worked example below. I suspect the compiler provides a hint
> with the pattern match overlap warning, but I wonder what opportunities
> for refactoring an experienced Haskeller would envision.
>
> In terms of background, the worked example partially implements the
> following specification.
>
> http://docs.api.talis.com/platform-api/output-types/rdf-json
>
> Here's a test string and the worked example:
>
> {"http://www.example.com/about":{"http://purl.org/dc/elements/1.1/title":{"type":"literal","value":"Rick's Home Page","lang":"http://w3.org/en","datatype":"http://w3.org/#string"},"http://purl.org/dc/elements/1.1/title":{"type":"literal","value":"Rick's Home Page","lang":"http://w3.org/en","datatype":"http://w3.org/#string"}}}
>
> {-# LANGUAGE OverloadedStrings #-}
>
> module Main where
>
> import Control.Applicative
> import Control.Monad (mzero)
>
> import qualified Data.ByteString as B
> import qualified Data.Map as M
> import qualified Data.Text as T
>
> import Data.Aeson
> import qualified Data.Aeson.Types as J
> import Data.Attoparsec
>
> data RDFType = Literal T.Text | URI T.Text | BNode T.Text deriving
> (Show)
> type RDFValue = T.Text
> type Lang = T.Text
> type DTyp = T.Text
>
> data RDFObject = RDFObject {s :: RDFType, u :: RDFValue, v :: Lang, w ::
> DTyp} deriving (Show)
> data Property = Property (T.Text, RDFObject) deriving (Show)
> data Subject = Subject (T.Text, [Property]) deriving (Show)
> data RDF       = RDF [Subject] deriving Show
>
> instance FromJSON Subject
>  where
>    parseJSON (Object o) = Subject <$> toSubject o
>       where
>        toSubject :: (M.Map T.Text Value) -> J.Parser (T.Text,
> [Property])
>        toSubject o' = do
>
>         s <- return $ M.assocs o'
>         k <- return $ fst $ head s
>         v <- return $ snd $ head s
>
>         ps <- mapM (parseJSON :: Value -> J.Parser Property) (M.elems
> o')
>
>         return $ (k,ps)
>
>        toSubject _              = error "unexpected subject"
>
>    parseJSON _          = mzero
>
> instance FromJSON RDFObject where
>  parseJSON (Object o) = RDFObject <$> (o .: "type") <*> (o .: "value")
> <*> (o .: "lang") <*> (o .: "datatype")
>
> instance FromJSON Property where
>  parseJSON (Object o) = Property <$> toProperty o
>   where
>    toProperty :: (M.Map T.Text Value) -> J.Parser (T.Text, RDFObject)
>    toProperty o' = do
>
>     p <- return $ M.assocs o'
>     k <- return $ fst $ head p
>     v <- return $ snd $ head p
>
>     o'' <- parseJSON v :: J.Parser RDFObject
>
>     return (k,o'')
>    toProperty _        = error "unexpected property"
>
>  parseJSON _ = mzero
>
> instance FromJSON RDFType where
>  parseJSON v@(String s) | v == "literal" = return $ Literal s
>                        | v == "bnode" = return $ BNode s
>                        | otherwise = return $ URI s
>
> parseAll :: B.ByteString -> [Subject]
> parseAll s = case (parse (fromJSON <$> json) s) of
>  Done _ (Error err)  -> error err
>  Done ss (Success e) -> e:(parseAll ss)
>  _                   -> []
>
> main :: IO ()
> main = do s <- B.readFile "6.json"
>          let p = RDF $ parseAll s
>          print p
>
> --
> Rick
>
> On Wed, 2011-10-12 at 11:21 -0400, David McBride wrote:
>> The problem is that in parseObject, from the moment you type 'return',
>> you are then in pure code.  But you are trying to do applicative
>> functions as if you are still in the Parser monad.  Here is a way to
>> rewrite this.
>>
>> First rewrite
>> data MyRecord = MyRecord {s :: T.Text, u :: T.Text} deriving (Show)
>> because we are using Text not String, then
>>
>> parseObject o' = mapM toMyPair (M.assocs o')
>>   where
>>     toMyPair :: (T.Text, Value) -> J.Parser MyPair
>>     toMyPair (t, Object o'') = do
>>       rec <- MyRecord <$> (o'' .: "type") <*> (o'' .: "value") ::
>> J.Parser MyRecord
>>       return $ R (t, rec)
>>     toMyPair _              = error "unexpected"
>>
>> That is, stay in the parser monad and pull out the things you need
>> using do notation, then return the whole thing back into the parser
>> monad.  You could have also gone:
>>
>>     toMyPair (t, Object o'') = do
>>       typ <- o'' .: "type"
>>       val <- o'' .: "value"
>>       return $ R (t, MyRecord typ val)
>>
>>
>> On Tue, Oct 11, 2011 at 9:17 PM, Rick Murphy <rick at rickmurphy.org> wrote:
>> > Hi All:
>> >
>> > I've been elaborating on aeson examples and wondered whether someone
>> > could clarify the syntax for using a record in a pair. My goal is to
>> > substitute a record for the list of pairs created through the data
>> > constructor O [(T.Text, Value)] in MyPair below. Reason being to embed
>> > the semantics of the json file into the record. To reproduce, just
>> > uncomment the lines in the source below.
>> >
>> > The json file structure is as follows:
>> > {"outer":{"type":"literal","value":"rick"}}
>> >
>> > Note my naive attempt in the commented lines returns the following
>> > message from ghci. 'f0 b0' doesn't give me much to go on.
>> >
>> > -- E1.hs:35:41:
>> > --     Couldn't match expected type `MyRecord' with actual type `f0 b0'
>> > --     In the expression: MyRecord <$> o'' .: "type" <*> o'' .: "value"
>> > --     In the first argument of `R', namely
>> > --       `(t, MyRecord <$> o'' .: "type" <*> o'' .: "value")'
>> > --     In the expression: R (t, MyRecord <$> o'' .: "type" <*> o'' .:
>> > "value")
>> > -- Failed, modules loaded: none.
>> >
>> > {-# LANGUAGE OverloadedStrings #-}
>> >
>> > module Main where
>> >
>> > import Control.Applicative
>> > import Control.Monad (mzero)
>> >
>> > import qualified Data.ByteString as B
>> > import qualified Data.Map as M
>> > import qualified Data.Text as T
>> >
>> > import Data.Aeson
>> > import qualified Data.Aeson.Types as J
>> > import Data.Attoparsec
>> >
>> > -- data MyRecord = MyRecord {s :: String, u :: String} deriving (Show)
>> >
>> > data MyPair = O (T.Text, [(T.Text, Value)])
>> >           -- | R (T.Text, MyRecord)
>> >              deriving (Show)
>> >
>> > data ExifObject = ExifObject [MyPair]
>> >                deriving Show
>> >
>> > data Exif       = Exif [ExifObject]
>> >                deriving Show
>> >
>> > instance FromJSON ExifObject
>> >  where
>> >    parseJSON (Object o) = ExifObject <$> parseObject o
>> >      where
>> >        parseObject o' = return $ map toMyPair (M.assocs o')
>> >
>> >        toMyPair (t, Object o'')= O (t, M.assocs o'')
>> > --      toMyPair (t, Object o'')= R (t, MyRecord <$> o'' .: "type" <*>
>> > o'' .: "value")
>> >        toMyPair _              = error "unexpected"
>> >
>> >    parseJSON _          = mzero
>> >
>> > parseAll :: B.ByteString -> [ExifObject]
>> > parseAll s = case (parse (fromJSON <$> json) s) of
>> >  Done _ (Error err)  -> error err
>> >  Done ss (Success e) -> e:(parseAll ss)
>> >  _                   -> []
>> >
>> > main :: IO ()
>> > main = do s <- B.readFile "e1.json"
>> >          let p = Exif $ parseAll s
>> >          print p
>> >
>> > --
>> > Rick
>> >
>> >
>> > _______________________________________________
>> > Beginners mailing list
>> > Beginners at haskell.org
>> > http://www.haskell.org/mailman/listinfo/beginners
>> >
>>
>
>
>
> _______________________________________________
> Beginners mailing list
> Beginners at haskell.org
> http://www.haskell.org/mailman/listinfo/beginners
>



More information about the Beginners mailing list