[Haskell-cafe] Difficulties in accessing inner elements of data types

David Miani davidmiani at gmail.com
Tue Mar 3 05:57:57 EST 2009


Hi,
I'm working on a Haskell library for interacting with emacs org files. For 
those that do not know, an org file is a structured outline style file that 
has nested headings, text, tables and other elements. For example:

* Heading 1
Some text, more text. This is a subelement of Heading 1
1. You can also have list
   1. and nested lists
   2. more...

** Nested Heading (subelement of Heading 1)
text... (subelement of Nested Heading)

** Another level 2 heading (subelement of Heading 1)
| Desc      | Value                                |
|-----------+--------------------------------------|
| Table     | You can also have tables in the file |
| another   | row                                  |
| seperator | you can have seps as well, eg        |
|-----------+--------------------------------------|

* Another top level heading
There are many more features, see orgmode.org

My library enables read and write access to a subset of this format (eg lists 
aren't parsed atm). 

The data structures used for writing are:

data OrgFile = OrgFile [OrgFileElement]
data OrgFileElement = Table OrgTable
                     | Paragraph String
                     | Heading OrgHeading

-- heading level, title, subelements
data OrgHeading = OrgHeading Int String [OrgFileElement]

data OrgTable = OrgTable [OrgTableRow]

data OrgTableRow = OrgTableRow [String] | OrgTableRowSep 


To write a file you contruct a OrgFile out of those elements, and pass it to a 
writeOrgFile func. Eg:

writeOrg $ OrgFile [Heading (OrgHeading 1 "h1" [Paragraph "str"])]
would produce:
* h1
str

I was going to use the same data structures for reading an org file, but it 
quickly became apparent that this would not be suitable, as you needed the 
position of the file of an element to be able to report errors. Eg if you 
needed to report an error that a number was expected, the message "'cat' is 
not a number" is not very useful, but "Line 2031: 'cat' is not a number" is. 
So the data structures I used were:

data FilePosition = FilePosition Line Column

data WithPos a = WithPos {
    filePos :: FilePosition,
    innerValue :: a
    }

data OrgTableP = OrgTableP [WithPos OrgTableRow]

data OrgFileElementP = TableP OrgTableP
                     | ParagraphP String
                     | HeadingP OrgHeadingP

data OrgHeadingP = OrgHeadingP Int String [WithPos OrgFileElementP]

data OrgFileP = OrgFileP [WithPos OrgFileElementP]


Finally there is a function readOrg, which takes a string, and returns an 
OrgTableP.


Now, this all works as expected (files are correctly being parsed and 
written), however I am having a lot of trouble trying to come up with a decent 
API to work with this. While writing an OrgFile is fairly easy, reading (and 
accessing inner parts) of an org file is very tedious, and modifying them is 
horrendous.

For example, to read the description line for the project named "Project14" in 
the file:

* 2007 Projects
** Project 1
Description: 1
Tags: None
** Project 2
Tags: asdf,fdsa
Description: hello
* 2008 Projects
* 2009 Projects
** Project14
Tags: RightProject
Description: we want this

requires the code:

type ErrorS = String
listToEither str [] = Left str
listToEither _ (x:_) = Right x

get14 :: OrgFileP -> Either ErrorS String
get14 (OrgFileP elements) = getDesc =<< (getRightProject . concatProjects) 
elements where
    concatProjects :: [WithPos OrgFileElementP] -> [OrgHeadingP]
    concatProjects [] = []
    concatProjects ((WithPos _ (HeadingP h)) : rest) = h : concatProjects rest
    concatProjects (_ : rest) = concatProjects rest

    getRightProject :: [OrgHeadingP] -> Either ErrorS OrgHeadingP
    getRightProject  = listToEither "Couldn't find project14" .
                       filter (\(OrgHeadingP _ name _) -> name == "Project14")

    getDesc :: OrgHeadingP -> Either ErrorS String
    getDesc (OrgHeadingP _ _ children) =
        case filter paragraphWithDesc (map innerValue children) of
          [] -> Left $ show (filePos $ head children) ++ 
                       ": Couldn't find desc in project"
          ((ParagraphP str):_) -> Right str
          _ -> error "should not be possible"

    paragraphWithDesc :: OrgFileElementP -> Bool
    paragraphWithDesc (ParagraphP str) = str =~ "Description"
    paragraphWithDesc _ = False


If you think that is bad, try writing a function that adds the Tag "Hard" to 
Project2 :(

What I really need is a DSL that would allow sql like queries on an OrgFileP. 
For example:
select (anyHeading `next`
        headingWithName "Project14" `withFailMsg` "couldn't find p14" `next`
        paragraphMatchingRegex "Description" `withFailMsg` "no desc")
       org `output` paragraphText

would return a String
OR
select (anyHeading `next` headingWithName "Project2" `next`
        paragraphMatchingRegex "Tag:") org `modify` paragraphText (++ ",Hard")

would return an OrgFile, with the new Hard tag added.

However, I don't know if this is even possible, how to do it, or if there is a 
better alternative to this. I would really apreciate any hints with regards to 
this. It would be useful to know if there are other libraries that also face 
this problem, and how they solved it.

Finally, I would be grateful for any other advice regarding my code. One thing 
that has bugged me is my solution for having file position info - my solution 
never seemed very elegant.

Thanks,
David










-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.haskell.org/pipermail/haskell-cafe/attachments/20090303/3115fb94/attachment.htm


More information about the Haskell-Cafe mailing list