[Haskell-cafe] Please review my Xapian foreign function interface

Oliver Charles haskell-cafe at ocharles.org.uk
Fri Feb 18 19:09:46 CET 2011


Hello!

I've finally come up with some motivation for a project to get my feet
wet with Haskell, and for this little pet project I need an interface
to Xapian. After reading various documents on the FFI in general, I've
got a small working implementation, and I'm now looking for advice on how
to better structure the public API. First, a quick bit of background if
you're not familiar with Xapian.

Xapian is a search engine library with a C++ API. You store documents
in a database (managed by Xapian) and index them by adding terms to
them; Xapian provides stemming algorithms to help generate these terms
from other data. Xapian also has an interface for running queries
(through a Xapian::Enquire object), and a query parser that lets
natural-language queries be parsed and run. For more information, you
can check out the API at [1] - it's fairly small.
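To make those concepts concrete, here is a toy, purely in-memory model in Haskell: each document carries an opaque data payload plus a list of terms, and a one-term query returns the ids of the documents indexed under that term. This is only a conceptual sketch under its own assumptions; none of these names (`Doc`, `termsFrom`, `query`) are Xapian's API, and real Xapian keeps a persistent inverted index, can apply a stemmer when generating terms, and returns ranked results through Xapian::Enquire.

```haskell
module Main (main) where

import Data.Char (toLower)
import Data.List (nub)

-- Conceptual model only: these types are NOT Xapian's API.
type DocId = Int

data Doc = Doc
  { docData  :: String    -- opaque payload stored alongside the document
  , docTerms :: [String]  -- terms the document is indexed under
  }

-- A "database" here is just an association list from ids to documents.
type Db = [(DocId, Doc)]

-- Derive index terms from text: lowercase, split on whitespace and drop
-- duplicates. (No stemming is done in this toy version.)
termsFrom :: String -> [String]
termsFrom = nub . words . map toLower

-- A one-term query: ids of all documents indexed under the term.
query :: Db -> String -> [DocId]
query db term = [i | (i, d) <- db, map toLower term `elem` docTerms d]

main :: IO ()
main = do
  let db = [ (1, Doc "first" (termsFrom "Haskell bindings for Xapian"))
           , (2, Doc "second" (termsFrom "A C++ search engine"))
           ]
  print (query db "xapian")  -- prints [1]
  print (query db "Search")  -- prints [2]
```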

As Xapian is written in C++, my best option seems to be a simple C
wrapper of my own, which also lets me tailor the FFI to be easy to use
from Haskell. You can see my C API on GitHub [2]; for now it's very
stripped down, as I've been wrapping things on a need-to-use basis.
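A Haskell binding to such a C wrapper usually has two parts: `foreign import ccall` declarations for the wrapper's functions, and garbage-collected handles (`ForeignPtr`) for the C++ objects it hands out. The sketch below shows both under stated assumptions: to stay runnable without Xapian it binds libc's `strlen` instead of any wrapper function, and it uses `mallocBytes`/`finalizerFree` as a stand-in for a hypothetical create/destroy pair exported by the wrapper. `CDocument`, `DocHandle`, `termLength` and `newHandle` are illustrative names, not part of the existing code.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
module Main (main) where

import Foreign.C.String (CString, withCString)
import Foreign.C.Types (CSize (..))
import Foreign.ForeignPtr (ForeignPtr, newForeignPtr, withForeignPtr)
import Foreign.Marshal.Alloc (finalizerFree, mallocBytes)
import Foreign.Ptr (Ptr, nullPtr)

-- Binding to a real C function (libc's strlen) so the sketch links
-- without Xapian; a wrapper function would be imported the same way.
foreign import ccall unsafe "string.h strlen"
  c_strlen :: CString -> IO CSize

-- Haskell-facing helper: marshal a String into a temporary NUL-terminated
-- C string, call into C, and convert the result to an Int.
termLength :: String -> IO Int
termLength term = withCString term (fmap fromIntegral . c_strlen)

-- Empty data type used purely as a tag for the C-side object (hypothetical).
data CDocument

-- Garbage-collected handle: the finalizer is scheduled to run after the
-- handle becomes unreachable (GHC does not guarantee promptness).
newtype DocHandle = DocHandle (ForeignPtr CDocument)

-- Stand-in constructor: mallocBytes/finalizerFree play the roles of a
-- hypothetical wrapper create/destroy pair. The 64-byte size is arbitrary.
newHandle :: IO DocHandle
newHandle = do
  p <- mallocBytes 64 :: IO (Ptr CDocument)
  DocHandle <$> newForeignPtr finalizerFree p

main :: IO ()
main = do
  n <- termLength "search_term"
  print n     -- prints 11
  DocHandle fp <- newHandle
  live <- withForeignPtr fp (pure . (/= nullPtr))
  print live  -- prints True
```

In a real binding, the finalizer would instead be a `FunPtr` obtained by importing the address of the wrapper's destroy function with `foreign import ccall "&..."`.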

* * *

Currently what I have works, but it's tightly tied to I/O and very
little of the code is pure. For example, to create and index a
document, you need to do something along the lines of:

    do document <- newDocument
       setDocumentData document "Document data"
       addPosting document "search_term" 1
       addDocument database document

(This assumes you already have an open database handle.) How horribly
imperative this all looks! :-) A document *feels* like it should be
quite pure, but retrieving properties of a document performs I/O. For
example, I'd like to have something like:

    data Document = Document { documentData :: String, postings :: [String] }

    do document <- getDocument database 123  -- Get document #123

and have `document` refer to a pure `Document` value. I'm still stuck in
the IO monad a bit, but at least I can write pure functions that operate
on `Document` values. The problem I see with this is that I believe I'd
have to retrieve every part of the document in my `getDocument` function
(including the data and all postings), so I can't benefit from laziness
here.
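One possible way to keep a pure `Document` while still fetching lazily is to wrap each per-field fetch in `unsafeInterleaveIO` (from `System.IO.Unsafe`), which defers the I/O until that field is first forced. Below is a minimal sketch; `Database`, `fetchData` and `fetchPostings` are hypothetical stand-ins for the real handle and FFI calls, backed here by an in-memory table and a call counter so the deferral is visible. The catch is that the I/O then happens at an unpredictable time: if the database is closed or the document changes before a field is forced, the fetch may fail or see different data than expected.

```haskell
module Main (main) where

import Data.IORef (IORef, modifyIORef', newIORef, readIORef)
import System.IO.Unsafe (unsafeInterleaveIO)

-- Pure view of a stored document (field names are illustrative).
data Document = Document
  { documentData :: String
  , postings     :: [String]
  }

-- Hypothetical stand-in for a database handle: an in-memory table plus
-- a counter recording how many "C calls" have been made.
data Database = Database
  { table :: [(Int, (String, [String]))]
  , calls :: IORef Int
  }

-- Hypothetical stand-ins for FFI calls into the C wrapper.
fetchData :: Database -> Int -> IO String
fetchData db docId = do
  modifyIORef' (calls db) (+ 1)
  pure (maybe "" fst (lookup docId (table db)))

fetchPostings :: Database -> Int -> IO [String]
fetchPostings db docId = do
  modifyIORef' (calls db) (+ 1)
  pure (maybe [] snd (lookup docId (table db)))

-- Each field's fetch is deferred until that field is first forced.
getDocument :: Database -> Int -> IO Document
getDocument db docId =
  Document <$> unsafeInterleaveIO (fetchData db docId)
           <*> unsafeInterleaveIO (fetchPostings db docId)

main :: IO ()
main = do
  counter <- newIORef 0
  let db = Database [(123, ("Document data", ["search_term"]))] counter
  doc <- getDocument db 123
  readIORef counter >>= print  -- 0: nothing fetched yet
  putStrLn (documentData doc)  -- forces only the data field
  readIORef counter >>= print  -- 1
  print (postings doc)         -- now the postings are fetched
  readIORef counter >>= print  -- 2
```

Running `main` prints `0`, the document data, `1`, the postings and `2`, showing that each field costs a call only when it is first used.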
