[Haskell-cafe] A Query Language for Haskell Terms

Sat Jun 23 07:37:28 EDT 2007

Hi,

I am writing a Web application using HAppS.

As all HAppS apps, it represents its internal state as a Haskell term (HAppS 
automagically provides persistence and transactions).

It is a neat and efficient solution, you can write your data model entirely in 
Haskell and, at least for read-only transactions (queries) it will be operate 
as fast as possible as all data is in memory (if your transactions modify the 
application state, the transactions has to be recorded on disk to make it 
persistent, but this is pretty fast too).

One major component, however seem to be missing, if we are effectively using 
Haskell as an in-memory database where is the "SQL for Haskell": a generic 
query language for Haskell terms?

There are three basic functions that every web app has to provide, and all of 
them could be provided by a generic "Haskell SQL":
-- query the application state
-- transform (possibly monadically) the application state : the result of the 
query is the new state
-- access control:  what an user can see is what is returned by an internal 
access control query

The availabilty of such a language would be a major boost for Haskell-based 
web applications as every application could be accessed via the same API, the 
only difference being the underlying application-specific data model.

So my question is: what ready-made solutions are there in this space, if any?

And if there are none, how would you proceed to design/implement such a 
language?

The basic requirements, in decreasing order of importance, are:

-- Safe, it must be possible to guarantee that a query:
--- cannot cause a system crash
--- completes by a fixed time of time
--- uses a 'reasonable' amount of space
--- cannot perform any unsafe operation (IO, or any unallowed read/write of 
the application state)

-- Expressive (simple queries should be simple, complex queries should be 
possible)

-- Simple to implement 

-- Efficient:
--- Repeated queries should be executed efficiently time-wise (it is 
acceptable for queries to be executed inefficiently the first time) and all 
should be space-efficient, so it should not do unnecessary copying. 

-- User friendly:
--- Simple to use for non-haskeller
--- Short queries

Ah, I almost forgot, it should also be able to make a good espresso.

The problem can be broken in two parts:

1) How to implement generic queries on nested terms in Haskell?

2) How to map the queries, written as a string, to the internal Haskell query

Regarding the first point, I am aware of with the following options:
- SYB (Data.Generics..)
- Oleg's Zipper 
- (Nested) list comprehensions (that are being extended with SQL-like order by 
and group by operators)

Being rather new to Haskell all these options are rather unfamiliar so I would 
appreciate any advice on what should be preferred and why.

Regarding the second point: 

The simplest solution would be to avoid the problem entirely by using Haskell 
directly as the query language.

This is the LambdaBot way: queries are Haskell 
expression, compiled in a limited environment (a module with a fixed set of 
imports, no TH).

Lambdabot avoids problems by executing the expression on a separate process in 
a OS-enforced sandbox that can be as restrictive as required (especially 
using something like SELinux).

However, to get the query to execute efficiently it would probably have to be 
executed in a GHC thread and I am not sure how safe that would be.

Looking at the discussion at  
http://haskell.org/haskellwiki/Safely_running_untrusted_Haskell_code 
 it seems clear that there are many open issues.

For example, how would I enforce limits on the space used by the query?

So, it would probably be better to define a separate query 
language that is  less expressive but more controllable than full Haskell, but 
what form should that take?

Any suggestion/tip/reference is very welcome,

     titto