[Haskell-cafe] Persistent Concurrent Data Structures

Tue Nov 1 23:31:09 CET 2011

Hi,
Please comment on the idea and advise on steps to implement it.
Real-world applications need persistent data that can be accessed and
modified concurrently by several clients in a way that preserves the
"happens-before" relationship.
Idea: Design and implement Persistent Concurrent Data Types in Haskell.
These data types should mirror the existing Data.List, Data.Map and similar
types, but provide persistence and support consistent concurrent access and
modification (or simply "concurrency").
Persistence and concurrency should be configurable through the interfaces
of these types. Configuration should include:
1) Media to persist data to, such as a file, a DBMS, or an external
key-value store (for example Amazon SimpleDB, CouchDB, MongoDB, Redis, etc.)
2) Caching policy: when (on what events) and how much data to read from or
write to the persistent media. Media reads and writes can be done
asynchronously in separate threads.
3) Concurrency configuration: optimistic or pessimistic data locking.
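
The three configuration points above could be captured as plain Haskell
types. This is only a minimal sketch: every name in it (Media, CachePolicy,
Locking, Config, defaultConfig) is hypothetical and does not come from any
existing library.

```haskell
-- Hypothetical configuration types for a persistent concurrent map.

-- | Where the data is persisted.
data Media
  = File FilePath          -- a regular file
  | KeyValueStore String   -- an external store, named by a connection string
  deriving (Show, Eq)

-- | When changes are written back to the persistent media.
data CachePolicy
  = WriteThrough           -- write on every modification
  | WriteBack Int          -- flush after this many buffered modifications
  deriving (Show, Eq)

-- | How concurrent modifications are coordinated.
data Locking = Optimistic | Pessimistic
  deriving (Show, Eq)

data Config = Config
  { media       :: Media
  , cachePolicy :: CachePolicy
  , locking     :: Locking
  } deriving (Show, Eq)

-- | A conservative default (an assumption): file storage, write on every
-- change, pessimistic locking.
defaultConfig :: FilePath -> Config
defaultConfig path = Config (File path) WriteThrough Pessimistic
```

Loaded in GHCi, `defaultConfig "data.map"` yields a value that can be
adjusted with record update syntax, for example
`(defaultConfig "data.map") { locking = Optimistic }`.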

One may ask: why encapsulate persistence and concurrency in the data type
instead of using a "native" storage API, such as the key-value or
row-column APIs that NoSQL databases provide?
The answer is simple: the APIs that your code uses greatly influence the
code itself. Using a low-level storage API directly in your code results in
bloated, obscure code, unless you encapsulate this low-level API in clear
and powerful abstractions. So why not do this encapsulation once and for
all for such powerful types as Data.Map, for example, and forget all the
low-level access details of Cassandra and SimpleDB?
When the time comes and you need to move your application to the next new
"shiny_super_cloud", you will just write an implementation of NData.Map,
backed by Data.Map, in terms of the low-level API of this super-cloud.
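
One way to picture this encapsulation is a small record of storage
operations that the Map-like functions are written against. The sketch
below is an assumption, not an existing API: Backend, newMemoryBackend,
nInsert and nLookup are hypothetical names, and the in-memory backend only
stands in for a real store such as SimpleDB.

```haskell
import Data.IORef (newIORef, readIORef, modifyIORef')
import qualified Data.Map as Map

-- | Hypothetical low-level storage interface. Only an implementation of
-- this record would need rewriting when moving to another storage service.
data Backend k v = Backend
  { backendGet    :: k -> IO (Maybe v)
  , backendPut    :: k -> v -> IO ()
  , backendDelete :: k -> IO ()
  }

-- | In-memory backend built on Data.Map, standing in for a real store.
newMemoryBackend :: Ord k => IO (Backend k v)
newMemoryBackend = do
  ref <- newIORef Map.empty
  return Backend
    { backendGet    = \k -> Map.lookup k <$> readIORef ref
    , backendPut    = \k v -> modifyIORef' ref (Map.insert k v)
    , backendDelete = \k -> modifyIORef' ref (Map.delete k)
    }

-- | Map-like operations that application code would call. They never
-- mention which backend is in use.
nInsert :: k -> v -> Backend k v -> IO ()
nInsert k v b = backendPut b k v

nLookup :: k -> Backend k v -> IO (Maybe v)
nLookup k b = backendGet b k

nDelete :: k -> Backend k v -> IO ()
nDelete k b = backendDelete b k
```

Application code that uses only nInsert, nLookup and nDelete would stay
unchanged if newMemoryBackend were replaced by a constructor for another
store.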

(Side note: I really need such an NData.Map type. I was asked to move my
code, which heavily uses Data.Map and simple text file persistence, into
the Amazon AWS cloud. Looking at the SimpleDB API, I realized that I would
have to rewrite 90% of the code. This rewrite would greatly bloat my code
and make it very unreadable. If I had NData.Map, I would just switch the
implementation from 'file' to SimpleDB persistence inside my NData.Map
type.)

Implementation:
To start playing with this idea, an NData.Map persisted in a regular file
will do, with no concurrency yet. The next step is an NData.Map persisted
in SimpleDB, Cassandra or Redis, with concurrent access supported.
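
As a rough illustration of the first step, a map persisted in a regular
file with no concurrency could look like the sketch below. It assumes the
whole map is serialized with Show and Read and rewritten on every insert,
which is only reasonable for small data. The function names loadMap,
saveMap, insertP and lookupP are hypothetical.

```haskell
import Control.Exception (catch, evaluate)
import qualified Data.Map as Map
import System.IO.Error (isDoesNotExistError)

-- | Read the whole map from a file. A missing or empty file is treated
-- as an empty map.
loadMap :: (Ord k, Read k, Read v) => FilePath -> IO (Map.Map k v)
loadMap path = do
  contents <- readFile path `catch` missingAsEmpty
  -- Force the lazy read so the file handle is closed before any later write.
  _ <- evaluate (length contents)
  return (if null contents then Map.empty else read contents)
  where
    missingAsEmpty :: IOError -> IO String
    missingAsEmpty e
      | isDoesNotExistError e = return ""
      | otherwise             = ioError e

-- | Write the whole map to a file, replacing previous contents.
saveMap :: (Show k, Show v) => FilePath -> Map.Map k v -> IO ()
saveMap path m = writeFile path (show m)

-- | Persistent insert: load, modify with Data.Map, save.
insertP :: (Ord k, Read k, Show k, Read v, Show v)
        => FilePath -> k -> v -> IO ()
insertP path k v = do
  m <- loadMap path
  saveMap path (Map.insert k v m)

-- | Persistent lookup: load, then query with Data.Map.
lookupP :: (Ord k, Read k, Read v) => FilePath -> k -> IO (Maybe v)
lookupP path k = Map.lookup k <$> loadMap path
```

This version is not safe under concurrent access: two clients that call
insertP at the same time can overwrite each other's changes.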

So it looks like NData.Map should be a monad ...
Any ideas on implementation and similar work?
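
One possible reading of "NData.Map should be a monad" is a state-like
monad over IO that threads the map through a sequence of operations, so
that loading and saving can happen around the whole computation. The
sketch below is only one such interpretation. NMap, insertN and lookupN
are hypothetical names, and the persistence hooks themselves are left out.

```haskell
import qualified Data.Map as Map

-- | A computation that reads and updates a map and may perform IO.
newtype NMap k v a = NMap
  { runNMap :: Map.Map k v -> IO (a, Map.Map k v) }

instance Functor (NMap k v) where
  fmap f (NMap g) = NMap $ \m -> do
    (a, m') <- g m
    return (f a, m')

instance Applicative (NMap k v) where
  pure a = NMap $ \m -> return (a, m)
  NMap f <*> NMap g = NMap $ \m -> do
    (h, m1) <- f m
    (a, m2) <- g m1
    return (h a, m2)

instance Monad (NMap k v) where
  NMap g >>= f = NMap $ \m -> do
    (a, m') <- g m
    runNMap (f a) m'

-- | Insert a key-value pair into the threaded map.
insertN :: Ord k => k -> v -> NMap k v ()
insertN k v = NMap $ \m -> return ((), Map.insert k v m)

-- | Look up a key in the threaded map without changing it.
lookupN :: Ord k => k -> NMap k v (Maybe v)
lookupN k = NMap $ \m -> return (Map.lookup k m, m)
```

For example, `runNMap (insertN "a" 1 >> lookupN "a") Map.empty` runs both
operations against an initially empty map and returns the lookup result
together with the final map.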

Thanks!
Dmitri
---
http://sites.google.com/site/dokondr/welcome