[Haskell-cafe] Amazon AWS storage best to use with Haskell?
dokondr at gmail.com
Wed Nov 16 22:24:21 CET 2011
Steve, thanks for sharing your experience with AWS!
At the moment I have evaluated several NoSQL storage solutions including
SimpleDB, Riak, MongoDB and Cassandra. Lessons learned:
1) The storage that SimpleDB provides is too low-level and not very convenient
for storing dictionaries and other b-tree data structures that my app works
with.
2) The "simpledb/dev" simulator is out of date and does not support the
complete feature set of today's SimpleDB. Thus, without a major rewrite, the
"simpledb/dev" emulator cannot be used for development.
3) SimpleDB storage is 100% specific to the Amazon framework. It follows
that developing directly against the SimpleDB interface will make the app
non-portable across different cloud platforms.
4) Cassandra's row/column abstraction is awkward for the Data.Map structures
that my app needs.
5) Riak provides a convenient bucket/key/value abstraction and runs on a
framework of nodes that is robust to failure. The REST/JSON protocol is simple
to use, yet it is inefficient for the data exchanges my app performs. I
couldn't find simple libraries for the binary exchange that Riak also supports.
6) MongoDB answers my requirements best of all - it is powerful on a server
protocol based on BSON data exchange.
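To illustrate the binary-exchange concern in points 5 and 6, here is a minimal sketch of turning a Data.Map into a compact binary value before handing it to a key/value store. It uses Data.Binary from the `binary` package as a stand-in; it is neither BSON nor Riak's binary protocol, and the names `encodeDict` and `decodeDict` are made up for this example.

```haskell
import Data.Binary (decodeOrFail, encode)
import qualified Data.ByteString.Lazy as BL
import qualified Data.Map as Map

-- | Serialize a dictionary to a binary value suitable for storing under a
-- single key. The key and value types are fixed here only for illustration.
encodeDict :: Map.Map String Int -> BL.ByteString
encodeDict = encode

-- | Parse a stored value back into a dictionary, reporting malformed input
-- as an error message instead of throwing.
decodeDict :: BL.ByteString -> Either String (Map.Map String Int)
decodeDict bytes =
  case decodeOrFail bytes of
    Left (_, _, message) -> Left message
    Right (_, _, dict)   -> Right dict
```

A round trip such as `decodeDict (encodeDict m)` should give back `Right m`; whether this format is compact enough for a given workload would still need measuring.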
I also plan to use RabbitMQ for communication between several Haskell
processes and the Java Web front-end that my app incorporates.
It would be great to know what tools people use in the cloud (AWS, etc.) to
connect a Web front-end with the rest of the (Haskell) system.
Which Haskell tools are there for building a Web front-end?
On Wed, Nov 16, 2011 at 9:01 PM, Steve Severance <steve at medwizard.net> wrote:
> We use AWS extensively. We use the aws package and have contributed to it,
> specifically SQS functionality. I will give you the rundown of what we do.
> We moved off of SimpleDB and now use MongoDB. The reason is that SimpleDB
> seemed to have problems with write pressure and there are no good tools
> for profiling your queries. My main application is extremely write-heavy,
> with a single instance needing to do hundreds or thousands of writes a second.
> MongoDB has worked well for us. I am scared of things like Cassandra, having
> looked at the code; however, some people have made it work.
> We store data such as crawled web pages in S3. The files are lzma
> compressed and the data format is built on protocol buffers. We picked lzma
> for both storage costs of cold data and the fact that the pipe between S3
> and EC2 is somewhat limited, and we want to make the most effective use of
> it possible.
> In my experience AWS simulators are more trouble than they are worth since
> they don't accurately model the way AWS will respond to you under load. The
> free tier at AWS should allow you to experiment with building an app. The
> first couple of months of development cost us less than $1.
> On Tue, Nov 1, 2011 at 1:27 AM, dokondr <dokondr at gmail.com> wrote:
>> On Tue, Nov 1, 2011 at 10:53 AM, Neil Davies <
>> semanticphilosopher at gmail.com> wrote:
>>> Word of caution
>>> Understand the semantics (and cost profile) of the AWS services first -
>>> you can't just open a HTTP connection and dribble data out over several
>>> days and hope for things to work. It is not a system that has that sort of
>>> laziness at its heart.
>>> AWS doesn't supply traditional remote file store semantics - its
>>> queuing, simple database and object store have all been designed for large
>>> scale systems being offered as a service to a (potentially hostile) large
>>> set of users - you can see that in the way that things are designed. There
>>> are all sorts of (sensible from their point of view) performance related
>>> limits and retries.
>>> The challenge in designing nice clean layers on top of AWS is how/when
>>> to hide the transient/load related failures.
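One common way for a layer to hide transient, load-related failures is to retry with exponential backoff. The following is only a sketch under assumptions: `retryWithBackoff` and `flakyAction` are hypothetical names, no AWS call is made, and it retries on every exception, whereas a real layer would retry only on errors the service marks as transient.

```haskell
import Control.Concurrent (threadDelay)
import Control.Exception (SomeException, try)
import Data.IORef (atomicModifyIORef', newIORef)

-- | Run an action, retrying on any exception with exponential backoff.
-- 'retries' counts the extra attempts allowed after the first one;
-- 'delayMicros' is the wait before the next attempt and doubles each time.
retryWithBackoff :: Int -> Int -> IO a -> IO (Either SomeException a)
retryWithBackoff retries delayMicros action = do
  result <- try action
  case result of
    Right value -> return (Right value)
    Left err
      | retries <= 0 -> return (Left err)   -- give up and surface the error
      | otherwise -> do
          threadDelay delayMicros
          retryWithBackoff (retries - 1) (delayMicros * 2) action

-- | Hypothetical stand-in for a throttled service call: the returned action
-- throws on its first 'failures' calls, then returns the attempt number.
flakyAction :: Int -> IO (IO Int)
flakyAction failures = do
  counter <- newIORef 0
  return $ do
    attempt <- atomicModifyIORef' counter (\n -> (n + 1, n + 1))
    if attempt <= failures
      then ioError (userError "simulated transient failure")
      else return attempt
```

Returning `Either` keeps the final failure visible to the caller, which is one possible answer to the "when to hide" question: hide the retries, but not the fact that they ran out.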
>> As a straw-man approach I would first go for NData.Map backed by Data.Map,
>> with the addition of a "flush" function to write Data.Map to an external
>> key-value store / NoSQL DB.
>> Another requirement for NData.Map is concurrent consistency, so that different
>> clients can modify its state while preserving the "happens-before" relationship.
>> For this I would add to NData.Map a "refresh" function that updates the local
>> copy from the external key-value store.
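The straw-man could be sketched roughly as follows. Everything here is an assumption made for illustration: the `Store` record stands in for whatever external key-value store is chosen, `inMemoryStore` replaces a real network client, and the names `NMap`, `insertLocal` and `lookupLocal` are invented. The sketch does not enforce any happens-before ordering: the last `flush` simply overwrites the store.

```haskell
import Data.IORef (IORef, modifyIORef', newIORef, readIORef, writeIORef)
import qualified Data.Map as Map

-- | Hypothetical interface to an external key-value store.
data Store k v = Store
  { storeLoad :: IO (Map.Map k v)        -- read the whole stored map
  , storeSave :: Map.Map k v -> IO ()    -- overwrite the stored map
  }

-- | A local Data.Map copy paired with the store that backs it.
data NMap k v = NMap
  { localCopy :: IORef (Map.Map k v)
  , backing   :: Store k v
  }

-- | In-memory stand-in for a real store, useful only for experiments.
inMemoryStore :: IO (Store k v)
inMemoryStore = do
  ref <- newIORef Map.empty
  return (Store (readIORef ref) (writeIORef ref))

-- | Create an empty local map attached to the given store.
newNMap :: Store k v -> IO (NMap k v)
newNMap store = do
  ref <- newIORef Map.empty
  return (NMap ref store)

-- | Insert into the local copy only; the store is not touched.
insertLocal :: Ord k => k -> v -> NMap k v -> IO ()
insertLocal key value nmap =
  modifyIORef' (localCopy nmap) (Map.insert key value)

-- | Look up a key in the local copy only.
lookupLocal :: Ord k => k -> NMap k v -> IO (Maybe v)
lookupLocal key nmap = Map.lookup key <$> readIORef (localCopy nmap)

-- | "flush": write the whole local copy to the external store.
flush :: NMap k v -> IO ()
flush nmap = readIORef (localCopy nmap) >>= storeSave (backing nmap)

-- | "refresh": replace the local copy with the store's current contents.
refresh :: NMap k v -> IO ()
refresh nmap = storeLoad (backing nmap) >>= writeIORef (localCopy nmap)
```

With two `NMap` values sharing one store, a write made through the first becomes visible to the second only after `flush` on the first and `refresh` on the second. Real concurrent consistency would need versioning or conditional writes on the store side.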
>> As for the hSimpleDB package, it looks like it doesn't build on ghc7:
>>> The hSimpleDB package
>>> Interface to Amazon's SimpleDB service.
>>> Properties
>>> Versions: 0.1 <http://hackage.haskell.org/package/hSimpleDB-0.1>,
>>>   0.2 <http://hackage.haskell.org/package/hSimpleDB-0.2>, *0.3*
>>> Dependencies:
>>>   base <http://hackage.haskell.org/package/base-184.108.40.206> (≥3 & ≤4),
>>>   bytestring <http://hackage.haskell.org/package/bytestring-0.9.2.0>,
>>>   Crypto <http://hackage.haskell.org/package/Crypto-4.2.4>,
>>>   dataenc <http://hackage.haskell.org/package/dataenc-0.14.0.2>,
>>>   HTTP <http://hackage.haskell.org/package/HTTP-4000.1.2>,
>>>   hxt <http://hackage.haskell.org/package/hxt-9.1.4>,
>>>   network <http://hackage.haskell.org/package/network-220.127.116.11>,
>>>   old-locale <http://hackage.haskell.org/package/old-locale-18.104.22.168>,
>>>   old-time <http://hackage.haskell.org/package/old-time-22.214.171.124>,
>>>   utf8-string <http://hackage.haskell.org/package/utf8-string-0.3.7>
>>> License: BSD3
>>> Author: David Himmelstrup 2009, Greg Heartsfield 2007
>>> Maintainer: David Himmelstrup <lemmih at gmail.com>
>>> Category: Database <http://hackage.haskell.org/packages/archive/pkg-list.html#cat:database>,
>>>   Web <http://hackage.haskell.org/packages/archive/pkg-list.html#cat:web>,
>>>   Network <http://hackage.haskell.org/packages/archive/pkg-list.html#cat:network>
>>> Upload date: Thu Sep 17 17:09:26 UTC 2009
>>> Uploaded by: DavidHimmelstrup
>>> Built on: ghc-6.10, ghc-6.12
>>> Build failure: ghc-7.0 (log <http://hackage.haskell.org/packages/archive/hSimpleDB/0.3/logs/failure/ghc-7.0>)