Hackage 2 and acid-state vs traditional databases
Erik Hesselink
hesselink at gmail.com
Thu Sep 6 21:38:58 CEST 2012
Hi Ian,
We used acid-state (actually happstack-state) at Silk for our session
store. We had the same problems you describe: slow shutdown/startup,
high memory usage, unable to inspect the data. We recently switched to
an SQL database. Just another data point.
Erik
On Thu, Sep 6, 2012 at 8:49 PM, Ian Lynagh <ian at well-typed.com> wrote:
>
> Hi all,
>
> I've had a bit of experience with Hackage 2 and acid-state now, and I'm
> not convinced that it's the best fit for us:
>
> * It's slow. It takes about 5 minutes for me to stop and then start the
>   server. It's actually surprising just how slow it is, so it might be
>   possible/easy to get this down to seconds, but it still won't be
>   instantaneous.
>
> * Memory usage is high. It's currently in the 700M-1G range, and to get
>   it that low I had to stop the parsed .cabal files from being held in
>   memory (which presumably has an impact on performance, although I
>   don't know how significant that is), and disable the reverse
>   dependencies feature. It will grow at least linearly with the number
>   of package/versions in Hackage.
>
> * Only a single process can use the database at once. For example, if
>   the admins want a tool that will make it easier for them to approve
>   user requests, then that tool needs to be integrated into the Hackage
>   server (or talk to it over HTTP), rather than being standalone.
>
> * The database is relatively opaque. While in principle tools could be
>   written for browsing, modifying or querying it, currently none exist
>   (as far as I know).
>
> * The above 2 points mean that, for example, there was no easy way for
>   me to find out how many packages use each top-level module hierarchy
>   (Data, Control, etc). This would have been a simple SQL query if the
>   data had been in a traditional database, but as it was I had to write
>   a Haskell program to process all the package .tar.gz's and parse the
>   .cabal files manually.
>
> * acid-state forces us to use a server-process model, rather than having
>   processes for individual requests run by apache. I don't know if we
>   would have made this choice anyway, so this may or may not be an
>   issue. But the current model does mean that adding a feature or fixing
>   a bug means restarting the process, rather than just installing the
>   new program in-place.
>
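
To make the module-hierarchy question from the list above concrete: once the module names are available, the counting step itself is small. The following is only a minimal sketch, assuming the exposed module names have already been extracted from each .cabal file into plain strings. The names topLevel, packagesPerHierarchy and examplePackages, the input shape, and the package names are all hypothetical, chosen for illustration; none of them come from Hackage or Cabal.

```haskell
import Data.List (group, nub, sort)

-- Top-level component of a dotted module name, e.g. "Data.Map" gives "Data".
topLevel :: String -> String
topLevel = takeWhile (/= '.')

-- For each top-level hierarchy, count the packages that expose at least
-- one module under it. Input shape (an assumption for this sketch):
-- a list of (package name, exposed module names).
packagesPerHierarchy :: [(String, [String])] -> [(String, Int)]
packagesPerHierarchy pkgs =
  [ (h, length g)
  | g@(h : _) <- group (sort perPackage)
  ]
  where
    -- nub ensures a package counts once per hierarchy, however many
    -- modules it has under that hierarchy.
    perPackage = concatMap (nub . map topLevel . snd) pkgs

-- Made-up sample data, not real Hackage packages.
examplePackages :: [(String, [String])]
examplePackages =
  [ ("pkg-a", ["Data.A", "Data.A.Internal"])
  , ("pkg-b", ["Control.B"])
  , ("pkg-c", ["Data.C", "Control.C"])
  ]
```

In GHCi, packagesPerHierarchy examplePackages gives one (hierarchy, count) pair per top-level name, sorted by name. The expensive part in practice is still obtaining the module lists, which is the point being made in the quoted message.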
> Someone pointed out that one disadvantage of traditional databases is
> that they discourage you from writing as if everything was Haskell
> datastructures in memory. For example, if you have things of type
>     data Foo = Foo {
>         str :: String,
>         bool :: Bool,
>         ints :: [Int]
>     }
> stored in a database then you could write either:
>     foo <- getFoo 23
>     print $ bool foo
> or
>     b <- getFooBool 23
>     print b
>
> The former is what you would more naturally write, but would require
> constructing the whole Foo from the database (including reading an
> arbitrary number of Ints). The latter is thus more efficient with the
> database backend, but emphasises that you aren't working with regular
> Haskell datastructures.
>
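
The difference between the two access styles can be sketched with a mock store. This is only an illustration under stated assumptions: the "database" is a set of in-memory association lists with one column per field, the lookups are pure and return Maybe rather than running in a database monad, and Store, exampleStore and the extra store argument are hypothetical. Only Foo, getFoo and getFooBool echo the names used in the quoted message.

```haskell
data Foo = Foo
  { str  :: String
  , bool :: Bool
  , ints :: [Int]
  } deriving (Eq, Show)

-- Hypothetical column-oriented store: one association list per field,
-- keyed by row id. A stand-in for real database tables.
data Store = Store
  { strCol  :: [(Int, String)]
  , boolCol :: [(Int, Bool)]
  , intsCol :: [(Int, [Int])]
  }

-- Made-up sample row with id 23.
exampleStore :: Store
exampleStore = Store
  { strCol  = [(23, "hello")]
  , boolCol = [(23, True)]
  , intsCol = [(23, [1, 2, 3])]
  }

-- Builds the whole record: consults every column, including the
-- arbitrarily long list of Ints.
getFoo :: Store -> Int -> Maybe Foo
getFoo s k =
  Foo <$> lookup k (strCol s)
      <*> lookup k (boolCol s)
      <*> lookup k (intsCol s)

-- Consults only the Bool column.
getFooBool :: Store -> Int -> Maybe Bool
getFooBool s k = lookup k (boolCol s)
```

Both bool <$> getFoo exampleStore 23 and getFooBool exampleStore 23 produce the same answer; the second simply never consults the other columns, which is where the saving would come from with a real database backend.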
> This is even more notable with the Cabal types (like PackageDescription)
> as the types and various utility functions already exist - although it's
> currently somewhat moot as the current acid-state backend doesn't keep
> the Cabal datastructures in memory anyway.
>
>
> The other issue raised is performance. I'd want to see (full-size)
> benchmarks before commenting on that.
>
>
> Has anyone else got any thoughts?
>
>
>
> On a related note, I think it would be a little nicer to store blobs as
> e.g.
>     54/54fb24083b14b5916df11f1ffcd03b26/foo-1.0.tar.gz
> rather than
>     54/54fb24083b14b5916df11f1ffcd03b26
>
> I don't think that this breaks anything, so it should be noncontentious.
>
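
The two blob layouts in the quoted suggestion can be written down as path-building functions. This is a sketch only: it assumes, based on the example paths alone, that the leading directory is the first two characters of the hash, and the names oldBlobPath and newBlobPath are hypothetical rather than taken from the Hackage server code.

```haskell
import Data.List (intercalate)

-- Current layout: <first two hash characters>/<hash>
oldBlobPath :: String -> FilePath
oldBlobPath hash = intercalate "/" [take 2 hash, hash]

-- Suggested layout: <first two hash characters>/<hash>/<original file name>
newBlobPath :: String -> FilePath -> FilePath
newBlobPath hash fileName = intercalate "/" [take 2 hash, hash, fileName]
```

Under that assumption the new path is the old path with the file name appended, so the old path can still be recovered from the hash alone.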
>
> Thanks
> Ian