Hackage 2 status
duncan.coutts at googlemail.com
Tue Jul 3 22:05:31 CEST 2012
On 3 July 2012 20:38, Johan Tibell <johan.tibell at gmail.com> wrote:
> On Mon, Jul 2, 2012 at 3:14 PM, Duncan Coutts
> <duncan.coutts at googlemail.com> wrote:
>> Something to keep in mind is memory usage. I know Jeremy is looking at
>> this from the infrastructure side, but I think from the app side there's
>> also some likely culprits. Cabal's GenericPackageDescription type is
>> very large in memory. Having tens of thousands of these means lots of
>> memory. One hopefully easy way to save memory here without going to the
>> hassle of redoing Cabal's type definitions is simply to increase
>> sharing. There's a huge amount of repeated information. Start by sharing
>> all the package names and versions. Then there's other meta-data that
>> rarely changes between versions of the same package. This kind of thing
>> should be easy to evaluate: just write a test program that reads the index
>> file and looks at peak memory use. Then try sharing stuff and see how
>> much it drops. This sharing optimisation would still be useful even if
>> later we go and redo GenericPackageDescription to be more compact.
> This should not hold up the launch of Hackage 2 (which is very
> important) but I think it's an important issue that we need to
> address: we don't want to store perhaps the most important data the
> Haskell community has in an experimental data store! Creating a
> correct data store (i.e. ACID) that also handles a moderate amount of
> load is a quite difficult undertaking and it shouldn't be taken
> lightly. Let's stick the data in some SQL database and spend our energy
> on other things. :)
I still disagree that going with an external SQL db will be easier.
The big advantage of the acid-state (and similar) data stores is that
they let us use Haskell types properly and don't imply a separate
external data model and a marshalling stage.
That said, I also do not trust acid-state for long-term storage
(simply because the binary format it uses isn't sensible), which is why
the hackage server already has a system for dumping and restoring to
standard formats (like CSV, tarballs etc). So if we use this backup
system properly (i.e. in combination with a system for backups to other
machines) then I think there's little chance of data loss.
Additionally, the really important data (the packages) are stored in
the file system.
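
For the memory-sharing suggestion quoted at the top of this message, here
is a minimal sketch of what "sharing all the package names" could look
like. It is only an illustration under stated assumptions: PkgId is a
hypothetical stand-in rather than Cabal's real type, and intern and
shareNames are made-up helper names, not anything in hackage-server.

```haskell
import qualified Data.Map.Strict as Map
import Data.List (mapAccumL)

-- Hypothetical stand-in for a package identifier; NOT Cabal's real type.
data PkgId = PkgId { pkgName :: String, pkgVersion :: [Int] }
  deriving (Eq, Show)

-- An intern table maps each value to its single canonical copy.
type InternTable a = Map.Map a a

-- Return the canonical copy of x, adding x to the table if it is new.
intern :: Ord a => InternTable a -> a -> (InternTable a, a)
intern table x =
  case Map.lookup x table of
    Just canonical -> (table, canonical)
    Nothing        -> (Map.insert x x table, x)

-- Replace every package name with its canonical copy, so that equal
-- names can end up pointing at one heap object once the result has
-- been evaluated and the duplicates are garbage collected.
shareNames :: [PkgId] -> (InternTable String, [PkgId])
shareNames = mapAccumL step Map.empty
  where
    step table pkg =
      case intern table (pkgName pkg) of
        (table', name) -> (table', pkg { pkgName = name })
```

The same approach would presumably apply to versions and to other
meta-data that rarely changes between releases. Whether it helps in
practice is exactly what the peak-memory test described above would show.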