[Haskell] Announcing harchive: Backup and restore software in Haskell.

Sun Mar 4 23:40:13 EST 2007

Announcing release 0.1 of "harchive", a program for backing up and
restoring data.  I've included a brief feature list below.  The code
is available in darcs at:

  http://www.davidb.org/darcs/harchive/

The connection isn't all that fast, so if it gets too busy, I'll move
it somewhere else.

This software is at a very early stage, but it has reached the point
where others may be interested in looking at it.  It does demonstrate
that Haskell (at least GHC) is indeed useful for this kind of
seemingly low-level task.

Thanks,
David Brown

Harchive version 0.1.

  - Implemented, with support for the following:

    - Client/server model.  The backup is stored in a file pool
      managed by the hpool program.  The hfile program can access
      this pool over TCP (no authentication or encryption, so be
      careful).

    - Stores data from multiple backups and multiple machines in a
      content-addressable store.  Duplicated data, even on separate
      machines, will not take additional space.  Collisions can be
      made arbitrarily improbable by the choice of hash size (though
      this is not easily changeable in the current code).

    - Pool manager uses Sqlite3-specific capabilities to get efficient
      storage of the pool index.  It uses a custom binding to Sqlite3
      to take advantage of these capabilities.

    - Uses OpenSSL's SHA-1 library.  It wouldn't be difficult to use a
      different library (there are license issues with OpenSSL).
      Generally the program's performance is bound by the speed of
      hashing and/or the speed of data compression.

    - Uses Duncan Coutts' zlib library to get good zlib speed.

    - Linux dependent.  Uses the output of '/sbin/blkid' to map
      devices to the UUIDs of their filesystems, giving a persistent
      and unique identifier for each filesystem.

    - Has a primitive status display ('-v') during backup.

    - Able to back up and restore directories/filesystems.  Restore
      semantics are as accurate as I can get them without extra
      strange semantics.

    - Multithreaded backup.  Allows backup to run at high speed, even
      while waiting for cache responses from the server.  Does not
      need to be built with '-threaded'.

    - Restore is reasonably simple.  It tends to get tested less, so
      it is beneficial to move complexity to the backup side.
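
To illustrate the blkid item above, here is a minimal sketch of how
device-to-UUID mapping could be done.  This is not harchive's actual
code: the function names (parseBlkidLine, parseFields, parseBlkid) are
hypothetical, and it assumes blkid prints one line per device in the
form `/dev/sda1: LABEL="/" UUID="..." TYPE="ext3"`.  To stay
self-contained it parses a sample string instead of running
'/sbin/blkid'.

```haskell
module Main where

import Data.Char (isSpace)
import Data.Maybe (mapMaybe)

-- | Parse one line of blkid output into (device, uuid).
-- Assumed format:  /dev/sda1: LABEL="/" UUID="1234-abcd" TYPE="ext3"
-- Lines without a device name or a UUID field yield Nothing.
parseBlkidLine :: String -> Maybe (String, String)
parseBlkidLine line =
  case break (== ':') line of
    (dev, ':' : rest) | not (null dev) -> do
      uuid <- lookup "UUID" (parseFields rest)
      return (dev, uuid)
    _ -> Nothing

-- | Parse whitespace-separated KEY="value" pairs.
-- Parsing stops at the first field that does not match this shape.
parseFields :: String -> [(String, String)]
parseFields s =
  case dropWhile isSpace s of
    "" -> []
    s' ->
      case break (== '=') s' of
        (key, '=' : '"' : rest) ->
          let (val, rest') = break (== '"') rest
          in (key, val) : parseFields (drop 1 rest')  -- drop closing quote
        _ -> []

-- | Build a device -> UUID association list from the full output.
parseBlkid :: String -> [(String, String)]
parseBlkid = mapMaybe parseBlkidLine . lines

-- Made-up sample data, standing in for real blkid output.
sample :: String
sample = unlines
  [ "/dev/sda1: LABEL=\"/\" UUID=\"1111-aaaa\" TYPE=\"ext3\""
  , "/dev/sda2: TYPE=\"swap\""
  , "/dev/sdb1: UUID=\"2222-bbbb\" TYPE=\"ext3\""
  ]

main :: IO ()
main = mapM_ print (parseBlkid sample)
```

Keying backups on the UUID rather than the device name means the
identifier should survive a device being renamed, for example when
disks are detected in a different order.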