[Haskell-cafe] ANN: zip-archive 0.0

Tue Aug 26 16:36:11 EDT 2008

On Mon, 2008-08-25 at 23:22 -0700, John MacFarlane wrote:
> I've written a library, zip-archive, for dealing with zip archives.

Great. I saw your query about this from a month ago.

> Haddock documentation (with links to source code):
> http://johnmacfarlane.net/zip-archive/ 
> 
> Darcs repository:
> http://johnmacfarlane.net/repos/zip-archive/
> 
> It comes with an example program that duplicates some of the
> functionality of 'zip' (configure with '-fexecutable' to build it).
> 
> I intend to put it on HackageDB, but I thought I'd get some feedback
> first. Bug reports, patches, and suggestions on the API are all welcome.

Generally it looks good. I like that the operations on the archive are mostly
separated from the IO of writing out archives or creating entries from disk
files, etc.

Looking at the API, it feels like slightly too much is exposed. E.g. does
MSDOSDateTime need to be exposed, or the (de)compressData functions?

I've been reworking the tar library recently and currently have an API
that looks like:

  -- * Reading and writing the tar format
  read  :: ByteString -> Entries
  write :: [Entry] -> ByteString

  -- * Packing and unpacking files to\/from a tar archive
  pack   :: FilePath -> FilePath -> IO [Entry]
  unpack :: FilePath -> Entries  -> IO ()

Entry is like your ZipEntry. Entries is a little special. Tar is really
a linear/streamable format; we typically read the file front to back. Of
course, zip is more complex since it has an index (right?), so you
can jump around without reading all the data.

So Entries represents the unfolding of a tar file as a sequence of
entries, but with the possibility of failure (eg format decoding
failures):

-- | A tar archive is a sequence of entries.
data Entries = Next Entry Entries
             | Done
             | Fail String

So that's why we have Entries for the result of decoding and just an
ordinary list for the input to encoding.
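To make the Entries idea concrete, here is a minimal sketch of consuming it. The Entry type below is a hypothetical stand-in (the real one carries more fields), and foldEntries, toEither and fromList are illustrative helper names, not part of the tar library's API.

```haskell
-- Hypothetical, simplified Entry; the real tar Entry has more fields.
data Entry = Entry { entryPath :: FilePath, entrySize :: Int }
  deriving (Show, Eq)

-- | A tar archive is a sequence of entries, possibly ending in a failure.
data Entries = Next Entry Entries
             | Done
             | Fail String
  deriving Show

-- | Fold over the entries, with a separate case for a decoding failure.
foldEntries :: (Entry -> a -> a) -> a -> (String -> a) -> Entries -> a
foldEntries next done fail' = go
  where
    go (Next e es) = next e (go es)
    go Done        = done
    go (Fail err)  = fail' err

-- | Collect all entries, or return the decoding error.
toEither :: Entries -> Either String [Entry]
toEither = foldEntries (\e rest -> fmap (e :) rest) (Right []) Left

-- | The encoding side never fails, so an ordinary list is enough.
fromList :: [Entry] -> Entries
fromList = foldr Next Done
```

Note that toEither has to reach Done or Fail before it can return, so it gives up streaming; a consumer that processes each entry as it is decoded can use foldEntries directly with IO actions instead.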

Zip is more complex, of course, because you often want to add files to
existing archives, or look up individual entries without iterating
through every entry.
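A minimal sketch of why an index helps, assuming the archive is modelled as a map from path to entry (loosely mirroring zip's central directory). The Archive and Entry types and the addEntry/findEntry/archiveEntries names are hypothetical, not zip-archive's actual API.

```haskell
import qualified Data.Map as Map

-- Hypothetical, simplified entry type for illustration.
data Entry = Entry { entryPath :: FilePath, entryData :: String }
  deriving (Show, Eq)

-- | An archive modelled as an index from path to entry.
newtype Archive = Archive (Map.Map FilePath Entry)

emptyArchive :: Archive
emptyArchive = Archive Map.empty

-- | Add (or replace) one entry without touching the others.
addEntry :: Entry -> Archive -> Archive
addEntry e (Archive m) = Archive (Map.insert (entryPath e) e m)

-- | Look up a single entry by path, without walking every entry.
findEntry :: FilePath -> Archive -> Maybe Entry
findEntry p (Archive m) = Map.lookup p m

-- | All entries in path order, e.g. for writing the archive back out.
archiveEntries :: Archive -> [Entry]
archiveEntries (Archive m) = Map.elems m
```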

My personal inclination is to leave off the Zip prefix in the names and
use qualified imports. I'd also leave out trivial compositions like

readZipArchive  f = toZipArchive <$> B.readFile f
writeZipArchive f = B.writeFile f . fromZipArchive

but reasonable people disagree.

For both the pack in my tar lib and your addFilesToZipArchive, there's a
getDirectoryContentsRecursive function asking to get out. This function
seems to come up often. Ideally pack/unpack and
addFilesToZipArchive/extractFilesFromZipArchive would just be a mapM_ of
the extract or create action for an individual entry, over either the
contents of the archive or the result of a recursive traversal.
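A possible shape for that helper, as a sketch: it returns non-directory entries only, as paths relative to the root, and does no error handling or symlink-loop detection. packWith and its mkEntry argument are hypothetical names, showing pack as a per-file action mapped over the traversal.

```haskell
import Control.Monad (forM)
import System.Directory
import System.FilePath ((</>))

-- | Everything below 'root' that is not a directory, relative to 'root'.
-- Sketch only: directories are not listed, there is no error handling,
-- and since doesDirectoryExist follows symlinks, a loop would not terminate.
getDirectoryContentsRecursive :: FilePath -> IO [FilePath]
getDirectoryContentsRecursive root = go ""
  where
    go rel = do
      names <- getDirectoryContents (root </> rel)
      paths <- forM (filter (`notElem` [".", ".."]) names) $ \name -> do
        let relPath = rel </> name
        isDir <- doesDirectoryExist (root </> relPath)
        if isDir then go relPath else return [relPath]
      return (concat paths)

-- | Packing is then a per-file action mapped over the traversal.
-- 'mkEntry' (hypothetical) builds one entry from the root and a relative path.
packWith :: (FilePath -> FilePath -> IO e) -> FilePath -> IO [e]
packWith mkEntry root = getDirectoryContentsRecursive root >>= mapM (mkEntry root)
```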

So yeah, I feel these operations ought to be simpler compositions of
other things, in your lib and mine, since this is often the part
where different use cases need slight variations, e.g. in how they write
files, or how they deal with OS-specific permissions/security issues. If these
are compositions of simpler pieces, it should be easier to add extra
steps or replace bits.
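Along those lines, a sketch of unpacking as a composition of a per-entry path check and a per-entry writer. Entry, Check, safePath, plainWriter and unpackWith are hypothetical names for illustration; the check shown (rejecting absolute paths and ".." components) is one common policy, not a complete defence, e.g. against symlinks already present in the target directory.

```haskell
import System.Directory (createDirectoryIfMissing)
import System.FilePath ((</>), isRelative, splitDirectories, takeDirectory)

-- Hypothetical, simplified entry type for illustration.
data Entry = Entry { entryPath :: FilePath, entryContent :: String }

-- | A per-entry policy: the relative path to write to, or an error.
type Check = Entry -> Either String FilePath

-- | Reject absolute paths and ".." components (not a complete defence).
safePath :: Check
safePath e
  | not (isRelative p)             = Left ("absolute path: " ++ p)
  | ".." `elem` splitDirectories p = Left ("path escapes target: " ++ p)
  | otherwise                      = Right p
  where
    p = entryPath e

-- | Default writer: create parent directories, then write the contents.
plainWriter :: FilePath -> Entry -> IO ()
plainWriter path e = do
  createDirectoryIfMissing True (takeDirectory path)
  writeFile path (entryContent e)

-- | Check every entry first, then run the writer on each one, so nothing
-- is written if any entry fails the check. Either argument can be swapped.
unpackWith :: Check -> (FilePath -> Entry -> IO ()) -> FilePath -> [Entry] -> IO ()
unpackWith check writeOne dir entries =
  case traverse check entries of
    Left err    -> ioError (userError err)
    Right paths -> mapM_ (\(p, e) -> writeOne (dir </> p) e) (zip paths entries)
```

For example, unpackWith safePath plainWriter "out" entries gives the usual behaviour, while a caller needing different permission handling passes its own writer without touching the traversal.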

Duncan