tar-package was: Re: modules of cabal-install

Duncan Coutts duncan.coutts at worc.ox.ac.uk
Tue Feb 24 07:20:33 EST 2009

On Tue, 2009-02-24 at 12:00 +0000, Duncan Coutts wrote:
> On Tue, 2009-02-24 at 10:34 +0100, Christian Maeder wrote:

> > checkSecurity is not needed in the API, because it is done by unpack.
> > (checkTarBomb does nothing currently).
> It's needed if you're checking a tar file now because you expect to
> unpack it later, eg on hackage.
> > Tar entries should (usually will) not be constructed by the user.
> I've got a use case where we do.

So I should say what these use cases are. One is cabal-install, of
course. That's fairly easy: it just needs to create and extract tar
files.

The other case is hackage. For that we want to upload and check the
contents of tar files, without ever unpacking them to local files. We
want to check the tar file itself to make sure it is a portable format
(ie not containing any funky extensions that not all tar readers will
grok) or things that are not portable between platforms, like file names
that would be invalid on Windows. We also want to extract a single file
in memory (the .cabal file).
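
As a rough illustration of the "invalid on Windows" check, here is a
minimal sketch in Haskell. It is not the tar package's API: the names
portableOnWindows, pathComponents and componentOk are made up for this
example, and the rules are deliberately strict (absolute paths and "."
or ".." components are rejected too).

```haskell
import Data.Char (toUpper)

-- Hypothetical helpers for illustration only.

-- Split a tar entry path on '/'. A single trailing '/' (as found on
-- directory entries) is ignored; a leading or doubled '/' yields an
-- empty component, which componentOk rejects.
pathComponents :: FilePath -> [String]
pathComponents p = case break (== '/') p of
  (c, [])       -> [c]
  (c, [_])      -> [c]
  (c, _ : rest) -> c : pathComponents rest

-- Characters Windows does not allow in a file name.
invalidChars :: String
invalidChars = "<>:\"\\|?*"

-- Device names Windows reserves, with or without an extension.
reservedNames :: [String]
reservedNames =
  ["CON", "PRN", "AUX", "NUL"]
    ++ [p ++ show n | p <- ["COM", "LPT"], n <- [1 .. 9 :: Int]]

componentOk :: String -> Bool
componentOk c =
     not (null c)
  && all (`notElem` invalidChars) c
  && all (>= ' ') c                  -- no control characters
  && last c `notElem` ". "           -- no trailing dot or space
  && map toUpper (takeWhile (/= '.') c) `notElem` reservedNames

-- True if every component of the entry path is acceptable.
portableOnWindows :: FilePath -> Bool
portableOnWindows = all componentOk . pathComponents
```

With these definitions "foo-1.0/foo.cabal" passes, while
"foo-1.0/aux.hs" and "foo-1.0/a:b.txt" are rejected.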

Another case within hackage is constructing the 00-index.tar.gz file.
That is built in memory from another internal representation of the
package index. For that we really are constructing each entry ourselves,
supplying all the appropriate info including file modification time,
ownership etc.
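
To make "constructing each entry ourselves" concrete, here is a sketch
of serialising one entry in the POSIX ustar layout using only
bytestring. The FileEntry record and every function name are
hypothetical, not the tar package's types. It assumes names of at most
100 bytes, regular files only, and numeric uid/gid fixed at 0.

```haskell
import qualified Data.ByteString.Char8 as B
import Data.Char (ord)
import Numeric (showOct)

-- Hypothetical record for illustration; not the tar package's Entry.
data FileEntry = FileEntry
  { entryName    :: String        -- at most 100 bytes in this sketch
  , entryMode    :: Int           -- permission bits, e.g. 0o644
  , entryOwner   :: String        -- owner name
  , entryGroup   :: String        -- group name
  , entryMTime   :: Integer       -- seconds since the epoch
  , entryContent :: B.ByteString
  }

-- NUL-padded text field of fixed width (silently truncates).
field :: Int -> String -> B.ByteString
field w s = B.pack (take w (s ++ replicate w '\0'))

-- Zero-padded octal number plus a NUL terminator, w bytes in total.
-- Assumes the value fits in w - 1 octal digits.
octal :: Int -> Integer -> B.ByteString
octal w n = B.pack (replicate (w - 1 - length ds) '0' ++ ds ++ "\0")
  where ds = showOct n ""

-- The 512-byte ustar header for a regular file.
header :: FileEntry -> B.ByteString
header e = B.concat [before, chk, after]
  where
    before = B.concat
      [ field 100 (entryName e)
      , octal 8 (fromIntegral (entryMode e))
      , octal 8 0                                  -- uid
      , octal 8 0                                  -- gid
      , octal 12 (fromIntegral (B.length (entryContent e)))
      , octal 12 (entryMTime e)
      ]
    after = B.concat
      [ B.pack "0"                                 -- typeflag: file
      , field 100 ""                               -- linkname
      , B.pack "ustar\0", B.pack "00"              -- magic, version
      , field 32 (entryOwner e), field 32 (entryGroup e)
      , octal 8 0, octal 8 0                       -- devmajor/minor
      , field 155 ""                               -- prefix
      , field 12 ""                                -- pad to 512
      ]
    -- Checksum: byte sum of the header with the checksum field
    -- itself counted as eight spaces.
    total = sum (map ord (B.unpack before ++ replicate 8 ' ' ++ B.unpack after))
    chk   = B.pack (replicate (6 - length ds) '0' ++ ds ++ "\0 ")
      where ds = showOct total ""

-- Header, content, then NUL padding up to a multiple of 512 bytes.
serialise :: FileEntry -> B.ByteString
serialise e = B.concat [header e, body, B.replicate pad '\0']
  where
    body = entryContent e
    pad  = (512 - B.length body `mod` 512) `mod` 512

-- An archive is the entries followed by two all-zero blocks.
archive :: [FileEntry] -> B.ByteString
archive es = B.concat (map serialise es ++ [B.replicate 1024 '\0'])
```

Gzipping the result of archive would then give something shaped like
an index tarball, though the real index may well need longer names and
other entry types than this sketch handles.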

A final case in hackage is serving the contents of .tar files. This is
to let users browse the contents of packages, eg to read the README
without having to download the whole .tar.gz. It should also make all
the code more easily googleable. It's also the method we want to use for
serving haddock docs. Bots or package owners will upload .tar.gz bundles
of documentation and we'll serve the contents directly without unpacking
them. We'll do that by gunzipping and storing the .tar file on disk,
scanning it once to generate a file name -> (offset, length) index, and
then, when we service a request, we open the .tar file, seek to the
offset and return that many bytes.
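
The index scheme above can be sketched roughly as follows. This is an
illustration under simplifying assumptions, not hackage's code: the
names TarIndex, buildIndex and serveFile are invented here, the whole
.tar is scanned as one strict ByteString, and only plain ustar headers
with names in the 100-byte name field are understood (no prefix field,
no GNU or pax extensions).

```haskell
import qualified Data.ByteString.Char8 as B
import qualified Data.Map as Map
import Numeric (readOct)
import System.IO

-- File name -> (offset of the file's content, length in bytes).
type TarIndex = Map.Map FilePath (Integer, Int)

-- Read an octal header field, ignoring NUL and space padding.
parseOctal :: B.ByteString -> Int
parseOctal b =
  case readOct (filter (`elem` ['0' .. '7']) (B.unpack b)) of
    [(n, _)] -> n
    _        -> 0

-- Scan an uncompressed tar archive once and record where each
-- regular file's content lives.
buildIndex :: B.ByteString -> TarIndex
buildIndex = go 0 Map.empty
  where
    go off ix bs
      -- A short read or an all-zero block marks the end.
      | B.length hdr < 512 || B.all (== '\0') hdr = ix
      | otherwise =
          go (off + 512 + padded) ix' (B.drop (512 + padded) bs)
      where
        hdr    = B.take 512 bs
        name   = B.unpack (B.takeWhile (/= '\0') (B.take 100 hdr))
        size   = parseOctal (B.take 12 (B.drop 124 hdr))
        -- Content is padded to a whole number of 512-byte blocks.
        padded = ((size + 511) `div` 512) * 512
        -- Typeflag '0' (or NUL in old archives) means regular file.
        isFile = B.index hdr 156 `elem` "0\0"
        ix' | isFile    = Map.insert name (fromIntegral off + 512, size) ix
            | otherwise = ix

-- Serve one file: look it up, seek to its offset, read its length.
serveFile :: FilePath -> TarIndex -> FilePath -> IO (Maybe B.ByteString)
serveFile tarPath ix name =
  case Map.lookup name ix of
    Nothing         -> return Nothing
    Just (off, len) -> withBinaryFile tarPath ReadMode $ \h -> do
      hSeek h AbsoluteSeek off
      Just <$> B.hGet h len
```

The point of the design is that each request costs one map lookup, one
seek and one read, however large the .tar file is.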

I think that's it.

