[Haskell-cafe] ANNOUNCE: tar

Duncan Coutts duncan.coutts at worc.ox.ac.uk
Mon Mar 2 11:15:02 EST 2009

On Mon, 2009-03-02 at 08:20 -0600, John Goerzen wrote:
> Duncan Coutts wrote:
> > All,
> > 
> > I'm pleased to announce a major new release of the tar package for
> > handling ".tar" archive files.
> Very nice!
> I'm curious -- what specific variants of the tar format can it read and
> write?

It can read and write basic Unix V7 format, POSIX ustar and GNU formats.

>  * PAX?

PAX is a compatible extension of POSIX ustar. It just standardises some
extra tar entry types ('x' and 'g'). These archives can be read and
written, but there is no special support for them. You would match on
entryContents = OtherEntryType 'x' paxHeader _ -> and then parse the
paxHeader, which is a UTF-8 file containing name/value pairs.
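To make the "parse the paxHeader" step concrete, here is a minimal sketch of a parser for the pax record format, where each record is "<len> <keyword>=<value>\n" and <len> is the decimal byte length of the whole record. The name parsePaxRecords is hypothetical and is not part of the tar package; the sketch assumes you have already obtained the 'x' entry body as a strict ByteString, and it leaves UTF-8 decoding of the values to the caller.

```haskell
import           Control.Monad         (when)
import qualified Data.ByteString.Char8 as B
import           Data.Char             (isDigit)

-- | Parse the body of a pax extended header into (keyword, value) pairs.
-- Hypothetical helper, not part of the tar package.
-- Values are returned as raw bytes; decode them as UTF-8 separately.
parsePaxRecords :: B.ByteString -> Either String [(B.ByteString, B.ByteString)]
parsePaxRecords input
  | B.null input = Right []
  | otherwise = do
      -- The record starts with its own total length in decimal.
      let (digits, afterDigits) = B.span isDigit input
      len <- case B.readInt digits of
        Just (n, leftover) | B.null leftover -> Right n
        _ -> Left "expected a decimal record length"
      when (B.take 1 afterDigits /= B.pack " ") $
        Left "expected a space after the record length"
      -- The length covers the digits, the space, the body and the newline.
      let headerLen      = B.length digits + 1
          (record, rest) = B.splitAt len input
      when (len <= headerLen || B.length record /= len || B.last record /= '\n') $
        Left "record length does not match the data"
      -- Strip the length prefix and the trailing newline, then split on '='.
      let body           = B.init (B.drop headerLen record)
          (key, eqValue) = B.break (== '=') body
      when (B.null eqValue) $ Left "missing '=' in record"
      others <- parsePaxRecords rest
      Right ((key, B.drop 1 eqValue) : others)
```

For example, parsePaxRecords (B.pack "12 path=abc\n") gives Right [("path","abc")]. Splitting on the first '=' only is deliberate, since values may themselves contain '='.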

>  * GNU tar sparse files?

No support. They'll get matched as OtherEntryType 'S'. However, unlike
PAX, the GNU sparse format puts the sparse info directly in the tar
header. That part of the header is not parsed, and the lib does not
provide direct access to the raw header data, so you cannot parse it
yourself.

>  * POSIX ustar

This is the standard format the library generates.

>  * various pre-posix archives?

Yes, at least the basic data from the V7 format. Data in the top half of
the header is ignored.
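As an illustration of how the V7, ustar and GNU header formats differ, here is a rough sketch that classifies a 512-byte header block by the 8 bytes at offset 257, where ustar stores its magic and version fields. This is not how the tar package does it; detectFormat and HeaderFormat are hypothetical names, and treating every unrecognised magic as V7 is a simplifying assumption.

```haskell
import qualified Data.ByteString.Char8 as B

-- | Header flavours discussed in this thread (hypothetical type).
data HeaderFormat = V7 | Ustar | Gnu
  deriving (Eq, Show)

-- | Classify a 512-byte tar header block by its magic/version bytes.
-- Assumption: anything without a known magic is treated as plain V7.
detectFormat :: B.ByteString -> Either String HeaderFormat
detectFormat block
  | B.length block /= 512         = Left "a tar header block is 512 bytes"
  | magic == B.pack "ustar\NUL00" = Right Ustar  -- POSIX magic plus version "00"
  | magic == B.pack "ustar  \NUL" = Right Gnu    -- GNU variant: two spaces, then NUL
  | otherwise                     = Right V7
  where
    -- The magic and version fields together occupy offsets 257..264.
    magic = B.take 8 (B.drop 257 block)
```

A real reader would also verify the header checksum before trusting any of these fields.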

>  * Solaris tar?

In so far as it is standard POSIX ustar, yes. Again, it uses some extra
entry types like 'X' for extended info. These are preserved and you can
access them, but there is no special support for parsing the body of
these entry types.

>  * Binary and text numbers in numeric fields?

Only text at the moment. Binary ones are currently recognised and
rejected with an error saying that they are not supported. Adding
support would not be terribly hard. Patches gladly accepted. I've only
found one tarball that uses it (generated by the Perl Archive::Tar lib).
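For anyone tempted to send that patch, here is a rough sketch of telling the two encodings apart. It is not the library's code: parseNumField and NumField are hypothetical names. It assumes the GNU base-256 convention, where a leading 0x80 byte marks a non-negative big-endian binary value, and it simply rejects any other field whose first byte has the high bit set (which includes negative values).

```haskell
import           Data.Bits       ((.&.))
import qualified Data.ByteString as BS

-- | A decoded numeric header field, tagged with its encoding (hypothetical type).
data NumField = TextOctal Integer | Binary Integer
  deriving (Eq, Show)

-- | Decode a numeric header field given as raw bytes.
parseNumField :: BS.ByteString -> Either String NumField
parseNumField field = case BS.uncons field of
  Nothing -> Left "empty numeric field"
  Just (b, rest)
    -- 0x80 marker: the remaining bytes are a big-endian base-256 number.
    | b == 0x80       -> Right (Binary (BS.foldl' step256 0 rest))
    -- Other high-bit values (e.g. negative numbers) are not handled here.
    | b .&. 0x80 /= 0 -> Left "negative or unrecognised binary field"
    | otherwise       -> TextOctal <$> parseOctal field
  where
    step256 acc w = acc * 256 + fromIntegral w

-- | Decode ASCII octal digits, skipping leading padding and stopping at
-- the first space or NUL terminator. An all-padding field decodes to 0.
parseOctal :: BS.ByteString -> Either String Integer
parseOctal field
  | BS.all isOctDigit digits = Right (BS.foldl' step8 0 digits)
  | otherwise                = Left "non-octal character in numeric field"
  where
    isPad w      = w == 0x20 || w == 0x00
    digits       = BS.takeWhile (not . isPad) (BS.dropWhile isPad field)
    isOctDigit w = w >= 0x30 && w <= 0x37
    step8 acc w  = acc * 8 + fromIntegral (w - 0x30)
```

Using Integer sidesteps overflow for large sizes; a real patch would probably convert to the library's own field types and range-check there.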

My main use case so far for the library is software distribution
with .tar.gz files, where portability is important. So I've tested with
all the .tar.gz and .tar.bz files I could get my hands on (quite a few
on a Gentoo system). I've not looked at or tested use cases like backups,
where important things include large file support, preserving
permissions, sparse files etc. I've tried the star program's tar torture
tests, but this should be automated into a test suite.

I've done no performance tuning except to check that it works in
constant space. On a cached 97MB tarball (glibc) the timings on my
machine are:

GNU tar uncompressed:
$ time tar -tf glibc-2.6.1.tar > /dev/null

real	0m0.126s
user	0m0.052s
sys	0m0.068s

Haskell tar uncompressed:
$ time htar -tf glibc-2.6.1.tar > /dev/null

real	0m0.617s
user	0m0.572s
sys	0m0.040s

GNU tar compressed:
$ time tar -tzf glibc-2.6.1.tar.gz > /dev/null

real	0m0.938s
user	0m0.880s
sys	0m0.056s

Haskell tar compressed:
$ time htar -tzf glibc-2.6.1.tar.gz > /dev/null

real	0m1.207s
user	0m1.188s
sys	0m0.016s

So it's slower (roughly 5x on the uncompressed tarball, about 1.3x on
the compressed one) but still perfectly good.

