tar-package was: Re: modules of cabal-install

Christian Maeder Christian.Maeder at dfki.de
Tue Feb 24 04:34:23 EST 2009


Duncan Coutts wrote:
[...]
> Tar.unpack dir . Tar.read . GZip.decompress =<< BS.readFile tar
> 
> or
> 
> BS.writeFile tar . GZip.compress . Tar.write =<< Tar.pack base dir
[...]
>> The sources in cabal-install seem most up-to-date (because of
>> cabal-install-0.6.2) and it would make sense to take these sources and
>> replace those in the tar-package.
> 
> Yes, that's what I was doing over the weekend.

Thanks a lot!

> darcs get http://code.haskell.org/tar/
> 
> Let me know what you think about the API and documentation. You mention
> above about exporting internal data structures. As far as I can see
> everything that is exported in the current code is needed. Let me know
> if you think it is too much or too little.

OK, I think the API is too big (for a casual user). I don't want to know
anything about the internals of an "Entry" or about a "TarPath". For
refactoring cabal-install (using your tar package) the following
interface was enough:

create :: FilePath -> FilePath -> FilePath -> IO ()
extract :: FilePath -> FilePath -> IO ()
read :: ByteString -> Entries
write :: [Entry] -> ByteString
pack :: FilePath -> FilePath -> IO [Entry]
unpack :: FilePath -> Entries -> IO ()
data Entry
fileName :: Entry -> FilePath
fileContent :: Entry -> ByteString
data Entries
 = Next Entry Entries
 | Done
 | Fail String

Maybe only an "isNormalFile" test function for an Entry is missing.
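
As a rough sketch of what such a test could look like on an abstract Entry
(the EntryKind type, the entryKind field and normalFileNames are
hypothetical stand-ins invented for illustration, not the tar package's
actual definitions):

```haskell
import qualified Data.ByteString.Lazy.Char8 as BS

-- Hypothetical stand-in for the library's entry kinds (not the real type).
data EntryKind = NormalFile | Directory | OtherKind
  deriving (Eq, Show)

-- Hypothetical stand-in for Entry. In a real module the constructor and
-- the entryKind field would stay unexported, so users see only the
-- accessors fileName and fileContent plus the isNormalFile test.
data Entry = Entry
  { fileName    :: FilePath
  , fileContent :: BS.ByteString
  , entryKind   :: EntryKind
  }

-- True only for entries that carry ordinary file data.
isNormalFile :: Entry -> Bool
isNormalFile e = entryKind e == NormalFile

-- Example use: names of all normal files in a list of entries.
normalFileNames :: [Entry] -> [FilePath]
normalFileNames es = [fileName e | e <- es, isNormalFile e]
```

With only fileName, fileContent and isNormalFile exported, a casual user
never needs to look at the remaining entry fields.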

checkSecurity is not needed in the API, because it is done by unpack.
(checkTarBomb does nothing currently).

Tar entries should not (and usually will not) be constructed by the user.

getDirectoryContentsRecursive does not really belong in this tar package.

I would be happy if the existence of TarPath (and all the other funny
entry fields) could be hidden from the user.

Manipulating Entries is also not a typical user task. (Maybe the type
Entries should just be "[Either String Entry]", but the given type is
fine, as it only allows a final failure string.)
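
To illustrate the difference between the two representations, here is a
minimal sketch. The Entry type is a hypothetical one-field stand-in, and
foldEntries, entriesToList and listToEntries are defined locally for the
example, so none of this should be read as the package's actual code:

```haskell
-- Hypothetical stand-in for the library's Entry (the real one has more fields).
newtype Entry = Entry { fileName :: FilePath }
  deriving (Eq, Show)

-- The Entries type as given in the message.
data Entries
  = Next Entry Entries
  | Done
  | Fail String

-- A right fold over Entries, defined here for the sketch.
foldEntries :: (Entry -> a -> a) -> a -> (String -> a) -> Entries -> a
foldEntries next done failed = go
  where
    go (Next e es) = next e (go es)
    go Done        = done
    go (Fail err)  = failed err

-- Lazily flatten to the alternative list type. By construction a Left
-- can only ever appear as the last element.
entriesToList :: Entries -> [Either String Entry]
entriesToList = foldEntries (\e rest -> Right e : rest) [] (\err -> [Left err])

-- The reverse direction has to pick a policy for a Left in the middle of
-- the list; this sketch stops at the first one and drops the rest.
listToEntries :: [Either String Entry] -> Entries
listToEntries []               = Done
listToEntries (Left err : _)   = Fail err
listToEntries (Right e : rest) = Next e (listToEntries rest)
```

The list type admits values such as [Left "x", Right e] that Entries cannot
represent, which is why listToEntries needs a policy for them while
entriesToList does not.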

So rather than re-exporting almost everything from the other modules in
the top module, I suggest my API above and simply exposing all other
modules in case someone wants the internals.

> Currently I get round-trip byte-for-byte compatibility with about 50% of
> the .tar.gz packages on my system (I'm on gentoo so there's lots of
> those). The ones that are not byte-for-byte equal after reading/writing
> are still readable by other tools (and probably normalised and closer to
> standard compliant) but it needs investigating in more detail.
> 
> The checking API is incomplete (security, tarbombs, portability) and
> there are no tests for the lazy streaming properties yet (i.e. that we can
> process arbitrarily large archives in constant space).

I can only suggest releasing it soon, using it for cabal-install, and making
a new release of cabal-install for ghc-6.10.2.

Thank you, Duncan

Christian

P.S. I could (darcs) send you my (humble) changes to cabal-install and tar.

