tar-package was: Re: modules of cabal-install
duncan.coutts at worc.ox.ac.uk
Tue Feb 24 07:00:04 EST 2009
On Tue, 2009-02-24 at 10:34 +0100, Christian Maeder wrote:
> > Let me know what you think about the API and documentation. You mention
> > above about exporting internal data structures. As far as I can see
> > everything that is exported in the current code is needed. Let me know
> > if you think it is too much or too little.
> Ok, I think the api is too big (for a casual user). I don't want to know
> anything about the internals of an "Entry" or about a "TarPath". For
> refactoring cabal-install (using your tar package) the following
> interface was enough:
> create :: FilePath -> FilePath -> FilePath -> IO ()
> extract :: FilePath -> FilePath -> IO ()
> read :: ByteString -> Entries
> write :: [Entry] -> ByteString
> pack :: FilePath -> FilePath -> IO [Entry]
> unpack :: FilePath -> Entries -> IO ()
> data Entry
> fileName :: Entry -> FilePath
> fileContent :: Entry -> ByteString
> data Entries
> = Next Entry Entries
> | Done
> | Fail String
> Maybe only a "isNormalFile" test-function for an Entry is missing.
isNormalFile entry = fileType entry == NormalFile
No we really want one of these for every file type? isDirectory,
isHardLink. I hope not. So perhaps that means you just want to expose
fileType and FileType too in the top level API.
> checkSecurity is not needed in the API, because it is done by unpack.
> (checkTarBomb does nothing currently).
It's needed if you're checking a tar file now because you expect to
unpack it later, eg on hackage.
> Tar entries should (usually will) not be constructed by the user.
I've got a use case where we do.
> getDirectoryContentsRecursive does not really belong into this tar package.
True but it's so useful you'd have to implement it yourself if it was
not provided. We can't get it into the System.Directory package very
> I would be happy, if the existence of TarPath (and all the other funny
> entry fields) could be hidden from the user.
You'll have to trade that for some cases where you get error instead.
Translating file paths to tar paths does not work in all cases. We can
hide that in an IO error for functions that are in IO, like pack.
> Manipulating Entries is also not a typical user task. (Maybe the type
> Entries should just be "[Either String Entry]", but the given type is
> fine, as it only allows a final failure string)
Yeah I think only one failure is good. Manipulating entries is more
common than you think, if you are working with a tar file in memory and
never extracting files to disk. For example the hackage-server both
constructs and dissects tar files but never packs or unpacks them.
> So rather than re-exporting almost everything from the other modules in
> the top module, I suggest my API above and simply expose all other
> modules in case some wants the internals.
Ok, so perhaps we should split the API into two, in the way you suggest
above. I'll try that and see how it looks.
> > Currently I get round-trip byte-for-byte compatibility with about 50% of
> > the .tar.gz packages on my system (I'm on gentoo so there's lots of
> > those). The ones that are not byte-for-byte equal after reading/writing
> > are still readable by other tools (and probably normalised and closer to
> > standard compliant) but it needs investigating in more detail.
> > The checking API is incomplete (security, tarbombs, portability) and
> > there are no tests for the lazy streaming properties yet (ie that we can
> > process arbitrary large archives in constant space).
> I can only suggest to release it soon, use it for cabal-install and make
> a new release of cabal-install for ghc-6.10.2
I'll not be using the tar package for cabal-install immediately. Keeping
the number of dependencies low while we work out installation and
packaging issues is fairly important. People already complain that
cabal-install has any dependencies at all. Once it's bundled in the
platform then that restriction will be lifted.
> P.S. I could (darcs) send you my (humble) changes to cabal-install and tar
More information about the Libraries