[Haskell-cafe] Parsing cabal files to calculate average number of dependencies

Rogan Creswick creswick at gmail.com
Fri Jul 1 23:23:13 CEST 2011


On Fri, Jul 1, 2011 at 1:43 PM, Gwern Branwen <gwern0 at gmail.com> wrote:
> Athas on #haskell wondered how many dependencies the average Haskell
> package had. I commented that it seemed like some fairly simple
> scripting to find out, and as these things tend to go, I wound up
> doing a complete solution myself.
>
> First, we get most/all of Hackage locally to examine, as tarballs:
>
>    for package in `cabal list | grep '\*' | tr -d '\*'`; do cabal fetch $package; done

I think the index tarball has all the info you need, and would be
faster to retrieve / process, if you or anyone else needs to get the
.cabal files again:

http://hackage.haskell.org/packages/archive/00-index.tar.gz (2.2 MB)

The set of the latest package sdists is also available:

http://hackage.haskell.org/cgi-bin/hackage-scripts/archive.tar (~150 MB)
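
If the index is used instead of the fetched sdists, the entry paths
need filtering before counting. A minimal sketch, assuming the index
lists every uploaded version under <package>/<version>/<package>.cabal
and that versions are plain dotted numbers; the helper names
(splitOn, parseVersion', cabalEntry, latestCabalFiles) are made up
for illustration and only base is used:

```haskell
import Data.Char (isDigit)
import Data.Function (on)
import Data.List (groupBy, sortOn)

-- Split a string on a separator character.
splitOn :: Char -> String -> [String]
splitOn sep s = case break (== sep) s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOn sep rest

-- Parse "1.2.10" into [1,2,10]; Nothing if a component is not a number.
parseVersion' :: String -> Maybe [Int]
parseVersion' = mapM component . splitOn '.'
  where
    component c
      | not (null c) && all isDigit c = Just (read c)
      | otherwise                     = Nothing

-- Interpret an index entry path as (package, version, path).
-- Assumed layout: <package>/<version>/<package>.cabal
cabalEntry :: FilePath -> Maybe (String, [Int], FilePath)
cabalEntry path = case splitOn '/' path of
  [pkg, ver, file]
    | file == pkg ++ ".cabal" -> do
        v <- parseVersion' ver
        Just (pkg, v, path)
  _ -> Nothing

-- Keep only the highest-versioned .cabal path for each package,
-- so that each package is counted once.
latestCabalFiles :: [FilePath] -> [FilePath]
latestCabalFiles paths =
  [ path
  | grp <- groupBy ((==) `on` pkgName) (sortOn key entries)
  , let (_, _, path) = last grp
  ]
  where
    entries = [e | Just e <- map cabalEntry paths]
    pkgName (p, _, _) = p
    key (p, v, _) = (p, v)
```

Versions are compared component by component as numbers, so 0.10
sorts after 0.9. Paths that do not match the assumed layout are
dropped.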

--Rogan

> Then we cd .cabal/packages/hackage.haskell.org
>
> Now we can run a command which extracts the .cabal file from each
> tarball to standard output:
>
>    find . -name "*.tar.gz" -exec tar --wildcards "*.cabal" -Oxf {} \;
>
> We could grep for 'build-depends' or something, but that gives
> unreliable dirty results. (>80k items, resulting in a hard to believe
> 87k total deps and an average of 27 deps.) So instead, we use the
> Cabal library and write a program to parse Cabal files & spit out the
> dependencies, and we feed each .cabal into that:
>
>    find . -name "*.tar.gz" -exec sh -c 'tar --wildcards "*.cabal" -Oxf {} | runhaskell ~/deps.hs' \;
>
> And what is deps.hs? It turns out to be surprisingly easy to parse a
> String, extract the Library and Executable AST, grab the
> [Dependency] field, and print it out (the code is not particularly
> clean):
>
> import Distribution.Package
> import Distribution.PackageDescription
> import Distribution.PackageDescription.Parse
>
> -- Read one .cabal file from stdin and print the name of each dependency.
> main :: IO ()
> main = do
>   cbl <- getContents
>   case parsePackageDescription cbl of
>     ParseFailed _ -> return ()  -- silently skip files that do not parse
>     ParseOk _ d   -> putStr $ unlines $ map (\(Dependency x _) -> show x) $ extractDeps d
>
> -- Build dependencies declared for the library and for each executable
> -- (conditional branches are not traversed).
> extractDeps :: GenericPackageDescription -> [Dependency]
> extractDeps d = ldeps ++ edeps
>   where
>     ldeps = maybe [] condTreeConstraints (condLibrary d)
>     edeps = concatMap (condTreeConstraints . snd) (condExecutables d)
>
> So what are the results? (The output of one run is attached.) I get
> 18,134 dependencies, having run on 3,137 files, or 5.8 dependencies
> per package.
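
The average quoted above is the dependency total divided by the
number of files. A minimal sketch of that arithmetic (the function
name averageDeps is made up for illustration):

```haskell
-- Mean number of dependencies per package.
-- The caller must pass a non-zero package count.
averageDeps :: Int -> Int -> Double
averageDeps totalDeps packages =
  fromIntegral totalDeps / fromIntegral packages
```

In GHCi, averageDeps 18134 3137 evaluates to about 5.78, which
rounds to the 5.8 reported.
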
>
> --
> gwern
> http://www.gwern.net
>


