[Haskell-cafe] Parsing cabal files to calculate average number of dependencies

Gwern Branwen gwern0 at gmail.com
Fri Jul 1 22:43:10 CEST 2011


Athas on #haskell wondered how many dependencies the average Haskell
package had. I commented that it seemed like some fairly simple
scripting to find out, and as these things tend to go, I wound up
writing a complete solution myself.

First, we fetch most or all of Hackage locally as tarballs, so we can examine it:

    for package in `cabal list | grep '\*' | tr -d '\*'`; do cabal fetch $package; done
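The grep/tr filter above picks out the lines of cabal list output that name packages. Here is a rough Haskell sketch of that step, for anyone who prefers not to rely on grep and tr. It assumes that cabal list begins each entry with a line of the form "* name"; I have not checked every cabal-install version. packageNames is a hypothetical helper. It is slightly stricter than grep '\*', which also matches a '*' anywhere else on a line, for example in a synopsis.

```haskell
import Data.List (isPrefixOf)

-- Hypothetical helper: pull package names out of `cabal list` output,
-- assuming each entry header looks like "* name" at the start of a line.
-- Lines with a '*' elsewhere (e.g. inside a synopsis) are ignored.
packageNames :: String -> [String]
packageNames out =
  [ name
  | line <- lines out
  , "* " `isPrefixOf` line
  , name : _ <- [words (drop 2 line)]  -- skip headers with no name
  ]

-- Read `cabal list` output on stdin, print one package name per line.
main :: IO ()
main = getContents >>= mapM_ putStrLn . packageNames
```

Piping cabal list into this program (saved under some hypothetical name such as names.hs) would produce the same list that the backtick substitution feeds to cabal fetch.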

Then we cd into ~/.cabal/packages/hackage.haskell.org, where cabal fetch stores the tarballs.

Now we can run a command which extracts the .cabal file from each
tarball to standard output:

    find . -name "*.tar.gz" -exec tar --wildcards "*.cabal" -Oxf {} \;

We could grep for 'build-depends' or something similar, but that gives
unreliable, dirty results (>80k items, resulting in a hard-to-believe
87k total deps and an average of 27 deps). So instead, we use the
Cabal library to write a program that parses a .cabal file and prints
its dependencies, and we feed each .cabal file into that program:

    find . -name "*.tar.gz" -exec sh -c 'tar --wildcards "*.cabal" -Oxf {} | runhaskell ~/deps.hs' \;

And what is deps.hs? It turns out to be surprisingly easy to parse a
String, extract the Library and Executable ASTs, grab their
[Dependency] fields, and print them out (the code is not particularly
clean):

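import Distribution.Package
import Distribution.PackageDescription
import Distribution.PackageDescription.Parse

-- Read one .cabal file on stdin and print the name of each package it
-- depends on, one per line (written against the Cabal 1.x API).
main :: IO ()
main = do
  cbl <- getContents
  case parsePackageDescription cbl of
    ParseFailed _ -> return ()  -- silently skip files that fail to parse
    ParseOk _ d   -> putStr $ unlines $ map (\(Dependency x _) -> show x) $ extractDeps d

-- Collect the build-depends of the library and of every executable.
-- Only the root of each CondTree is read; conditional branches
-- (condTreeComponents) are not walked.
extractDeps :: GenericPackageDescription -> [Dependency]
extractDeps d = ldeps ++ edeps
  where
    ldeps = maybe [] condTreeConstraints (condLibrary d)
    edeps = concatMap (condTreeConstraints . snd) (condExecutables d)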
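The Parse module used above belongs to the Cabal 1.x API of the time. As far as I know, later Cabal releases replaced it with a Parsec-based parser. What follows is a rough sketch of the same extraction against what I believe are the Cabal 3.x module names (Distribution.PackageDescription.Parsec and Distribution.Types.*). It takes a strict ByteString as input, and depNames is a hypothetical name. It has not been checked against every Cabal version. Like deps.hs, it only reads the root of each CondTree.

```haskell
import qualified Data.ByteString.Char8 as BS
import Data.Maybe (fromMaybe)
import Distribution.PackageDescription.Parsec (parseGenericPackageDescriptionMaybe)
import Distribution.Types.CondTree (CondTree (..))
import Distribution.Types.Dependency (depPkgName)
import Distribution.Types.GenericPackageDescription (GenericPackageDescription (..))
import Distribution.Types.PackageName (unPackageName)

-- Hypothetical helper: names from the top-level build-depends of the
-- library and every executable, or Nothing if the file does not parse.
depNames :: BS.ByteString -> Maybe [String]
depNames src = do
  gpd <- parseGenericPackageDescriptionMaybe src
  let ldeps = maybe [] condTreeConstraints (condLibrary gpd)
      edeps = concatMap (condTreeConstraints . snd) (condExecutables gpd)
  pure (map (unPackageName . depPkgName) (ldeps ++ edeps))

-- Like deps.hs: read one .cabal file on stdin, print one name per line,
-- and print nothing for files that fail to parse.
main :: IO ()
main = BS.getContents >>= mapM_ putStrLn . fromMaybe [] . depNames
```

Unlike the original, this prints bare package names rather than the Show output of PackageName.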

So what are the results? (The output of one run is attached.) I get
18,134 dependencies across 3,137 .cabal files, or about 5.8
dependencies per package.
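The per-package figure is the total number of dependency lines divided by the number of files: 18,134 / 3,137 ≈ 5.78, which rounds to 5.8. Below is a small sketch of that calculation. It assumes the output of all the deps.hs runs has been concatenated with one name per line, and that the file count is known separately and is positive. averageDeps is a hypothetical helper.

```haskell
-- Hypothetical helper: mean dependencies per file, given the number of
-- .cabal files processed (assumed > 0) and the combined deps.hs output
-- (one dependency name per line; blank lines are ignored).
averageDeps :: Int -> String -> Double
averageDeps files out = fromIntegral total / fromIntegral files
  where
    total = length (filter (not . null) (lines out))
```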

-- 
gwern
http://www.gwern.net
-------------- next part --------------
A non-text attachment was scrubbed...
Name: deps.txt.gz
Type: application/x-gzip
Size: 36515 bytes
Desc: not available
URL: <http://www.haskell.org/pipermail/haskell-cafe/attachments/20110701/c195722d/attachment.bin>

