[Haskell-cafe] Parsing cabal files to calculate average number of dependencies
Gwern Branwen
gwern0 at gmail.com
Fri Jul 1 22:43:10 CEST 2011
Athas on #haskell wondered how many dependencies the average Haskell
package had. I commented that it seemed like some fairly simple
scripting to find out, and as these things tend to go, I wound up
doing a complete solution myself.
First, we get most/all of Hackage locally to examine, as tarballs:
    for package in `cabal list | grep '\*' | tr -d '\*'`; do cabal fetch $package; done
Then we cd .cabal/packages/hackage.haskell.org
Now we can run a command which extracts the .cabal file from each
tarball to standard output:
    find . -name "*.tar.gz" -exec tar --wildcards "*.cabal" -Oxf {} \;
We could grep for 'build-depends' or something, but that gives
unreliable, dirty results. (>80k items, resulting in a hard-to-believe
87k total deps and an average of 27 deps.) So instead, we use the
Cabal library and write a program to parse Cabal files & spit out the
dependencies, and we feed each .cabal into that:
    find . -name "*.tar.gz" -exec sh -c 'tar --wildcards "*.cabal" -Oxf {} | runhaskell ~/deps.hs' \;
And what is deps.hs? Turns out to be surprisingly easy to parse a
String, extract the Library and Executable AST, and grab the
[Dependency] field, and then print it out (code is not particularly
clean):
    import Distribution.Package
    import Distribution.PackageDescription
    import Distribution.PackageDescription.Parse

    main :: IO ()
    main = do cbl <- getContents
              let desc = parsePackageDescription cbl
              case desc of
                ParseFailed _ -> return ()
                ParseOk _ d -> putStr $ unlines $ map show $ map
                                 (\(Dependency x _) -> x) $ extractDeps d

    extractDeps :: GenericPackageDescription -> [Dependency]
    extractDeps d = ldeps ++ edeps
      where ldeps = case (condLibrary d) of
                      Nothing -> []
                      Just c -> condTreeConstraints c
            edeps = concat $ map (condTreeConstraints . snd) $ condExecutables d
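
One caveat about the count: extractDeps concatenates the library's and
each executable's dependencies, so a package whose library and
executable both depend on, say, base will print base twice. If one
wanted each dependency counted once per package, a minimal sketch of
the de-duplication step might look like the following. It uses plain
Strings as a stand-in for Cabal's package names, and uniqueDeps and the
sample lists are hypothetical names invented for illustration, not part
of deps.hs:

```haskell
import Data.List (nub)

-- Hypothetical helper: drop repeated dependency names, keeping the
-- first occurrence of each. Plain Strings stand in for Cabal's
-- package name type here (an assumption for illustration).
uniqueDeps :: [String] -> [String]
uniqueDeps = nub

main :: IO ()
main = do
  -- Made-up sample data: a library and an executable that share "base".
  let libDeps = ["base", "containers"]
      exeDeps = ["base", "directory"]
  mapM_ putStrLn (uniqueDeps (libDeps ++ exeDeps))
```

nub is quadratic, but dependency lists are short enough that this
should not matter here.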
So what are the results? (The output of one run is attached.) I get
18,134 dependencies, having run on 3,137 files, or 5.8 dependencies
per package.
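
The average is just the total number of printed dependencies divided by
the number of .cabal files processed. As a quick check of the
arithmetic, here is a small sketch; averageDeps is a hypothetical
helper name, and the two numbers are the totals reported above:

```haskell
import Text.Printf (printf)

-- Hypothetical helper: mean dependencies per package.
-- Returns 0 when there are no packages, to avoid dividing by zero.
averageDeps :: Int -> Int -> Double
averageDeps totalDeps packages
  | packages <= 0 = 0
  | otherwise     = fromIntegral totalDeps / fromIntegral packages

main :: IO ()
main = printf "%.1f\n" (averageDeps 18134 3137)
```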
--
gwern
http://www.gwern.net
-------------- next part --------------
A non-text attachment was scrubbed...
Name: deps.txt.gz
Type: application/x-gzip
Size: 36515 bytes
Desc: not available
URL: <http://www.haskell.org/pipermail/haskell-cafe/attachments/20110701/c195722d/attachment.bin>