[Haskell-cafe] Is there some tool for inferring wide dependency ranges?

Thu May 27 17:30:08 UTC 2021

I (kind of) have such tool, it's not feasible for continuous integration
usage.

For an example let consider lens-package. It has some dependencies,
plenty for
manual check to be infeasible, but not extremely many.

My tool has a mode where it tries to find an /effective/ lower bound
constrained by present `build-depends` constraints. And additional flag, to
build-verify it.  I.e. this is not even a proper search.

An algorithm is simple:
foreach dependency
  check which existing dependency versions are in range
    splitting the set (with some heuristics considering major versions etc).
    figure out the effective lower bound for each dependency.

Note: the lower bound verification is "complete" in per-dependency approach,
but full range verification is more expensive due combinatorical explosion.

Another note: if yourr lower bound is wrong you have to figure out why by
your own. Looking at the error and scanning through changelogs has often
proven to be faster approach (humans are good at combining information).

And still, this "O(dependecies x versions in average range)" algorithm
is slow
if we decide to build/verify the found bounds.

I started with somewhat outdated cabal nix-style cache, i.e. I haven't run
this command previously.  (When just working on stuff cabal picks ~latest
versions, i.e. skewed dependency install plans as we need here are not
present in the cache).

After the run we get a nice output (for current `lens` master), which
has colors
in the terminal telling some information beyond just versions.

                                   8.0.2     8.2.2     8.4.4    
8.6.5     8.8.4     8.10.4    9.0.1
    array                          0.5.1.1   0.5.2.0   0.5.2.0  
0.5.3.0   0.5.4.0   0.5.4.0   0.5.4.0
    assoc                          1.0.2     1.0.2     1.0.2    
1.0.2     1.0.2     1.0.2     1.0.2
    base                           4.9.1.0   4.10.1.0  4.11.1.0 
4.12.0.0  4.13.0.0  4.14.1.0  4.15.0.0
    base-orphans                   0.5.2     0.5.4     0.5.4    
0.5.4     0.8.1     0.8.1     0.8.1
    bifunctors                     5.5.7     5.5.7     5.5.7    
5.5.7     5.5.7     5.5.7     5.5.8
    bytestring                     0.10.6.0  0.10.6.0  0.10.8.0 
0.10.8.0  0.10.8.1  0.10.8.1  0.10.9.0
    call-stack                     0.1.0     0.1.0     0.1.0    
0.1.0     0.1.0     0.1.0     0.1.0
    comonad                        5.0.7     5.0.7     5.0.7    
5.0.7     5.0.7     5.0.7     5.0.7
    containers                     0.5.7.0   0.5.7.0   0.5.7.0  
0.5.7.0   0.5.7.0   0.5.7.0   0.5.7.0
    contravariant                  1.4       1.4       1.4      
1.5       1.5       1.5       1.5
    distributive                   0.5.1     0.5.2     0.5.3    
0.5.3     0.5.3     0.5.3     0.5.3
    exceptions                     0.8.2.1   0.8.3     0.8.3    
0.10.0    0.10.2    0.10.4    0.10.4
    filepath                       1.2.0.0   1.2.0.0   1.2.0.0  
1.2.0.0   1.2.0.0   1.2.0.0   1.2.0.0
    free                           5.1.5     5.1.5     5.1.5    
5.1.5     5.1.5     5.1.5     5.1.5
    ghc-prim                       0.5.0.0   0.5.1.1   0.5.2.0  
0.5.3     0.5.3     0.6.1     0.7.0
    hashable                       1.2.7.0   1.2.7.0   1.2.7.0  
1.2.7.0   1.3.0.0   1.3.0.0   1.3.1.0
    indexed-traversable            0.1.1     0.1.1     0.1.1    
0.1.1     0.1.1     0.1.1     0.1.1
    indexed-traversable-instances  0.1       0.1       0.1      
0.1       0.1       0.1       0.1
    kan-extensions                 5.1       5.1       5.1      
5.1       5.1       5.1       5.1
    mtl                            2.2.1     2.2.1     2.2.1    
2.2.1     2.2.1     2.2.1     2.2.1
    parallel                       3.2.1.0   3.2.1.1   3.2.1.1  
3.2.2.0   3.2.2.0   3.2.2.0   3.2.2.0
    profunctors                    5.5.2     5.5.2     5.5.2    
5.5.2     5.5.2     5.5.2     5.6
    reflection                     2.1       2.1       2.1.3    
2.1.4     2.1.4     2.1.4     2.1.4
    semigroupoids                  5.0.1     5.2       5.2.2    
5.2.2     5.3.2     5.3.2     5.3.4
    strict                         0.4       0.4       0.4      
0.4       0.4       0.4       0.4
    tagged                         0.8.6     0.8.6     0.8.6    
0.8.6     0.8.6     0.8.6     0.8.6
    template-haskell               2.11.1.0  2.12.0.0  2.13.0.0 
2.14.0.0  2.15.0.0  2.16.0.0  2.17.0.0
    text                           1.2.3.0   1.2.3.0   1.2.3.0  
1.2.3.0   1.2.4.0   1.2.3.2   1.2.4.1
    th-abstraction                 0.4.1.0   0.4.1.0   0.4.1.0  
0.4.1.0   0.4.1.0   0.4.1.0   0.4.1.0
    these                          1.1.1.1   1.1.1.1   1.1.1.1  
1.1.1.1   1.1.1.1   1.1.1.1   1.1.1.1
    transformers                   0.5.0.0   0.5.0.0   0.5.5.0  
0.5.5.2   0.5.6.0   0.5.6.0   0.5.6.0
    transformers-compat            0.5.0.4   0.5.0.4   0.5.0.4  
0.5.0.4   0.5.0.4   0.5.0.4   0.5.0.4
    unordered-containers           0.2.10.0  0.2.10.0  0.2.10.0 
0.2.10.0  0.2.10.0  0.2.10.0  0.2.10.0
    vector                         0.12.1.2  0.12.1.2  0.12.1.2 
0.12.1.2  0.12.1.2  0.12.1.2  0.12.2.0

... after two hours of wall time, or a day of CPU time.

            dry        dep         build
    counts  1147       238         237
    q50     3.932s     84.624s     166.224s
    q90     5.223s     251.763s    269.377s
    total   4583.699s  25147.444s  43101.014s
    usr: 80390.500s   sys: 6018.400s   cpu: 1204.249%   time: 7175.338s

Note that solving alone takes hour and a half of CPU time,
fortunately that is highly parallelisable task.

Building dependencies takes plenty of time,
there I have only single GHC building dependencies,
to avoid rebuilding the same ones and avoding store corruption
(which shouldn't happen, but happens).

Finally the building of package itself in various combinations
is again parallizable, as store is only read.
For `lens` that has considerable cost on its own.
(After all the package is built 7 times 30 times).

---

Funny coincidence is that kan-extensions-5.1 had wrong upper bound on
base/GHC, as it doesn't build with it.

For that my tool has a differnt mode of operations for building multiple
package versions at once, a bit like matrix.hackage.haskell.org, but
locally.
That's what I use to locally test that mass revision edits are ok. We get

                      9.2.0.20210422  9.0.1  8.10.4  8.8.4  8.6.5 
8.4.4  8.2.2  8.0.2  7.10.3  7.8.4  7.6.3  7.4.2  7.2.2  7.0.4
kan-extensions-5.2.2  NO-IP           OK     OK      OK     OK    
OK     OK     OK     OK      OK     OK     OK     NO-IP  NO-IP
kan-extensions-5.2.1  NO-IP           OK     OK      OK     OK    
OK     OK     OK     OK      OK     OK     OK     NO-IP  NO-IP
kan-extensions-5.2    NO-IP           NO-IP  OK      OK     OK    
OK     OK     OK     OK      OK     OK     OK     NO-IP  NO-IP
kan-extensions-5.1    NO-IP           FAIL   OK      OK     OK    
OK     OK     OK     OK      OK     OK     OK     NO-IP  NO-IP
kan-extensions-5.0.2  NO-IP           NO-IP  NO-IP   NO-IP  NO-IP 
NO-IP  OK     OK     OK      OK     OK     OK     NO-IP  NO-IP

That operation **is fast**, as dependencies are reused well,
so there is no cold cabal nix-style store cache issues.

            dry       dep      build
    counts  70        48       48
    q50     0.707s    0.128s   0.632s
    q90     4.844s    0.169s   14.290s
    total   163.226s  31.414s  184.551s
    usr: 342.020s   sys: 37.060s   cpu: 905.090%   time: 41.883s

Under a minute of wall clock.
That hopefully highlights why `matrix.hackage.haskell.org` is feasible, but
"bounds check" isn't. The difference in need computational resources is
huge.

So after making a revision to turn FAIL into NO-IP (no install plan),
using Hackage Trustees'
super powers, and we can return to `lens`.

The result is the same, except the minimum kan-extensions version
on GHC-9.0 is 5.2.1

This time the run is significantly faster:

             dry        dep      build
     counts  1148       238      238
     q50     6.278s     0.210s   1.000s
     q90     7.225s     0.284s   1.268s
     total   6937.215s  69.420s  389.536s
     usr: 6825.120s   sys: 554.600s   cpu: 2163.272%   time: 341.137s

but still two hours of CPU time.  (I hope solving could be a lot faster if
cabal-install-solver were available as usable library, but we are quite far
from that).  Also note that cpu usage is 2200%, it's a very good idea to
have
many cores for this kind of work.

Here we benefit from local cache of dependencies, but also the local
`dist-newstyle/` directories previous run left behind.  As you can see,
build-step is quite fast.  The previous many hour work would need to be
repeated, if we would make a change to `lens` itself (especially in a
way such
incremental build is not very fast).

There is 11GB of build products i.e. these 7 x 30 versions of lens. 
That is a
thing ot be aware of too.

In summary, if you try something like this on a GitHub-hosted GitHub Actions
runner
https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners

    Hardware specification for Windows and Linux virtual machines:

    2-core CPU
    7 GB of RAM memory
    14 GB of SSD disk space

... it just won't work.

---

Do I have suggestions what to do instead?

For example haskell-ci allows to specify constraint sets,
and I have one for example in quickcheck-instances,
https://github.com/haskellari/qc-instances/blob/master/cabal.haskell-ci

That is a (rare) package where I do care about having as wide dependency
range as possible, as I don't want to run into problems due it.

That cost is just one additional build per CI run (note: also dependencies).

Honestly, I don't remember how I found those versions, whether using my tool
or by trial-an-error.  As you can see from the history of that file, I don't
need to edit them often.  (last modification is August 2020).

For ordinary packages, I'd just suggest using picking the current
versions as
lower bounds (e.g. what gen-bounds suggests), and sticking to them until
there
is a reason to not too.  Don't spend too much time pushing them down from
what's currently possible, if there isn't real reason.  You may pick older
Stackage LTS snapshot (for the oldest GHC you want to support) for some
inspiration, though I often don't do that, as dependency versions tend to be
quite old.  For example there is no value in allowing older QuickCheck
versions in your test-suite, it's unluckily they will find you bugs that
newer
versions won't.

- Oleg

On 27.5.2021 16.07, Askar Safin via Haskell-Cafe wrote:
> Hi. Is there some tool for inferring dependency ranges for my cabal packages? I know about "cabal gen-bounds", but this command gives too short ranges. I want to get as wide range as possible. I. e. I want tool, which tries to build and test my package on various versions of dependencies, and reports resulting range. Successful testing should mean that the version of dependency is compatible. That tool should work also on "base" package, this means that it should test various ghc versions, too.
>
> If there is no such tool, then at least I want some tool, which will check already specified dependency range. Such tool should do this:
> - Extract dependencies from cabal file
> - Install smallest versions of specified dependencies (this will include "base" and thus ghc)
> - Test on this versions
> - Install largest versions
> - Test on this versions
>
> I will put invocation of such tool to continuous integration.
>
> I hope there are a lot of package maintainers here. How do you manage dependencies?
>
> ==
> Askar Safin
> http://safinaskar.com
> https://sr.ht/~safinaskar
> https://github.com/safinaskar
> _______________________________________________
> Haskell-Cafe mailing list
> To (un)subscribe, modify options or view archives go to:
> http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
> Only members subscribed via the mailman list are allowed to post.