val.meter is intended to support a variety of use cases, from individuals who are trying to appraise packages, to organizations trying to consistently enforce a package policy, to those serving repositories of package metadata.
In these examples, we’ll simulate a cohort of packages to support the testing and deployment of cohort-wide package policies. This gives us a representation of an end product without all the work usually needed to get there. We’ll build a cohort of generated packages and compute some heuristics across all of them.
Generating random packages
val.meter provides two tools for simulation, random_pkg() and random_pkgs(), which differ only in whether they simulate a single package or a collection of packages.
p <- random_pkg(permissions = TRUE)
metrics(p)
#> $r_cmd_check_error_count
#> [1] 0
#>
#> $downloads_total
#> [1] 764
#>
#> $dependency_count
#> [1] 0
sapply(
  random_pkgs(n = 3, permissions = TRUE),
  function(pkg) pkg$name
)
#> [1] "RprogressiveGreatness" "survivalClean" "promotr"
Generating a repository
pkg objects provide an implementation of to_dcf, allowing them to be encoded as a PACKAGES file, the same format used by repositories like CRAN to distribute a listing of packages and package data.
# igraph is required if we want to simulate sensible package dependencies
requireNamespace("igraph")
# generate some random packages
ps <- random_pkgs(n = 3, permissions = TRUE)
# encode the packages as DCF text
dcf_str <- to_dcf(ps)
cat(dcf_str, "\n")
#> Package: defenderBoom
#> Version: 4.1.2
#> Depends: R
#> Imports: topogrid
#> Suggests: WebCode
#> License: Phony License
#> MD5: c9fd2ac329620afd053f92da95bdd018
#> Metric/r_cmd_check_error_count@R: 0
#> Metric/downloads_total@R: 205
#> Metric/dependency_count@R: 2
#>
#> Package: topogrid
#> Version: 3.3.3
#> Depends: R
#> Suggests: WebCode
#> License: Phony License
#> MD5: 38d923547a32629192a596fee8304ceb
#> Metric/r_cmd_check_error_count@R: 1
#> Metric/downloads_total@R: 1209
#> Metric/dependency_count@R: 1
#>
#> Package: WebCode
#> Version: 5.10.4
#> Depends: R
#> License: Phony License
#> MD5: 2c229b04e2f6f90fd123bd7d016184d4
#> Metric/r_cmd_check_error_count@R: 0
#> Metric/downloads_total@R: 15711
#> Metric/dependency_count@R: 1
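Because this text follows the Debian control file layout, base R's read.dcf() should be able to parse it too. The sketch below is self-contained: it hard-codes an abridged copy of the listing above instead of calling to_dcf(), and it assumes that the Metric/...@R fields come through as ordinary column names. It only shows the raw fields; it does not rebuild package objects.

```r
# Abridged copy of the PACKAGES text shown above (records are separated by a blank line)
dcf_txt <- c(
  "Package: defenderBoom",
  "Version: 4.1.2",
  "Imports: topogrid",
  "Metric/downloads_total@R: 205",
  "",
  "Package: topogrid",
  "Version: 3.3.3",
  "Metric/downloads_total@R: 1209"
)

# read.dcf() accepts a connection; each record becomes one row of a character matrix
con <- textConnection(dcf_txt)
fields <- read.dcf(con)
close(con)

# fields that a record does not declare are filled with NA
fields[, "Package"]
fields[, "Imports"]

# values arrive as text, so convert metrics before doing arithmetic on them
downloads <- as.numeric(fields[, "Metric/downloads_total@R"])
```
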
Pulling package metadata from a package repository
What’s more, we can also reconstruct our package objects from this PACKAGES file format.
ps <- pkgs_from_dcf(dcf_str)
This exposes the package metadata through a convenient and familiar interface, allowing full access to metric metadata.
However, these new package objects differ from the originals in a few important ways. First, their source isn’t reconstructed. However the text output was produced, we no longer have an explicit record of that process. All we know by the time we rebuild our package objects is the hash of the built file from some unknown origin. Second, intermediate data is not reconstructed. This includes any logs or computational intermediates, such as full R CMD check results. All we have are the derived metrics. Finally, any errors raised by val.meter are preserved, but their call stacks are lost in this process. If you want to preserve rich metadata, we recommend saving the full data objects for posterity.
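One way to keep full objects is R's built-in serialization. The sketch below uses a plain list as a stand-in for a package object; the full_result structure and its field names are hypothetical, and whether val.meter's own objects round-trip cleanly through saveRDS() is an assumption worth checking before relying on it.

```r
# Hypothetical stand-in for a package object that carries rich intermediate data
full_result <- list(
  name = "defenderBoom",
  metrics = list(downloads_total = 205, dependency_count = 2),
  check_log = c("* checking examples ... OK", "* checking tests ... OK")
)

# serialize the complete object, not just its derived metrics
path <- tempfile(fileext = ".rds")
saveRDS(full_result, path)

# later: restore it with the intermediate data intact
restored <- readRDS(path)
identical(restored, full_result)
```
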
Analysing our repository
Let’s imagine that we want to use this repository metadata to make an informed decision about packages.
We’ll start by re-simulating a larger cohort of packages so that our analysis produces something more interesting.
n100pkgs <- random_pkgs(n = 100, permissions = TRUE)
And just to show that we can derive this data from a representative PACKAGES file, we’ll write out our data and read it back in from text format.
dcf <- to_dcf(n100pkgs)
n100pkgs <- pkgs_from_dcf(dcf)
Finally, we can take a look at how our packages fare.
# read in packages as a data.frame
df <- as.data.frame(n100pkgs)
# small helper for calculating percentiles
percentile <- function(x) ecdf(x)(x)
# calculate and filter on package dependency count percentiles
df$dependency_percentile <- percentile(df$dependency_count)
# find our packages with the most dependencies
df$package[df$dependency_percentile > 0.95]
#> [1] "hgplyr" "chemo.luster" "rbplyr" "hnplyr"
#> [5] "seqdata" "diff.Affluence" "stRaightforward" "stRaightforward"
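To see what the percentile helper computes, it can help to run it on a tiny vector. ecdf(x) returns a step function giving the fraction of observations less than or equal to its argument, so tied values share a percentile and the maximum always maps to 1. The counts below are made up for illustration.

```r
# same idea as the helper above: evaluate the empirical CDF at each observation
percentile <- function(x) ecdf(x)(x)

# made-up dependency counts for four packages
deps <- c(1, 2, 2, 10)
percentile(deps)
#> [1] 0.25 0.75 0.75 1.00
```

Because every package tied for the highest count receives a percentile of 1, a filter such as `> 0.95` can return more than 5% of the cohort when there are ties at the top.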
options(scipen = 10)
plot(
  x = df$dependency_percentile,
  y = df$dependency_count,
  xlab = "Percentile",
  ylab = "Dependency Count"
)
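The same kind of data frame can drive a simple cohort-wide policy check. The following is a minimal sketch on a hand-written data frame: it assumes the metric names shown earlier appear as columns (as dependency_count does above), and the thresholds are arbitrary placeholders rather than recommendations.

```r
# Hand-written stand-in for as.data.frame(n100pkgs); values are illustrative
cohort <- data.frame(
  package = c("defenderBoom", "topogrid", "WebCode"),
  r_cmd_check_error_count = c(0, 1, 0),
  downloads_total = c(205, 1209, 15711),
  dependency_count = c(2, 1, 1),
  stringsAsFactors = FALSE
)

# hypothetical policy: no check errors and at least 1000 total downloads
passes <- cohort$r_cmd_check_error_count == 0 & cohort$downloads_total >= 1000

# packages the policy would accept
cohort$package[passes]
#> [1] "WebCode"

# share of the cohort that the policy would accept
mean(passes)
```
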