Repository Integrations • val.meter

val.meter is intended to support a variety of use cases, from individuals appraising packages, to organizations consistently imposing a package policy, to teams serving repositories of package metadata.

In these examples, we’ll simulate a cohort of packages to support the testing and deployment of cohort-wide package policies. This gives us a representation of the end product without all the work needed to get there. Here we’ll build a cohort of generated packages and compute some heuristics across all of them.

Generating random packages

val.meter provides two tools for simulation, random_pkg() and random_pkgs(), which differ only in whether they simulate a single package or a collection of packages.

p <- random_pkg(permissions = TRUE)
metrics(p)
#> $r_cmd_check_error_count
#> [1] 0
#> 
#> $downloads_total
#> [1] 764
#> 
#> $dependency_count
#> [1] 0

sapply(
  random_pkgs(n = 3, permissions = TRUE),
  function(pkg) pkg$name
)
#> [1] "RprogressiveGreatness" "survivalClean"         "promotr"

Generating a repository

pkg objects provide an implementation of to_dcf(), allowing them to be encoded as a PACKAGES file, the same format used by repositories like CRAN to distribute a listing of packages and package data.

# igraph is required if we want to simulate sensible package dependencies
requireNamespace("igraph")

# generate some random packages
ps <- random_pkgs(n = 3, permissions = TRUE)

# output DCF files
dcf_str <- to_dcf(ps)
cat(dcf_str, "\n")
#> Package: defenderBoom
#> Version: 4.1.2
#> Depends: R
#> Imports: topogrid
#> Suggests: WebCode
#> License: Phony License
#> MD5: c9fd2ac329620afd053f92da95bdd018
#> Metric/r_cmd_check_error_count@R: 0
#> Metric/downloads_total@R: 205
#> Metric/dependency_count@R: 2
#> 
#> Package: topogrid
#> Version: 3.3.3
#> Depends: R
#> Suggests: WebCode
#> License: Phony License
#> MD5: 38d923547a32629192a596fee8304ceb
#> Metric/r_cmd_check_error_count@R: 1
#> Metric/downloads_total@R: 1209
#> Metric/dependency_count@R: 1
#> 
#> Package: WebCode
#> Version: 5.10.4
#> Depends: R
#> License: Phony License
#> MD5: 2c229b04e2f6f90fd123bd7d016184d4
#> Metric/r_cmd_check_error_count@R: 0
#> Metric/downloads_total@R: 15711
#> Metric/dependency_count@R: 1
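Because the output is ordinary DCF text, it should also be readable without val.meter. The sketch below uses base R's read.dcf() on a hand-copied excerpt of the output above. It treats metric values as plain integers, which is a simplifying assumption: the @R suffix may mean the values are stored as R code, and val.meter's own reader may handle them differently.

```r
# Hand-copied excerpt of the PACKAGES output above (illustrative values)
dcf_txt <- c(
  "Package: defenderBoom",
  "Version: 4.1.2",
  "Imports: topogrid",
  "Metric/downloads_total@R: 205",
  "",
  "Package: topogrid",
  "Version: 3.3.3",
  "Metric/downloads_total@R: 1209"
)

# read.dcf() accepts a connection and returns a character matrix,
# one row per record and one column per field
con <- textConnection(dcf_txt)
m <- read.dcf(con)
close(con)

pkg_names <- unname(m[, "Package"])

# Assumption: metric values are plain integers in this excerpt
downloads <- as.integer(m[, "Metric/downloads_total@R"])

# Fields missing from a record come back as NA
missing_imports <- is.na(m[, "Imports"])
```

This only recovers the raw fields. pkgs_from_dcf() remains the way to get package objects back.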

Pulling package metadata from a package repository

What’s more, we can also reconstruct our package objects from this PACKAGES file format.

ps <- pkgs_from_dcf(dcf_str)

This exposes the package metadata through a convenient and familiar interface, allowing full access to metric metadata.

However, these new package objects differ from the originals in a few important ways. First, their source isn’t reconstructed: however the text output was produced, we no longer have an explicit record of that process. All we know by the time we rebuild our package objects is the hash of the built file from some unknown origin. Second, intermediate data is not reconstructed. This includes any logs or computational intermediates such as full R CMD check results. All we have are the derived metrics. Finally, any errors raised by val.meter are preserved, but their call stacks are lost in this process. If you want to preserve rich metadata, we recommend saving the full data objects.
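One way to keep full objects is base R's saveRDS() and readRDS(). The sketch below uses a plain list as a stand-in, since whether val.meter's pkg objects serialize cleanly this way is an assumption not confirmed here. The field names in the stand-in are hypothetical.

```r
# Stand-in for a rich package object (hypothetical fields);
# a real pkg object would take its place
pkg_like <- list(
  name = "examplePkg",
  metrics = list(dependency_count = 2L),
  check_log = c("* checking examples ... OK")
)

# Serialize to disk, then restore
path <- tempfile(fileext = ".rds")
saveRDS(pkg_like, path)
restored <- readRDS(path)

# The restored object keeps intermediate data that the DCF round trip drops
round_trip_ok <- identical(restored, pkg_like)
```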

Analysing our repository

Let’s imagine that we want to use this repository metadata to make an informed decision about packages.

We’ll start by re-simulating a larger cohort of packages so that our analysis produces something more interesting.

n100pkgs <- random_pkgs(n = 100, permissions = TRUE)

And just to show that we can derive this data from a representative PACKAGES file, we’ll write out and read back in our data from text format.

dcf <- to_dcf(n100pkgs)
n100pkgs <- pkgs_from_dcf(dcf)

Finally, we can take a look at how our packages fare.

# read in packages as a data.frame
df <- as.data.frame(n100pkgs)

# small helper for calculating percentiles
percentile <- function(x, ...) ecdf(x, ...)(x)

# calculate and filter on package dependency count percentiles
df$dependency_percentile <- percentile(df$dependency_count)

# find our packages with the most dependencies
df$package[df$dependency_percentile > 0.95]
#> [1] "hgplyr"          "chemo.luster"    "rbplyr"          "hnplyr"         
#> [5] "seqdata"         "diff.Affluence"  "stRaightforward" "stRaightforward"
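To see what the percentile helper computes, here is a small worked example on a made-up vector. ecdf() returns the share of values less than or equal to each input, so tied values share a percentile. That is why a cutoff of 0.95 can select more than 5% of the cohort.

```r
# Same idea as the helper above, without the unused `...`
percentile <- function(x) ecdf(x)(x)

# Made-up dependency counts with a tie
deps <- c(1, 2, 2, 5)
pct <- percentile(deps)
pct
#> [1] 0.25 0.75 0.75 1.00

# Ties at the top all receive 1, so both 9s pass a 0.8 cutoff
top <- c(1, 2, 3, 9, 9)
n_above <- sum(percentile(top) > 0.8)
```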

options(scipen = 10)
plot(
  xlab = "Percentile",
  x = df$dependency_percentile,
  ylab = "Dependency Count",
  y = df$dependency_count
)

An example of dependency counts distributed over their percentiles, showcasing a mostly linear character. The plot is randomly generated, so exact characteristics may vary.
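The same data frame could drive a simple cohort-wide policy. The sketch below uses a hand-built table rather than val.meter output. It assumes that every metric appears as a column named after the metric, which the examples above confirm only for package and dependency_count. The thresholds are purely illustrative.

```r
# Hypothetical cohort table; column names mirror the metric names shown earlier
cohort <- data.frame(
  package = c("alpha", "beta", "gamma", "delta"),
  r_cmd_check_error_count = c(0, 1, 0, 0),
  downloads_total = c(764, 1209, 205, 15711),
  dependency_count = c(0, 1, 2, 1),
  stringsAsFactors = FALSE
)

# Illustrative thresholds, not recommendations
passes_policy <- with(
  cohort,
  r_cmd_check_error_count == 0 &
    downloads_total >= 500 &
    dependency_count <= 1
)

# Packages that satisfy every rule
accepted <- cohort$package[passes_policy]
```

Keeping each rule as a vectorized comparison makes it easy to report which rule a package failed, rather than only the final verdict.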