riskmetric provides a workflow for evaluating the quality of a set of R packages in four major steps. The workflow can help users choose high-quality R packages, improve package reliability, and demonstrate the validity of R packages in a regulated industry. The four steps are:

1. Creating a reference to a package's source
2. Assessing the package against validation criteria
3. Scoring the assessment results
4. Summarizing the scores into a single risk metric
Before we get started assessing packages, we need a place to aggregate all of our package metadata. To handle this, we use the `pkg_ref` class. As package metadata is requested, it is cached within this object for reuse, avoiding having to scrape, download, parse, or derive the same metadata multiple times.
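The output that follows was produced by constructing a reference to an installed copy of riskmetric. A minimal sketch, assuming riskmetric is installed in the local library:

```r
library(riskmetric)

# Build a package reference from the installed library;
# metadata fields are populated lazily as they are requested
pkg_ref("riskmetric")
```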
```r
#> <pkg_install, pkg_ref> riskmetric v0.1.0.9000
#> $path
#>  "/home/user/username/R/3.6/Resources/library/riskmetric"
#> $source
#>  "pkg_install"
#> $version
#>  '0.1.0.9000'
#> $name
#>  "riskmetric"
#> $description...
#> $help...
#> $help_aliases...
#> $news...
```
A package reference defines one of many different ways of gathering information about a package and its development practices. The target of this reference could be a source code directory, an installation in a package library, or simply a URL in a web-hosted repository from which metadata can be scraped. Each piece of metadata is captured through a callback function, which can be uniquely handled for each source of package information.
`pkg_ref` objects can be coerced to a `tibble` for easier, parallel assessment of packages.
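For example, a table of references matching the output below might be built like this (a sketch; the vector of package names and the `package_tbl` name are illustrative):

```r
library(riskmetric)
library(tibble)

# Coerce a set of package references into a tibble,
# one row per package
package_tbl <- pkg_ref(c("riskmetric", "utils", "tools")) %>%
  as_tibble()
package_tbl
```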
```r
#> # A tibble: 3 x 3
#>   package    version    pkg_ref            
#>   <chr>      <chr>      <list<pkg_ref>>    
#> 1 riskmetric 0.1.0.9000 riskmetric<install>
#> 2 utils      3.6.1      utils<install>     
#> 3 tools      3.6.1      tools<install>
```
In riskmetric, the core operations of package validation are handled by a series of verbs. Once package reference objects have been defined, they can be processed by chaining these foundational operations. These verbs, as well as many of the features of package assessment, are designed to be highly extensible to make it easier for package validation to be a community effort.
See the extending-riskmetric vignette for more details.
We'll use the tibble from above as our example.
The first component of the validation process is an assessment. This is used for iterating over assessment functions which serve to fetch any necessary package metadata and produce an atomic value reflective of each assessment. For example, when assessing whether a package’s NEWS files are up-to-date, it may return a logical vector indicating which NEWS files have entries for the current package version.
The `assess` function iterates over a list of assessment functions, adding a column per assessment. By default, it will use all the available assessment functions (all functions that riskmetric exports beginning with `assess_*`). Once applied, each assessment produces a `pkg_metric` object:
```r
package_tbl %>%
  assess()
#> # A tibble: 3 x 6
#>   package  version  pkg_ref             has_news     export_help   news_current 
#>   <chr>    <chr>    <list<pkg_ref>>     <list<pkg_m> <list<pkg_me> <list<pkg_me>
#> 1 riskmet… 0.1.0.9… riskmetric<install> 0            <lgl >        <lgl >       
#> 2 utils    3.6.1    utils<install>      0            <lgl >        <lgl >       
#> 3 tools    3.6.1    tools<install>      0            <lgl >        <lgl >
```
`assess` will use all available riskmetric assessment functions by default. A subset of functions, or additional user-defined functions, can be used by passing a list of functions to its `assessments` argument.
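For instance, to run only a couple of the built-in assessments, a sketch might look like the following (the `assessments` argument and the `assess_has_news`/`assess_news_current` function names are assumptions here, inferred from the columns shown above):

```r
# Restrict assessment to a named subset of metrics
package_tbl %>%
  assess(assessments = list(
    assess_has_news,      # does the package ship a NEWS file?
    assess_news_current   # is NEWS updated for the current version?
  ))
```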
After gathering the available metric metadata, every metric object is scored. Scoring a metric translates the assessment's atomic values into a single numeric score. For example, we might score the availability of NEWS files by giving a package a perfect score (`1.0`) only when all NEWS files have been updated for the latest version, or alternatively we might score it as the fraction of NEWS files that are up-to-date.
```r
package_tbl %>%
  assess() %>%
  score()
#> # A tibble: 3 x 6
#>   package    version    pkg_ref             has_news export_help news_current
#>   <chr>      <chr>      <list<pkg_ref>>        <dbl>       <dbl>        <dbl>
#> 1 riskmetric 0.1.0.9000 riskmetric<install>        0       1                0
#> 2 utils      3.6.1      utils<install>             0       0.995            0
#> 3 tools      3.6.1      tools<install>             0       1                0
```
Keeping the scoring separated from the assessment allows us to easily iterate on how scores are defined and allows for the scoring function to be easily overwritten.
What we ultimately want is a single numeric value indicating the "risk" involved with using a given package. For the dplyr-savvy, this will look quite familiar. We've defined a default summarizing function, `summarize_risk`, which accepts the full data.frame and performs a default risk assessment, assuming all the available riskmetric assessments have been performed.
```r
package_tbl %>%
  assess() %>%
  score() %>%
  mutate(risk = summarize_risk(.))
#> # A tibble: 3 x 7
#>   package   version  pkg_ref             has_news export_help news_current  risk
#>   <chr>     <chr>    <list<pkg_ref>>        <dbl>       <dbl>        <dbl> <dbl>
#> 1 riskmetr… 0.1.0.9… riskmetric<install>        0       1                0 0.6  
#> 2 utils     3.6.1    utils<install>             0       0.995            0 0.602
#> 3 tools     3.6.1    tools<install>             0       1                0 0.6
```
As you can see, the package is currently quite bare-bones, and nobody would reasonably choose packages based solely on the existence of a NEWS file.
Our priority so far has been to set up an extensible framework as the foundation for a community effort, and that’s where you come in! There are a few things you can do to get started.
Check out the extending-riskmetric vignette to see how to extend the functionality with your own metrics.
Visit the riskmetric GitHub, where we can further discuss new metric proposals.