Supporting Companies With Validation of R Packages: A Regulatory Repository

Coline Zeballos, Roche

Yann Féat, mainanalytics

useR! 2024, July 9th

A Universal Conundrum

There are n packages for x; which one is the best?¹


By choosing packages, we’re choosing our¹:

  • Feature set
  • Dependency footprint
  • Integration with other packages
  • Preferred lifecycle management of our tools
  • Community that we can lean on for help


Regulated Industries: Justification as a Requirement

Goals

We have two objectives:

  • Generation of risk-based quality indicators

Provide a community-maintained catalog of package quality indicators (“risk metrics”)

  • Package recommendations

Serve subsets of packages that conform to a specified risk tolerance

An evolving R ecosystem

In close communication with many beloved R projects

Submissions Working Group

Repositories Working Group

pharmaverse

targeting repos integration

r-lib/pak

targeting pak integration

Pilot Implementation

Focus on proving capabilities and on quick development

Package: praise
Version: 1.2.3
DownloadURL: 
  github.com/cran/praise.tar.gz
code_coverage: 0.75

Package: survfit
Version: 2.3.4
DownloadURL:
  github.com/cran/survfit.tar.gz
code_coverage: 0.87

Pilot architecture (diagram): a package repository built on a CRAN mirror and GitHub Actions (r-hub/repos), with pre-calculated {riskmetric} scores from {riskscore} manually joined into the PACKAGES file
library(pharmapkgs)
options(available_packages_filters =
  risk_filter(code_coverage > 0.8))
available.packages()
pak::pkg_install("survfit")
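The "manually join data" step can be pictured with a minimal base-R sketch. It assumes the scores arrive as a data frame keyed by package name; the records and scores are the illustrative values from the pilot slide, and nothing here reflects the actual {riskscore} export format.

```r
# Hypothetical sketch: merge pre-calculated risk scores into a PACKAGES index.

# A tiny PACKAGES index in DCF format (normally read from the repository).
packages_dcf <- "Package: praise
Version: 1.2.3

Package: survfit
Version: 2.3.4
"
con <- textConnection(packages_dcf)
pkgs <- as.data.frame(read.dcf(con), stringsAsFactors = FALSE)
close(con)

# Pre-calculated scores, e.g. exported from a {riskmetric} run (illustrative).
scores <- data.frame(
  Package = c("praise", "survfit"),
  code_coverage = c(0.75, 0.87),
  stringsAsFactors = FALSE
)

# Left join on the package name: unscored packages keep their row (score NA).
joined <- merge(pkgs, scores, by = "Package", all.x = TRUE)

# Write the enriched index back in DCF format (here to the console).
write.dcf(joined)
```

A left join (`all.x = TRUE`) is chosen so that a package that has not been scored yet stays in the index rather than silently disappearing.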

Interacting with the repo

Package risk filters

  • Helper package for system administrators
  • Restricts the packages available for installation to those fitting a policy
  • Uses package metadata stored in the repository
  • May be used together with manual checks (e.g., reading a statistical review)

flowchart TD
  A[All packages] --> B{Code\n covr.\n > 95%?}
  B -- Yes --> C{Has\n doc.?}
  C -- Yes --> D(Available for safety-critical activities)
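As a sketch of how such a policy could plug into base R, the function below encodes the flowchart for the `filters` mechanism of `available.packages()`, which passes a package matrix to each filter and expects a subset of its rows back. It is a hypothetical stand-in, not the {pharmapkgs} `risk_filter()` implementation, and it assumes the field names `covr_coverage` and `has_vignettes` used in the repository metadata.

```r
# Hypothetical filter encoding the flowchart policy (coverage > 95% and
# documentation present). NOT the {pharmapkgs} risk_filter() API.
safety_critical_filter <- function(db, min_coverage = 0.95) {
  # Custom PACKAGES fields arrive as character; absent values arrive as NA.
  coverage <- suppressWarnings(as.numeric(db[, "covr_coverage"]))
  has_docs <- db[, "has_vignettes"] %in% "1"
  keep <- !is.na(coverage) & coverage > min_coverage & has_docs
  db[keep, , drop = FALSE]
}

# Toy package matrix with illustrative values, shaped like available.packages().
db <- cbind(
  Package       = c("pkgA", "pkgB", "pkgC", "pkgD"),
  covr_coverage = c("0.97", "0.99", "0.80", NA),
  has_vignettes = c("1", "0", "1", "1")
)
safety_critical_filter(db)[, "Package"]

# Against a live repository carrying these fields, the custom fields must be
# requested explicitly, and the filter can be added to R's defaults:
# available.packages(fields = c("covr_coverage", "has_vignettes"),
#                    filters = list(add = TRUE, safety_critical_filter))
```

Packages with a missing metric are excluded rather than admitted, which errs on the side of caution for safety-critical use.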

Usage

Unfiltered

available.packages()
Package
1 colorspace
2 farver
3 isoband
...
106 tripack

Filtered

fltr <- risk_filter(covr_coverage > 0.95
  & has_vignettes)
options(available_packages_filters = fltr)
available.packages()
Package
1 colorspace
2 magrittr
3 R6
...
32 shinyjs

Repository ‘back-end’

Infrastructure setup

  • Hosts risk assessment metadata
  • Links to artifacts of the R-hub check system (via DownloadURL)
  • Integrates with pak::pkg_install
  • Supports multiple levels of risk tolerance
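Because the metadata is stored once, each consumer can apply its own tolerance. A minimal sketch with three hypothetical policies (`strict`, `moderate`, `lenient`; thresholds invented for illustration) applied to the pkg_1 to pkg_5 values from the cohort example:

```r
# Metric values from the cohort example (pkg_1 ... pkg_5).
meta <- data.frame(
  Package       = paste0("pkg_", 1:5),
  covr_coverage = c(0.967, 0.984, 0.992, 0.864, 0.924),
  has_vignettes = c(1, 1, 1, 0, 1),
  stringsAsFactors = FALSE
)

# Hypothetical policies, one per risk tolerance; thresholds are illustrative.
policies <- list(
  strict   = function(m) m$covr_coverage > 0.95 & m$has_vignettes == 1,
  moderate = function(m) m$covr_coverage > 0.90,
  lenient  = function(m) m$covr_coverage > 0.80
)

# The same metadata serves every policy; each yields its own allowed subset.
allowed <- lapply(policies, function(policy) meta$Package[policy(meta)])
allowed
```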

DCF file forked from r-hub/repos

Package: bslib
Version: 0.6.1
Depends: R (>= 2.10), R (>= 4.4), R (< 4.4.99)
License: MIT + file LICENSE
DownloadURL:
         https://github.com/cran/bslib/releases/download/0.6.1/bslib_0.6.1_b4_R4.4_x86_64-pc-linux-gnu-ubuntu-22.04.tar.gz
Built: R 4.4.0; ; 2023-11-29 16:39:06 UTC; unix
RVersion: 4.4
Platform: x86_64-pc-linux-gnu-ubuntu-22.04
Imports: base64enc, cachem, grDevices, htmltools (>= 0.5.7), jquerylib (>= 0.1.3),
         jsonlite, lifecycle, memoise (>= 2.0.1), mime, rlang, sass (>= 0.4.0)
...

Added fields for risk-based assessment

riskmetric_run_date: 2023-06-21
riskmetric_version: 0.2.1
covr_coverage: 0.852
has_vignettes: 1
remote_checks: 0.846
...
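DCF values are plain strings, so a consumer has to convert them before comparing. A minimal sketch, assuming the field names shown above and treating `has_vignettes` as a 0/1 flag; pairing these values with the bslib record is illustrative.

```r
# Hypothetical record combining standard fields with the added risk fields.
record <- "Package: bslib
Version: 0.6.1
riskmetric_run_date: 2023-06-21
riskmetric_version: 0.2.1
covr_coverage: 0.852
has_vignettes: 1
remote_checks: 0.846
"

con <- textConnection(record)
fields <- read.dcf(con)[1, ]  # named character vector, one entry per field
close(con)

# Convert each string to a type that supports comparisons.
risk <- list(
  run_date           = as.Date(fields[["riskmetric_run_date"]]),
  riskmetric_version = package_version(fields[["riskmetric_version"]]),
  covr_coverage      = as.numeric(fields[["covr_coverage"]]),
  has_vignettes      = fields[["has_vignettes"]] == "1",
  remote_checks      = as.numeric(fields[["remote_checks"]])
)

# Example provenance check: were the metrics produced by riskmetric >= 0.2.0?
risk$riskmetric_version >= "0.2.0"
```

Keeping the run date and tool version next to the metrics lets a reviewer decide whether a score is recent enough, or was produced by a trusted version of the tooling.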

Package cohort validation workflow

Risk assessment pipeline

Calculates package QA metadata for updated packages and their reverse dependencies

Produces logs and other reproducibility data

In the future: can run on in-house infrastructure
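The reverse-dependency step can be sketched as a breadth-first search over a dependency map. The map below is read from the cohort figure, assuming an edge such as pkg_2 -> pkg_1 means that pkg_1 depends on pkg_2 (consistent with pkg_1's metrics being recomputed when pkg_2 changes); the function name is hypothetical.

```r
# Dependency map read from the cohort figure (assumed edge direction):
# each element lists the packages the named package depends on.
deps <- list(
  pkg_1 = c("pkg_2", "pkg_3"),
  pkg_2 = "pkg_3",
  pkg_3 = character(0),
  pkg_4 = "pkg_5",
  pkg_5 = character(0)
)

# Hypothetical helper: the updated packages plus every package depending on
# them, directly or transitively (breadth-first search).
reassessment_cohort <- function(updated, deps) {
  cohort <- updated
  frontier <- updated
  while (length(frontier) > 0) {
    # Direct reverse dependencies of the packages in the current frontier.
    hits <- vapply(deps, function(d) any(d %in% frontier), logical(1))
    frontier <- setdiff(names(deps)[hits], cohort)
    cohort <- c(cohort, frontier)
  }
  cohort
}

reassessment_cohort("pkg_2", deps)
```

For a full repository, `tools::package_dependencies(..., reverse = TRUE, recursive = TRUE)` offers a similar lookup against an `available.packages()` matrix.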

Figure: package cohort dependency graph (edges pkg_2 -> pkg_1, pkg_3 -> pkg_1, pkg_3 -> pkg_2, pkg_5 -> pkg_4), all metrics current
pkg_1  Version: 1.15  covr_coverage: 0.967  has_vignettes: 1
pkg_2  Version: 3.5   covr_coverage: 0.984  has_vignettes: 1
pkg_3  Version: 1.9   covr_coverage: 0.992  has_vignettes: 1
pkg_4  Version: 0.5   covr_coverage: 0.864  has_vignettes: 0
pkg_5  Version: 4.2   covr_coverage: 0.924  has_vignettes: 1

Figure (continued): pkg_2 updated to version 3.6; metrics for pkg_1 and pkg_2 pending
pkg_1  Version: 1.15  covr_coverage: ...    has_vignettes: ...
pkg_2  Version: 3.6   covr_coverage: ...    has_vignettes: ...
pkg_3  Version: 1.9   covr_coverage: 0.992  has_vignettes: 1
pkg_4  Version: 0.5   covr_coverage: 0.864  has_vignettes: 0
pkg_5  Version: 4.2   covr_coverage: 0.924  has_vignettes: 1

Figure (continued): metrics for pkg_1 and pkg_2 recalculated
pkg_1  Version: 1.15  covr_coverage: 0.967  has_vignettes: 1
pkg_2  Version: 3.6   covr_coverage: 0.987  has_vignettes: 1
pkg_3  Version: 1.9   covr_coverage: 0.992  has_vignettes: 1
pkg_4  Version: 0.5   covr_coverage: 0.864  has_vignettes: 0
pkg_5  Version: 4.2   covr_coverage: 0.924  has_vignettes: 1

Our roadmap

What’s next

Automating up-to-date quality metrics to support sponsor risk assessment

Package: praise
Version: 1.2.3
DownloadURL: 
  github.com/cran/praise.tar.gz
code_coverage: 0.75

Package: survfit
Version: 2.3.4
DownloadURL:
  github.com/cran/survfit.tar.gz
code_coverage: 0.87

Target architecture (diagram): a package repository built on a CRAN mirror and GitHub Actions (r-hub/repos), with metrics periodically re-calculated for updated packages (pharmaR/repos) and written to the PACKAGES file
library(pharmapkgs)
options(available_packages_filters =
  risk_filter(code_coverage > 0.8))
available.packages()
pak::pkg_install("survfit")
risk_report("praise")
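Periodic re-calculation first needs to know which packages changed since the previous run. A minimal sketch comparing two hypothetical version snapshots (pkg_2 moving from 3.5 to 3.6 mirrors the cohort example; pkg_6 is an invented newcomer), using `package_version()` so that 1.10 compares greater than 1.9:

```r
# Illustrative version snapshots (names = packages, values = versions).
previous <- c(pkg_1 = "1.15", pkg_2 = "3.5", pkg_3 = "1.9", pkg_4 = "0.5", pkg_5 = "4.2")
current  <- c(pkg_1 = "1.15", pkg_2 = "3.6", pkg_3 = "1.9", pkg_4 = "0.5",
              pkg_5 = "4.2", pkg_6 = "0.1")  # pkg_6 is a hypothetical new package

# Hypothetical helper: packages that are new or now have a higher version.
updated_packages <- function(previous, current) {
  is_new <- !(names(current) %in% names(previous))
  common <- names(current)[!is_new]
  # package_version() compares component-wise, so "1.10" > "1.9".
  bumped <- common[package_version(unname(current[common])) >
                     package_version(unname(previous[common]))]
  c(bumped, names(current)[is_new])
}

updated_packages(previous, current)
```

The result is the set of packages whose metrics, together with those of their reverse dependencies, would be recalculated in the next run.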
Reference Image


Reference container image(s)

Should mimic the computing environments of companies and health authority reviewers

To be used by the Regulatory R Repository for package cohort validation

Main intent: start a cross-company dialogue on infrastructure

Closing

Impact

Community Grants & Sponsorships

Over USD 1.4 million

Organizing Large Scale Collaborative Projects

R Validation Hub, R-Ladies

Co-Host Multidisciplinary Data Science Forums

Stanford Data Institute

Direct Support for Key R Events

R/Medicine, R/Pharma, useR!, LatinR, and more

Direct Worldwide Support for R User Groups

Join us

r-consortium.org

  • Help guide the future direction of the R language
  • Collaborate on cross industry initiatives
  • Raise your leadership profile in the R Community
  • Protect your investment in R while supporting the common good

Thank you

To our Core Team members

  • Coline Zeballos, Roche
  • Doug Kelkhoff, Roche
  • Jaime Pires, Roche
  • Yann Féat, mainanalytics
  • Andrew Borgman, Biogen
  • Astrid Radermacher, Jumping Rivers
  • Colin Gillespie, Jumping Rivers
  • Magnus Mengelbier, Limelogic
  • Nicoles Jones, Denali Therapeutics
  • Ramiro Magno, Pattern Institute
  • Stefan Doering, Boehringer-Ingelheim
  • Kevin Kunzmann, Boehringer-Ingelheim
  • Matthias Trampisch, Boehringer-Ingelheim
  • Wilmar Igl, Icon Plc
  • Lluís Revilla, IrsiCaixa AIDS Research Institute
  • Yoni Sidi, Pinpoint Strategies
  • Zhenglei Gao, Bayer