Driving Consensus Through Repos

R/Pharma Summit @ Posit Conf 2024, August 11th

Slides: 🔗 bit.ly/PositConf24


On behalf of the R Validation Hub team:

Aaron Clark Arcus Biosciences

Doug Kelkhoff Roche

👋 Who We Are

The R Validation Hub is a collaboration to support the adoption of R within a biopharmaceutical regulatory setting (pharmaR.org)

  • Grew out of R/Pharma 2018
  • Led by participants from ~10 organizations
  • With frequent involvement from health authorities (primarily the FDA)
  • And subscribers from ~60 organizations spanning multiple industries

🤝 Affiliates:

Works with and provides support to the R Foundation and to the key organizations developing, maintaining, distributing and using R software


Key Pharma Activities

  • The R Validation Hub
  • R Submission Working Group
  • R/Medicine
  • R-Hub
  • R Repositories Working Group (i.e., CRAN enhancements, future development)

Impact

Community Grants & Sponsorships

Over USD $1.4 Million

Organizing Large Scale Collaborative Projects

R Validation Hub, R-Ladies

Co-Host Multidisciplinary Data Science Forums

Stanford Data Institute

Direct Support for Key R Events

R/Medicine, R/Pharma, useR!, LatinR, and more

Direct Worldwide Support for R User Groups

Join the R Consortium

r-consortium.org

  • Help guide the future direction of the R language
  • Collaborate on cross industry initiatives
  • Raise your leadership profile in the R Community
  • Protect your investment in R while supporting the common good

👷‍♂️ The R Validation Hub: What We Do

Products

White Paper

Guidance on compliant use of R and management of packages

Repositories

Building a public, validation-ready resource for R packages

Coline Zeballos

Communications

Connecting validation experts across the industry

Jaxon Abercrombie, Anuja Das, Antal Martinecz

{riskmetric}

Gather and report on risk heuristics to support validation decision-making

Eric Milliman

{riskassessment}

A web interface to {riskmetric}, supporting review, annotation and cataloging of decisions

Aaron Clark, Jeff Thompson

{riskscore}

An R data package capturing risk metrics across all of CRAN

Aaron Clark

📊 A Quick Survey

Keep your hand raised if…

  • It’s early morning and you need an excuse to stretch
  • This isn’t your first time hearing about the R Validation Hub
  • Your org leverages the R Validation Hub guidelines (risk-based approach)
  • Your org uses R Validation Hub tools ({riskmetric}, {riskassessment})
  • Your org contributes to the R Validation Hub

🗓️ Agenda

  • Communications Workstream 6min
  • {riskassessment} App Workstream 7min
  • {riskmetric} Workstream 7min
    Watch for big changes coming
  • Repositories Workstream 25min
  • Room Discussion 10 - 15min
  • Closing

📜 Workstream Updates

📣 Communications Workstream

  • 📞 Community Meetings
  • 🚧 Website Revamp
  • 📜 Case Studies Refresh
  • 🆕 Gathering GxP Package Lists

📞 Community Meetings


Past meetups 📆

  • Jun 27, 2023 - Learnings & Reflections from Case Studies
  • Aug 09, 2023 - {riskmetric} & the {riskassessment} app – A 2-part Mini Series
  • Nov 28, 2023 - Wrapping Up 2023 and Welcoming 2024
  • Feb 03, 2024 - Unraveling the Term “Validation”
  • May 21, 2024 - Tackling Hurdles: Embracing Open-Source Packages in Projects

📞 Next Community Meeting


🗓️ Tues, Aug 20, 2024

👩‍🦰 Bríd Roberts

⌨️ Novartis

Analyzing change in assessed risk across package releases

The Software Open Source (SOS) team manages and executes the risk assessment process for R package validation at Novartis. The team uses an internally developed R package to classify the risk of each package as “low”, “medium”, or “high”.

We analysed the risk assessment data over two time points to determine the impact on the assigned risk categorisation for packages with AND without version changes.

In this talk, we showcase the risk assessments over time, the causes of any risk class changes, and the resulting impact on various teams within our organization.

📞 Community Meetings


How do I sign up?

🚧 Website: pharmaR.org THEN

🚧 Website: pharmaR.org NOW

📜 Case Studies Refresh

🆕 Gathering GxP Package Lists


  • What?
    • Pkg name, version, assessment date, risk decision
  • Why?
    • R Val Hub will analyze & report observed trends & consensus in aggregate
    • Regulatory Repo WG to help identify thresholds for certain quality benchmarks
  • How?
    • Will publish a form on pharmaR.org to Sign Up
    • Can be 100% anonymous, or consider open-source
    • 6 pharma Orgs verbally committed, 2 delivered already
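
To make the "consensus in aggregate" idea concrete, here is a minimal sketch of how submitted lists might be tallied. The organisations, packages and decisions are made-up toy rows, and the column names are assumptions rather than the format of the planned sign-up form.

```r
# Toy submissions: every value here is hypothetical, for illustration only.
submissions <- data.frame(
  org      = c("org1", "org2", "org3", "org1", "org2"),
  package  = c("pkgA", "pkgA", "pkgA", "pkgB", "pkgB"),
  decision = c("Low", "Low", "Medium", "High", "High"),
  stringsAsFactors = FALSE
)

# Count how often each risk decision was assigned to each package.
counts <- table(submissions$package, submissions$decision)

# Most frequent decision per package (the first one wins on ties).
most_common <- apply(counts, 1, which.max)
consensus <- setNames(colnames(counts)[most_common], rownames(counts))
consensus
```

Because only counts per package are kept, a tally like this would work equally well on anonymised submissions.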

🆕 Open-sourced Package List

🔗 insightsengineering.github.io/rvalidationhub-packages/

{riskassessment} App

  • ⏰ Latest features
  • 🛣️ Where we’re headed next
  • 🆕 New Collaborative Deployment

{riskassessment} App

Latest Features Recap

  • Decision automation by {riskmetric} assessment values
  • New Function Explorer for (1) source code, (2) help docs, and (3) tests
  • Tons more…
    • Non-{shinymanager} deployment options
    • Expanded Dependency Support
    • Added an 'About' tab

📣 The feedback loop is crucial!

All of these improvements started off as community-driven suggestions on our GitHub repo. If you have an idea that isn’t already on the list of issues, submit a new issue today.

Decision Automation Rules, by assessment

New Function Explorer! (Code by GSK)





{riskassessment} Roadmap 🛣


Going scoreless

  • 💯 Optional, defined in Config file
  • Some orgs adopt a hands-on approach to package review, and the score can interfere with that
  • User survey data suggests scores are rarely used for decision making

New Collaborative Deployment

🔗 app.pharmar.org/riskassessment/

Deployment engineered by:

Roles:

  • 👀 Viewer
  • 👷 Reviewer [Add Pkgs & Comment]
  • 👩‍💼 Lead [Add Pkgs, Comment, Edit Pkg Summary, Make Decisions]
    • Decisions: Undecided, Low, Medium, High Risk
    • Accepting volunteers! Open an issue

{riskmetric}

  • 🔙 Recap last year’s priorities
  • 🥅 New Roadmap

{riskmetric} Recap 🔙


  • Ease of use:
    Wrapper functions for a complete workflow, prettier outputs
  • Metric completeness:
    Implement metrics for as many package sources as possible. Chain sources together to create more complete assessments
  • Modular additions:
    Allow users to add custom & optional assessments based on community packages (e.g. oyster, srr, pkgstats, etc)
  • Focusing on metrics and scoring:
    Making custom weighting more robust and convenient. Produce guidance materials for weighting specific assessments based on community feedback and publish our own views on best practices.

{riskmetric} Progress 🔄


  • Ease of use: Wrapper functions for a complete workflow, prettier outputs
    • Near completion!
  • Metric completeness: Implement metrics for as many package sources as possible. Chain sources together to create more complete assessments
    • Made some progress
  • Modular additions: Allow users to easily add custom assessments, create optional assessments based on community packages (e.g. oyster, srr, pkgstats, etc)
    • On the backlog
  • Focusing on metrics and scoring: Making custom weighting more robust and convenient. Guidance materials on weighting specific assessments based on community feedback and our own views on best practices.
    • Metrics, Yes! Scoring, No. In fact…

“Risk Tools” Today

“Risk Tools” Roadmap

📦 Repositories

Repositories Workstream

Supporting a transparent, open, dynamic, cross-industry approach of establishing and maintaining a repository of R packages.

  • A CRAN-like repository
  • Providing quantifiable package qualities for risk-based decision-making
  • Evaluated against representative systems
  • “Bring-your-own” quality cut-offs
  • Declarative quality decision-making

The Pulse of the Industry

  • Our whitepaper is widely adopted
  • But implementing it is inconsistent & laborious
    • Variations throughout industry pose uncertainty
    • Sharing software with health authorities is a challenge
    • Health authorities, overwhelmed by technical inconsistencies, are more likely to question software use
  • We feel the most productive path forward is a shared ecosystem
  • Public discussion on how to characterize quality code/methods

Goals

Generating Quality Indicators

  • Provide a community-maintained catalog of package quality indicators (“risk metrics”)
  • Calculated against a cohort of packages
  • On a known system
  • Consistently evaluated, with transparent methods

Consolidate Decision-Making

  • Serve subsets of packages that conform to a specified risk tolerance
  • Transparently demonstrate selection criteria
  • Allows for one-off analysis from the public repo
  • ... or mirroring of a filtered snapshot

An evolving R ecosystem

In close communication with many beloved R projects

Submissions Working Group

Repositories Working Group

pharmaverse

targeting repos integration

r-lib/pak

targeting pak integration

Pilot Implementation

focus on proving capabilities, quick development

Package: praise
Version: 1.2.3
DownloadURL:
  github.com/cran/praise.tar.gz
code_coverage: 0.75

Package: survfit
Version: 2.3.4
DownloadURL:
  github.com/cran/survfit.tar.gz
code_coverage: 0.87

Diagram: r-hub/repos (package repository, built on CRAN mirror & GitHub Actions) and {riskscore} (pre-calculated {riskmetric} scores) are manually joined into a PACKAGES file.

library(pharmapkgs)
options(available_packages_filters =
  risk_filter(code_coverage > 0.8))
available.packages()
pak::pkg_install("survfit")

all modelled after r-hub/repos

Interacting with the repo

Package risk filters

  • Helper package for system administrators
  • Restricts packages available for installation to those fitting a policy
  • Uses package metadata in the repo
  • May be used together with manual checks (e.g., read a statistical review)
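
As a rough illustration of how such a filter could work, base R lets `available.packages()` take user-defined filter functions that receive the package matrix and return a subset of its rows. The sketch below builds one from an unevaluated condition. `sketch_risk_filter()` is a hypothetical stand-in written for this example, not the {pharmapkgs} implementation, and the toy matrix reuses the values from the pilot slide.

```r
# Hypothetical helper: turn a condition on metadata fields into a filter
# function of the kind accepted by available.packages().
sketch_risk_filter <- function(expr) {
  cond <- substitute(expr)   # capture the condition without evaluating it
  env <- parent.frame()
  function(db) {
    fields <- as.data.frame(db, stringsAsFactors = FALSE)
    # PACKAGES fields arrive as text; convert the ones that look numeric.
    fields[] <- lapply(fields, utils::type.convert, as.is = TRUE)
    keep <- eval(cond, fields, env)
    # Packages with missing metadata are treated as not meeting the policy.
    db[!is.na(keep) & keep, , drop = FALSE]
  }
}

# Toy stand-in for the matrix returned by available.packages().
db <- cbind(
  Package = c("praise", "survfit"),
  Version = c("1.2.3", "2.3.4"),
  code_coverage = c("0.75", "0.87")
)
rownames(db) <- db[, "Package"]

coverage_filter <- sketch_risk_filter(code_coverage > 0.8)
kept <- rownames(coverage_filter(db))
kept
```

One practical detail: `available.packages()` only returns non-default PACKAGES fields when they are requested through its `fields` argument, so a real helper would also need to make sure the quality fields are read.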

Diagram: CRAN (20K+ pkgs) → risk_filter() → study-ready pkgs

As a user*

repo <- "https://raw.githubusercontent.com/pharmaR/repos/main/ubuntu-22.04/4.5"
options(repos = c("pharmaR/repos/ubuntu" = repo))

Define your quality expectations in code

options(available_packages_filters = risk_filter(
  # no known security vulnerabilities
  quality_cve_count == 0 & (

    # package is exceptionally well tested
    (quality_code_coverage >= 0.8 & 
      quality_example_coverage >= 0.8 &
      quality_r_cmd_check_errors == 0) |

    # or is exceptionally well adopted
    dplyr::percent_rank(quality_downloads_1yr) > 0.9 |
    quality_reverse_dependencies_count >= 10 |

    # or seems to follow thorough development practices
    (quality_has_website &
      quality_vignette_count >= 1 &
      quality_author_count >= 3)
  )
))

*aspirational deviations from proof of concept in github.com/pharmaR/pharmapkgs

As a user*

repo <- "https://raw.githubusercontent.com/pharmaR/repos/main/ubuntu-22.04/4.5"
options(repos = c("pharmaR/repos/ubuntu" = repo))

Make exceptions explicit

options(available_packages_filters = risk_filter(
  exceptions = c("riskmetric", "riskscore"),

  # no known security vulnerabilities
  quality_cve_count == 0 & (

    # package is exceptionally well tested
    (quality_code_coverage >= 0.8 & 
      quality_example_coverage >= 0.8 &
      quality_r_cmd_check_errors == 0) |

    # or is exceptionally well adopted
    dplyr::percent_rank(quality_downloads_1yr) > 0.9 |
    quality_reverse_dependencies_count >= 10 |

    # or seems to follow thorough development practices
    (quality_has_website &
      quality_vignette_count >= 1 &
      quality_author_count >= 3)
  )
))

*aspirational deviations from proof of concept in github.com/pharmaR/pharmapkgs
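
One way to read the `exceptions` argument is as an allow-list that is OR-ed with the policy result, so that excepted packages stay installable and remain visible in code. The sketch below shows that reading. `apply_policy()` is a hypothetical helper and the coverage values are invented for illustration; the real semantics are up to {pharmapkgs}.

```r
# Hypothetical helper: keep packages that pass the policy or are listed
# as explicit exceptions.
apply_policy <- function(db, passes, exceptions = character()) {
  keep <- passes | db[, "Package"] %in% exceptions
  db[keep, , drop = FALSE]
}

# Toy metadata; the coverage numbers are made up.
db <- cbind(
  Package = c("riskmetric", "praise", "survfit"),
  quality_code_coverage = c("0.60", "0.75", "0.87")
)

passes <- as.numeric(db[, "quality_code_coverage"]) >= 0.8
allowed <- apply_policy(db, passes, exceptions = "riskmetric")
allowed[, "Package"]
```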

As a user*

repo <- "https://raw.githubusercontent.com/pharmaR/repos/main/ubuntu-22.04/4.5"
options(repos = c("pharmaR/repos/ubuntu" = repo))

Tools for inspecting packages that do not adhere to pre-specified criteria

risk_explain(package = "options")
#> <criteria>
#> * `quality_code_coverage` of `0.43` (< 0.8)
#> * `dplyr::percent_rank(quality_downloads_1yr)` of `0.2` (< 0.9)
#> * `quality_reverse_dependencies_count` of `0` (< 10)
#> * `quality_author_count` of `1` (< 3)

*aspirational deviations from proof of concept in github.com/pharmaR/pharmapkgs
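
A simplified sketch of the idea behind such an explanation: evaluate each criterion against one package's metadata and report the ones that are not met. `explain_criteria()` is a hypothetical helper, and it treats the policy as a flat list of independent criteria, which is simpler than the nested and/or policy shown on the earlier slides.

```r
# Hypothetical helper: return the criteria (as text) that a package fails.
explain_criteria <- function(pkg_fields, criteria) {
  met <- vapply(
    criteria,
    function(criterion) isTRUE(eval(criterion, pkg_fields)),
    logical(1)
  )
  unname(vapply(
    criteria[!met],
    function(criterion) paste(deparse(criterion), collapse = " "),
    character(1)
  ))
}

criteria <- list(
  quote(quality_cve_count == 0),
  quote(quality_code_coverage >= 0.8),
  quote(quality_reverse_dependencies_count >= 10)
)

# Metadata for a single package, echoing the values in the output above.
pkg_fields <- list(
  quality_cve_count = 0,
  quality_code_coverage = 0.43,
  quality_reverse_dependencies_count = 0
)

failed <- explain_criteria(pkg_fields, criteria)
failed
```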

As an Administrator

repo <- "https://raw.githubusercontent.com/pharmaR/repos/main/ubuntu-22.04/4.5"
options(repos = c("pharmaR/repos/ubuntu" = repo))

Applying filters to a repository mirror

options(available_packages_filters = risk_filter(
  # define your criteria in code
))

packages <- available.packages()
write.dcf(packages, "FILTERED_PACKAGES")
# sync repo using PACKAGES file
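
The write step can be tried offline with a toy index. This sketch filters a small package matrix on an assumed `quality_code_coverage` field, writes it with `write.dcf()`, and reads it back to confirm that the result is a valid index in DCF format. The values and the temporary file location are illustrative only.

```r
# Toy stand-in for the matrix returned by available.packages().
db <- cbind(
  Package = c("praise", "survfit"),
  Version = c("1.2.3", "2.3.4"),
  quality_code_coverage = c("0.75", "0.87")
)

# Apply a policy by hand (here: coverage above 0.8).
keep <- as.numeric(db[, "quality_code_coverage"]) > 0.8

# Write the filtered index, then read it back.
out_file <- file.path(tempdir(), "FILTERED_PACKAGES")
write.dcf(db[keep, , drop = FALSE], out_file)
filtered <- read.dcf(out_file)
filtered[, "Package"]
```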

How it Works

Shamelessly Piggybacking on r-hub/repos

  • github.com/cran mirrors CRAN packages, hosting source code & binary builds
  • r-hub/repos indexes packages mirrored on github.com/cran
    • Adds a custom DownloadURL field, used by {pak} to fetch packages
    • Serves PACKAGES file, a standard format for indexing packages
  • pharmaR/repos periodically looks for changes in r-hub/repos
    • Calculates metrics for updated packages
    • And hosts PACKAGES file with additional quality fields

r-hub/repos PACKAGES file

Package: bslib
Version: 0.6.1
Depends: R (>= 2.10), R (>= 4.4), R (< 4.4.99)
License: MIT + file LICENSE
DownloadURL:
  https://github.com/cran/bslib/../bslib-ubuntu-22.04.tar.gz
...

Added fields for risk-based assessment

riskmetric_run_date: 2023-06-21
riskmetric_version: 0.2.1
quality_code_coverage: 0.852
quality_vignette_count: 1
quality_r_cmd_check_errors: 0
...
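
Since PACKAGES is a plain DCF file, the added fields can be read with base R alone. A minimal sketch, using a shortened copy of the entry above held in memory instead of the hosted file:

```r
# A shortened PACKAGES entry, including the added quality fields.
packages_text <- c(
  "Package: bslib",
  "Version: 0.6.1",
  "riskmetric_version: 0.2.1",
  "quality_code_coverage: 0.852",
  "quality_r_cmd_check_errors: 0"
)

# read.dcf() accepts a connection and returns a character matrix.
con <- textConnection(packages_text)
entry <- read.dcf(con)
close(con)

# Every field arrives as text, so numeric metrics need converting.
coverage <- as.numeric(entry[, "quality_code_coverage"])
coverage
```

Keeping the metrics as extra fields means existing tools that only look at the standard fields should be able to ignore them.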

Package Validation Workflow

Risk assessment pipeline

Calculates package quality metadata for updated packages and reverse dependencies

Produces logs and necessary reporting metadata

Support in-house packages with open-source tooling
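
To illustrate why reverse dependencies are part of the pipeline: when a package is updated, the packages that import it may behave differently and so may need fresh metrics too. The sketch below finds direct reverse dependencies in a toy index with hypothetical package names. For a real index, `tools::package_dependencies()` with `reverse = TRUE` provides this lookup.

```r
# Toy index; the package names and Imports are hypothetical.
index <- data.frame(
  Package = c("pkgA", "pkgB", "pkgC"),
  Imports = c("", "pkgA", "pkgB, pkgA"),
  stringsAsFactors = FALSE
)

# Split each Imports field into a vector of package names.
imports <- lapply(strsplit(index$Imports, ",", fixed = TRUE), trimws)
names(imports) <- index$Package

# Updated packages plus everything that directly imports one of them.
needs_reassessment <- function(updated) {
  is_revdep <- vapply(
    imports,
    function(deps) any(deps %in% updated),
    logical(1)
  )
  sort(union(updated, names(imports)[is_revdep]))
}

to_assess <- needs_reassessment("pkgB")
to_assess
```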

Our roadmap

What’s next

Automating up-to-date quality metrics to support sponsor risk assessment

Package: praise
Version: 1.2.3
DownloadURL:
  github.com/cran/praise.tar.gz
code_coverage: 0.75

Package: survfit
Version: 2.3.4
DownloadURL:
  github.com/cran/survfit.tar.gz
code_coverage: 0.87

Diagram: r-hub/repos (package repository, built on CRAN mirror & GitHub Actions) feeds pharmaR/repos, which periodically re-calculates metrics for updated packages on a reference image and serves the PACKAGES file.

library(pharmapkgs)
options(available_packages_filters =
  risk_filter(code_coverage > 0.8))
available.packages()
pak::pkg_install("survfit")
risk_report("praise")  # PDF report

What’s next

Technical Development

🧪

Pipeline Development

in progress

metric re-calculation

📜

Artifact Discoverability

Accessible logs, build artifacts

🥳

Friendly User Interfaces

Comfy tools for working with the repo

What’s next

Governance

🚢

Pipeline Development

in progress

Base image development/exploration

🏅

Filter Recommendations

Start dialogue about industry-standard thresholds

♻️

Lifecycle Decisions

  • how frequently should we sync with r-hub/repos/CRAN?
  • which packages require re-evaluation?
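
One possible starting point for the re-evaluation question is to compare package versions between two repository snapshots and flag anything new or newer. This is a sketch of that single rule with made-up snapshots, not a proposal for the final policy.

```r
# Package versions in two hypothetical snapshots of the repository.
previous <- c(praise = "1.2.3", survfit = "2.3.4")
current  <- c(praise = "1.2.3", survfit = "2.4.0", newpkg = "0.1.0")

# Packages that did not exist in the previous snapshot.
is_new <- !(names(current) %in% names(previous))

# Packages present in both snapshots whose version has increased.
shared <- names(current)[!is_new]
changed <- logical(length(current))
changed[!is_new] <-
  package_version(current[shared]) > package_version(previous[shared])

to_reevaluate <- names(current)[is_new | changed]
to_reevaluate
```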

Closing

The Year Ahead

Upcoming Deliverables

  • GxP package list analyses
  • {riskmetric} refocused on qualities, not scores
  • A scoreless {riskassessment} application
  • Enterprise package cohorts
  • Repositories pipelines

Key Opportunities

  • 📈 {riskmetric} and/or {riskassessment} project manager
    milestone planning, sprint organizing, issue triage, stakeholder management
  • 📦 R Package Development Opportunities
    {riskmetric}/{riskassessment}/{pharmapkgs}
  • 🚀 Infrastructure Development Opportunities
    GitHub workflows for metrics calculations, repository management

Thank you

thank you to our many contributors

our work is the product of the donated time of many passionate individuals

Slides: 🔗 bit.ly/PositConf24