Driving Consensus Through Repos

R/Pharma Summit @ Posit Conf 2024, August 11th

On behalf of the R Validation Hub team:

Doug Kelkhoff Roche

👋 Who We Are

The R Validation Hub is a collaboration to support the adoption of R within a biopharmaceutical regulatory setting (pharmaR.org)

Grew out of R/Pharma 2018
Led by participants from ~10 organizations
With frequent involvement from health authorities (primarily the FDA)
And subscribers from ~60 organizations spanning multiple industries

🤝 Affiliates: R Consortium

Works with and provides support to the R Foundation and to the key organizations developing, maintaining, distributing and using R software

Key Pharma Activities

The R Validation Hub
R Submission Working Group
R/Medicine
R-Hub
R Repositories Working Group (i.e., CRAN enhancements, future development)

R Consortium Impact

Community Grants & Sponsorships

Over USD $1.4 Million

Organizing Large Scale Collaborative Projects

R Validation Hub, R-Ladies

Co-Host Multidisciplinary Data Science Forums

Stanford Data Institute

Direct Support for Key R Events

R/Medicine, R/Pharma, useR!, LatinR, and more

Direct Worldwide Support for R User Groups

Join the R Consortium

r-consortium.org

Help guide the future direction of the R language
Collaborate on cross industry initiatives
Raise your leadership profile in the R Community
Protect your investment in R while supporting the common good

👷‍♂️ The R Validation Hub: What We Do

Visit pharmaR.org to get involved!

Products

White Paper

Guidance on compliant use of R and management of packages

Repositories

Building a public, validation-ready resource for R packages

Coline Zeballos

Communications

Connecting validation experts across the industry

Jaxon Abercrombie, Anuja Das, Antal Martinecz

`{riskmetric}`

Gather and report on risk heuristics to support validation decision-making

Eric Milliman

`{riskassessment}`

A web interface to {riskmetric}, supporting review, annotation and cataloging of decisions

Aaron Clark, Jeff Thompson

`{riskscore}`

An R data package capturing risk metrics across all of CRAN

Aaron Clark

📊 A Quick Survey

Keep your hand raised if…

It’s early morning and you need an excuse to stretch
This isn’t your first time hearing about the R Validation Hub
Your org leverages the R Validation Hub guidelines (risk-based approach)
Your org uses R Validation Hub tools ({riskmetric}, {riskassessment})
Your org contributes to the R Validation Hub

🗓️ Agenda

Communications Workstream 6min
{riskassessment} App Workstream 7min
{riskmetric} Workstream 7min
Watch for big changes coming
Repositories Workstream 25min
Room Discussion 10 - 15min
Closing

📜 Workstream Updates

📣 Communications Workstream

📞 Community Meetings
🚧 Website Revamp
📜 Case Studies Refresh
🆕 Gathering GxP Package Lists

📞 Community Meetings

Past meetups 📆

Jun 27, 2023 - Learnings & Reflections from Case Studies
Aug 09, 2023 - {riskmetric} & the {riskassessment} app – A 2-part Mini Series
Nov 28, 2023 - Wrapping Up 2023 and Welcoming 2024
Feb 03, 2024 - Unraveling the Term “Validation”
May 21, 2024 - Tackling Hurdles: Embracing Open-Source Packages in Projects

📞 Next Community Meeting

🗓️ Tues, Aug 20, 2024

👩‍🦰 Bríd Roberts

⌨️ Novartis

Analyzing change in assessed risk across package releases

The Software Open Source (SOS) team manages and executes the risk assessment process for R package validation at Novartis. The team uses an internally developed R package to classify the risk of each package as “low”, “medium”, or “high”.

We analysed the risk assessment data over two time points to determine the impact on the assigned risk categorisation for packages with AND without version changes.

In this talk, we showcase the risk assessments over time, the causes of any risk class changes, and the resulting impact on various teams within our organization.

📞 Community Meetings

Follow the R Consortium’s LinkedIn
✉ Join our mailing list! (pharmaR.org > Contact Us)

🚧 Website: `pharmaR.org` THEN

🚧 Website: `pharmaR.org` NOW

📜 Case Studies Refresh

🆕 Gathering GxP Package Lists

What?
- Pkg name, version, assessment date, risk decision

Why?
- R Val Hub will analyze & report observed trends & consensus in aggregate
- Helps the Regulatory Repo WG identify thresholds for certain quality benchmarks

How?
- Will publish a form on pharmaR.org to Sign Up
- Can be 100% anonymous, or consider open-source
- 6 pharma Orgs verbally committed, 2 delivered already
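
As a rough illustration of the aggregate reporting described above, the sketch below tallies risk decisions per package across contributed lists. The organization names, package names, and decisions are made-up placeholders, and the column names simply mirror the fields listed under "What?"; this is not the analysis the R Validation Hub will run.

```r
# Hypothetical submissions: one row per organization and package.
submissions <- data.frame(
  org = c("org_a", "org_b", "org_c", "org_a", "org_b"),
  package = c("pkgA", "pkgA", "pkgA", "pkgB", "pkgB"),
  version = c("1.0.0", "1.0.0", "1.0.1", "0.3.0", "0.3.0"),
  risk_decision = c("low", "low", "medium", "high", "high"),
  stringsAsFactors = FALSE
)

# Cross-tabulate packages against decisions (rows: packages, columns: decisions).
counts <- unclass(table(submissions$package, submissions$risk_decision))

# For each package, report the most common decision and the share of
# organizations that agree with it.
consensus <- data.frame(
  package = rownames(counts),
  n_orgs = as.vector(rowSums(counts)),
  top_decision = colnames(counts)[max.col(counts, ties.method = "first")],
  agreement = as.vector(apply(counts, 1, max) / rowSums(counts)),
  stringsAsFactors = FALSE
)
```

Only the aggregated `consensus` table would need to be shared, which fits the option of keeping individual submissions anonymous.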

🆕 Open-sourced Package List

🔗 insightsengineering.github.io/rvalidationhub-packages/

`{riskassessment}` App

⏰ Latest features
🛣️ Where we’re headed next
🆕 New Collaborative Deployment

`{riskassessment}` App

Latest Features Recap

Decision automation by {riskmetric} assessment values
New Function Explorer for (1) source code, (2) help docs, and (3) tests
Tons more…
- Non-{shinymanager} deployment options
- Expanded Dependency Support
- Added an 'About' tab

New Post!

📣 The feedback loop is crucial!

All of these improvements started as community-driven suggestions on our GitHub repo. If you have an idea that isn’t already on the list of issues, submit a new issue today.

Decision Automation Rules, by assessment
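
To make the idea of per-assessment decision rules concrete, here is a minimal sketch that maps a metric value to a decision using two cut-offs. The function name `decide_risk()`, the thresholds, and the assumption that higher values mean lower risk are all illustrative; the app's actual rule configuration may differ.

```r
# Hypothetical rule: higher metric values indicate lower risk.
# Values at or above `low_at_least` are "Low", values at or above
# `medium_at_least` are "Medium", everything else is "High".
# Missing values stay "Undecided" so a reviewer can step in.
decide_risk <- function(value, low_at_least, medium_at_least) {
  stopifnot(low_at_least >= medium_at_least)
  ifelse(
    is.na(value), "Undecided",
    ifelse(
      value >= low_at_least, "Low",
      ifelse(value >= medium_at_least, "Medium", "High")
    )
  )
}

# Example: made-up code coverage values for four packages.
coverage <- c(0.9, 0.5, 0.1, NA)
decisions <- decide_risk(coverage, low_at_least = 0.8, medium_at_least = 0.4)
```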

New Function Explorer! (Code by GSK)

`{riskassessment}` Roadmap 🛣

Going scoreless

💯 Optional, defined in Config file
Some orgs adopt a hands-on approach to package review, and the score can interfere with that
User survey data suggests scores are rarely used for decision making

New Collaborative Deployment

🔗 app.pharmar.org/riskassessment/

Deployment engineered by:

Roles:

👀 Viewer
👷 Reviewer [Add Pkgs & Comment]
👩‍💼 Lead [Add Pkgs, Comment, Edit Pkg Summary, Make Decisions]
- Decisions: Undecided, Low, Medium, High Risk
- Accepting volunteers! Open an issue

`{riskmetric}`

🔙 Recap last year’s priorities
🥅 New Roadmap

`{riskmetric}` Recap 🔙

Ease of use:
Wrapper functions for a complete workflow, prettier outputs

Metric completeness:
Implement metrics for as many package sources as possible. Chain sources together to create more complete assessments

Modular additions:
Allow users to add custom & optional assessments based on community packages (e.g. oysteR, srr, pkgstats, etc)

Focusing on metrics and scoring:
Making custom weighting more robust and convenient. Produce guidance materials for weighting specific assessments based on community feedback and publish our own views on best practices.

`{riskmetric}` Progress 🔄

Ease of use: Wrapper functions for a complete workflow, prettier outputs
- Near completion!
Metric completeness: Implement metrics for as many package sources as possible. Chain sources together to create more complete assessments
- Made some progress
Modular additions: Allow users to easily add custom assessments, create optional assessments based on community packages (e.g. oysteR, srr, pkgstats, etc)
- On the backlog
Focusing on metrics ~~and scoring~~: Making custom weighting more robust and convenient. Guidance materials on weighting specific assessments based on community feedback and our own views on best practices.
- Metrics, Yes! Scoring, No. In fact…

“Risk Tools” Today

“Risk Tools” Roadmap

📦 Repositories

Repositories Workstream

Supporting a transparent, open, dynamic, cross-industry approach of establishing and maintaining a repository of R packages.

A CRAN-like repository
Providing quantifiable package qualities for risk-based decision-making
Evaluated against representative systems
“Bring-your-own” quality cut-offs
Declarative quality decision-making

The Pulse of the Industry

Our whitepaper is widely adopted
But implementing it is inconsistent & laborious
- Variations throughout industry pose uncertainty
- Sharing software with health authorities is a challenge
- Health authorities, overwhelmed by technical inconsistencies, are more likely to question software use
We feel the most productive path forward is a shared ecosystem
Public discussion on how to characterize quality code/methods

Goals

Generating Quality Indicators

Provide a community-maintained catalog of package quality indicators (“risk metrics”)
Calculated against cohort of packages
Known system
Consistently evaluated, with transparent methods

Consolidate Decision-Making

Serve subsets of packages that conform to a specified risk tolerance
Transparently demonstrate selection criteria
Allows for one-off analysis from the public repo
... or mirroring of a filtered snapshot

An evolving R ecosystem

In close communication with many beloved R projects

Submissions Working Group

Repositories Working Group

`pharmaverse`

targeting repos integration

`r-lib/pak`

targeting pak integration

Pilot Implementation

focus on proving capabilities, quick development

✨ all modelled after r-hub/repos

Interacting with the repo

Package risk filters

Helper package for system administrators
Restricts packages available for installation to those fitting a policy
Uses package metadata in the repo
May be used together with manual checks (e.g., read a statistical review)

As a user*

repo <- "https://raw.githubusercontent.com/pharmaR/repos/main/ubuntu-22.04/4.5"
options(repos = c("pharmaR/repos/ubuntu" = repo))

Define your quality expectations in code

options(available_packages_filters = risk_filter(
  # no known security vulnerabilities
  quality_cve_count == 0 & (

    # package is exceptionally well tested
    (quality_code_coverage >= 0.8 & 
      quality_example_coverage >= 0.8 &
      quality_r_cmd_check_errors == 0) |

    # or is exceptionally well adopted
    dplyr::percent_rank(quality_downloads_1yr) > 0.9 |
    quality_reverse_dependencies_count >= 10 |

    # or seems to follow thorough development practices
    (quality_has_website &
      quality_vignette_count >= 1 &
      quality_author_count >= 3)
  )
))

*aspirational deviations from proof of concept in github.com/pharmaR/pharmapkgs

As a user*

repo <- "https://raw.githubusercontent.com/pharmaR/repos/main/ubuntu-22.04/4.5"
options(repos = c("pharmaR/repos/ubuntu" = repo))

Make exceptions explicit

options(available_packages_filters = risk_filter(
  exceptions = c("riskmetric", "riskscore"),

  # no known security vulnerabilities
  quality_cve_count == 0 & (

    # package is exceptionally well tested
    (quality_code_coverage >= 0.8 & 
      quality_example_coverage >= 0.8 &
      quality_r_cmd_check_errors == 0) |

    # or is exceptionally well adopted
    dplyr::percent_rank(quality_downloads_1yr) > 0.9 |
    quality_reverse_dependencies_count >= 10 |

    # or seems to follow thorough development practices
    (quality_has_website &
      quality_vignette_count >= 1 &
      quality_author_count >= 3)
  )
))

*aspirational deviations from proof of concept in github.com/pharmaR/pharmapkgs
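
One way a helper like `risk_filter()` could plug into base R is sketched below. It relies on the documented behaviour that `available.packages()` applies each function listed in the `available_packages_filters` option to the package matrix and keeps the rows that function returns. The name `risk_filter_sketch()` and the toy index are illustrative assumptions, not the pharmapkgs implementation.

```r
# Hypothetical sketch: capture a quality expression and turn it into a filter
# of the kind accepted by options(available_packages_filters = ...).
risk_filter_sketch <- function(expr, exceptions = character()) {
  expr <- substitute(expr)   # keep the expression unevaluated
  env <- parent.frame()      # where to look up anything that is not a field

  keep_rows <- function(db) {
    # The package index is a character matrix; convert the columns so that
    # numeric comparisons such as `>= 0.8` behave as expected.
    fields <- as.data.frame(db, stringsAsFactors = FALSE)
    fields[] <- lapply(fields, utils::type.convert, as.is = TRUE)

    pass <- rep_len(as.logical(eval(expr, fields, env)), nrow(db))
    # Keep rows that pass, plus any explicitly excepted packages.
    keep <- (!is.na(pass) & pass) | db[, "Package"] %in% exceptions
    db[keep, , drop = FALSE]
  }

  list(keep_rows)  # the option expects a list (or character vector) of filters
}

# Toy index standing in for a PACKAGES file with quality fields (made-up values).
toy_db <- cbind(
  Package = c("pkgA", "pkgB", "pkgC"),
  Version = c("1.0.0", "0.2.0", "3.1.4"),
  quality_cve_count = c("0", "0", "1"),
  quality_code_coverage = c("0.90", "0.40", "0.95")
)

strict <- risk_filter_sketch(
  quality_cve_count == 0 & quality_code_coverage >= 0.8
)
kept <- strict[[1]](toy_db)[, "Package"]

lenient <- risk_filter_sketch(
  exceptions = "pkgB",
  quality_cve_count == 0 & quality_code_coverage >= 0.8
)
kept_with_exception <- lenient[[1]](toy_db)[, "Package"]
```

In this sketch a package whose criteria evaluate to `NA` (for example, a missing metric) is excluded unless it is listed in `exceptions`; a real helper might prefer to surface such cases instead.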

As a user*

repo <- "https://raw.githubusercontent.com/pharmaR/repos/main/ubuntu-22.04/4.5"
options(repos = c("pharmaR/repos/ubuntu" = repo))

Tools for inspecting packages that do not adhere to pre-specified criteria

risk_explain(package = "options")
#> <criteria>
#> * `quality_code_coverage` of `0.43` (< 0.8)
#> * `dplyr::percent_rank(quality_downloads_1yr)` of `0.2` (< 0.9)
#> * `quality_reverse_dependencies_count` of `0` (< 10)
#> * `quality_author_count` of `1` (< 3)

*aspirational deviations from proof of concept in github.com/pharmaR/pharmapkgs
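
A report like the one above could be produced by evaluating each criterion separately against a single package's metadata. The sketch below assumes a hypothetical `explain_sketch()` helper and a named list of criteria; the metric values are copied from the example output, and the package name is a placeholder.

```r
# Hypothetical sketch: report which named criteria a single package fails.
explain_sketch <- function(package, db, criteria) {
  row <- as.data.frame(
    db[db[, "Package"] == package, , drop = FALSE],
    stringsAsFactors = FALSE
  )
  row[] <- lapply(row, utils::type.convert, as.is = TRUE)

  # A criterion counts as failed unless it evaluates to exactly TRUE.
  passed <- vapply(
    criteria,
    function(criterion) isTRUE(eval(criterion, row)),
    logical(1)
  )
  names(criteria)[!passed]
}

criteria <- list(
  coverage = quote(quality_code_coverage >= 0.8),
  reverse_dependencies = quote(quality_reverse_dependencies_count >= 10),
  vignettes = quote(quality_vignette_count >= 1),
  authors = quote(quality_author_count >= 3)
)

# One-row toy index; the vignette count is a made-up passing value.
toy_db <- cbind(
  Package = "pkgA",
  quality_code_coverage = "0.43",
  quality_reverse_dependencies_count = "0",
  quality_vignette_count = "2",
  quality_author_count = "1"
)

failed <- explain_sketch("pkgA", toy_db, criteria)
```

Criteria that depend on the whole cohort, such as a download percentile, would need the full index rather than a single row, so they are left out of this sketch.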

As an Administrator

repo <- "https://raw.githubusercontent.com/pharmaR/repos/main/ubuntu-22.04/4.5"
options(repos = c("pharmaR/repos/ubuntu" = repo))

Applying filters to a repository mirror

options(available_packages_filters = risk_filter(
  # define your criteria in code
))

packages <- available.packages()
write.dcf(packages, "FILTERED_PACKAGES")
# sync repo using PACKAGES file
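
The write step can be tried offline with a toy matrix in place of the `available.packages()` result. The sketch below assumes the mirror follows the usual CRAN-like layout, with the index at `src/contrib/PACKAGES`; the directory name and package records are placeholders.

```r
# Toy stand-in for the matrix returned by available.packages() after filtering.
filtered <- cbind(
  Package = c("pkgA", "pkgB"),
  Version = c("1.0.0", "0.2.0"),
  quality_code_coverage = c("0.90", "0.85")
)

# Lay out the index the way a CRAN-like repository expects it.
contrib_dir <- file.path(tempdir(), "filtered-repo", "src", "contrib")
dir.create(contrib_dir, recursive = TRUE, showWarnings = FALSE)
index_path <- file.path(contrib_dir, "PACKAGES")
write.dcf(filtered, index_path)

# Reading the file back confirms that the records and extra fields survive.
round_trip <- read.dcf(index_path)
```

A sync job could then copy only the package archives named in this index into the mirror.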

How it Works

Shamelessly Piggybacking on `r-hub/repos`

github.com/cran mirrors CRAN packages, hosting source code & binary builds
r-hub/repos indexes packages mirrored on github.com/cran
- Adds a custom DownloadURL field, used by {pak} to fetch packages
- Serves PACKAGES file, a standard format for indexing packages
pharmaR/repos periodically looks for changes in r-hub/repos
- Calculates metrics for updated packages
- And hosts PACKAGES file with additional quality fields
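
The "looks for changes" step amounts to comparing two package indexes. A minimal sketch, assuming both indexes are available as character matrices with `Package` and `Version` columns (the function name `find_updated()` and the toy data are illustrative):

```r
# Hypothetical sketch: list packages that are new upstream or whose version
# differs from the one we last assessed.
find_updated <- function(upstream, ours) {
  known <- stats::setNames(ours[, "Version"], ours[, "Package"])
  seen <- unname(known[upstream[, "Package"]])  # NA when not yet assessed

  is_new <- is.na(seen)
  changed <- !is_new & upstream[, "Version"] != seen
  unname(upstream[is_new | changed, "Package"])
}

upstream <- cbind(
  Package = c("pkgA", "pkgB", "pkgC"),
  Version = c("1.0.1", "0.2.0", "3.1.4")
)
ours <- cbind(
  Package = c("pkgA", "pkgB"),
  Version = c("1.0.0", "0.2.0")
)

to_assess <- find_updated(upstream, ours)
```

Here `pkgA` is picked up because its version changed and `pkgC` because it has not been assessed yet, while `pkgB` is skipped.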

r-hub/repos PACKAGES file

Package: bslib
Version: 0.6.1
Depends: R (>= 2.10), R (>= 4.4), R (< 4.4.99)
License: MIT + file LICENSE
DownloadURL:
  https://github.com/cran/bslib/../bslib-ubuntu-22.04.tar.gz
...

Added fields for risk-based assessment

riskmetric_run_date: 2023-06-21
riskmetric_version: 0.2.1
quality_code_coverage: 0.852
quality_vignette_count: 1
quality_r_cmd_check_errors: 0
...
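
Because PACKAGES files use the Debian control file (DCF) format, the extra fields can be read with base R. The sketch below parses a record assembled from the values shown above; combining them into a single record is an assumption made for illustration.

```r
# A single PACKAGES record with the additional quality fields.
record <- c(
  "Package: bslib",
  "Version: 0.6.1",
  "riskmetric_run_date: 2023-06-21",
  "riskmetric_version: 0.2.1",
  "quality_code_coverage: 0.852",
  "quality_r_cmd_check_errors: 0"
)

con <- textConnection(record)
index <- read.dcf(con)   # character matrix, one row per package
close(con)

# Fields arrive as text, so convert before comparing against thresholds.
coverage <- as.numeric(index[, "quality_code_coverage"])
run_date <- as.Date(index[, "riskmetric_run_date"])
```

When reading from a live repository, `available.packages()` only returns non-default fields that are requested through its `fields` argument, so the quality fields would need to be listed there.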

Package Validation Workflow

Risk assessment pipeline

Calculates package quality metadata for updated packages and reverse dependencies

Produces logs and necessary reporting metadata

Support in-house packages with open-source tooling
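
Working out which packages need re-assessment after an update can lean on `tools::package_dependencies()`, which accepts a custom package matrix. The sketch below uses a toy index with placeholder package names; how the real pipeline selects packages is not specified here.

```r
# Toy index: pkgB imports pkgA, and pkgC imports pkgB.
toy_db <- cbind(
  Package   = c("pkgA", "pkgB", "pkgC"),
  Depends   = c("R (>= 3.5)", NA, NA),
  Imports   = c(NA, "pkgA", "pkgB"),
  LinkingTo = c(NA, NA, NA)
)

updated <- "pkgA"

# Reverse dependencies, followed recursively through the toy index.
revdeps <- tools::package_dependencies(
  updated,
  db = toy_db,
  reverse = TRUE,
  recursive = TRUE
)

# Re-assess the updated package together with everything that depends on it.
to_assess <- sort(unique(c(updated, unlist(revdeps, use.names = FALSE))))
```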

Our roadmap

What’s next

Automating up-to-date quality metrics to support sponsor risk assessment

What’s next

Technical Development

🧪

Pipeline Development

in progress

metric re-calculation

📜

Artifact Discoverability

Accessible logs, build artifacts

🥳

Friendly User Interfaces

Comfy tools for working with the repo

What’s next

Governance

🚢

Pipeline Development

in progress

Base image development/exploration

🏅

Filter Recommendations

Start dialogue about industry-standard thresholds

♻️

Lifecycle Decisions

how frequently should we sync with r-hub/repos/CRAN?
which packages require re-evaluation?

Closing

The Year Ahead

Upcoming Deliverables

GxP package list analyses
{riskmetric} refocused on qualities, not scores
A scoreless {riskassessment} application
Enterprise package cohorts
Repositories pipelines

Key Opportunities

📈 {riskmetric} and/or {riskassessment} project manager
milestone planning, sprint organizing, issue triage, stakeholder management
📦 R package Development Opportunities
{riskmetric}/{riskassessment}/{pharmapkgs}
🚀 Infrastructure Development Opportunities
GitHub workflows for metrics calculations, repository management

Thank you

thank you to our many contributors

our work is the product of the donated time of many passionate individuals

Slides: 🔗 bit.ly/PositConf24
