layout: false
class: title-slide, middle, center

.pull-farleft[
.font180[A Risk-based Approach to R Validation]

R/Pharma Workshop, 2020

<br>

<div class="row">
  <div class="column">
    <img class="profile" src="images/andy-nicholls.jpg" alt="Andy Nicholls">
    <div class="profile-name">Andy Nicholls</div>
  </div>
  <div class="column">
    <img class="profile" src="images/marly-gotti.jpg" alt="Marly Gotti">
    <div class="profile-name">Marly Gotti</div>
  </div>
</div>
]

---
layout: false
class: title-slide, middle, center

.pull-farleft[
.font180[Introduction]

R/Pharma Conference, 2020

<br>

Andy Nicholls
]

.pull-farright[
]

<!---------------------------------------------------------------------------->

---

# Workshop Expectations

- In this workshop we will walk through a practical implementation of the R Validation Hub's white paper: [A Risk-based Approach for Assessing R Package Accuracy within a Validated Infrastructure](https://www.pharmar.org/white-paper/).

- **Expect:**
  - Definitions
  - References
  - Opinions <-- *Not representing our parent companies*
  - Discussion!

- **Do not expect:**
  - Answers
  - To "validate R" today!

---

# Outline

- Introduction [15 mins] -- *Andy*
- Overview of the white paper [20 mins] -- *Andy*
- BREAK [10 mins]
- Risk Assessment Tools [60 mins, inc 30 min exercise] -- *Marly*
- BREAK [10 mins]
- Responding to Risk [60 mins, inc 30 min exercise] -- *Andy*
- Close [5 mins]

---

# Mission

<img src="images/pharmaRlogo_large.png" alt="R Validation Hub">

> The R Validation Hub is a cross-industry initiative whose mission is to enable the use of R by the Bio-Pharmaceutical Industry in a regulatory setting, where the output may be used in submissions to regulatory agencies.

---

# Who are we?
- Formed in 2018 by members of PSI’s AIMS SIG
- Now an [R Consortium Working Group](https://www.r-consortium.org/projects/isc-working-groups)
- Executive Committee
  - **Andy Nicholls (GSK)**
  - **Marly Gotti (Biogen)**
  - Lyn Taylor (Phastar)
  - Joe Rickert (RStudio / R Consortium)
  - Juliane Manitz (Merck KGaA / EMD Serono)
  - Yilong Zhang (MSD)
  - Doug Kelkhoff (Genentech)
  - Keaven Anderson (MSD)
- ~100 members from multiple organisations across the pharmaceutical sector

<img src="images/psi.png" alt="R Validation Hub" height=80 style="margin:5px 50px">
<img src="images/efspi.png" alt="R Validation Hub" height=80 style="margin:5px 50px">
<img src="images/rconsort.png" alt="R Validation Hub" height=80 style="margin:5px 50px">

---

# Resources

- Keep up to date at https://www.pharmar.org/
  - Blog posts
  - Presentations
  - White paper
- Tools available on [GitHub](https://github.com/pharmaR)
  - The riskmetric R Package
  - Risk Assessment App [coming soon]
- To contribute, send a message to [psi.aims.r.validation@gmail.com](mailto:psi.aims.r.validation@gmail.com)
- Else, join the [mailing list](https://lists.r-consortium.org/g/RConsortium-Validation-Hub) today!

<!---------------------------------------------------------------------------->

---
layout: false
class: inverse, middle, center

# Poll

<!---------------------------------------------------------------------------->

---

# Which of the following statements applies to you and/or your company?

- We have used R for FDA submissions.
- In theory we are allowed to use R for FDA submissions, but we haven't used it yet.
- We are currently building a validated R environment.
- We haven't started using R in my company.
- Oops! I have attended the wrong workshop.

*You have 30 seconds...*

---

# Regulations

FDA: “…statistical software is not explicitly discussed in [21 CFR Part 11]”

ICH: “… should be reliable, and documentation of appropriate software testing procedures should be available”

<a href = "https://www.fda.gov/downloads/ForIndustry/DataStandards/StudyDataStandards/UCM587506.pdf"><img src="images/fda_clarifying.png" alt="source: FDA Statistical Software Clarifying Statement" height=400 ></a>

<!-- --- -->
<!-- # ICH E9 -->
<!-- > **Integrity of Data and Computer Software Validity** -->
<!-- > The credibility of the numerical results of the analysis depends on the quality and validity of the methods and software (both internally and externally written) used both for data management (data entry, storage, verification, correction and retrieval) and also for processing the data statistically. Data management activities should therefore be based on thorough and effective standard operating procedures. The computer software used for data management and statistical analysis should be reliable, and documentation of appropriate software testing procedures should be available.
-->

---

# Further Reading

The following have played an important role in the formation of the white paper and this workshop:

- **ICH**
  - [E9](https://www.ema.europa.eu/en/ich-e9-statistical-principles-clinical-trials#current-version-section)
- **FDA**
  - [FDA Statistical Software Clarifying Statement](https://www.fda.gov/downloads/ForIndustry/DataStandards/StudyDataStandards/UCM587506.pdf)
  - [21 CFR Part 11](https://www.accessdata.fda.gov/scripts/cdrh/cfdocs/cfcfr/CFRSearch.cfm?CFRPart=11&showFR=1)
  - [Guidance for Industry Part 11, Electronic Records; Electronic Signatures — Scope and Application](https://www.fda.gov/media/75414/download)
  - [Glossary of Computer System Software Development Terminology](https://www.fda.gov/inspections-compliance-enforcement-and-criminal-investigations/inspection-guides/glossary-computer-system-software-development-terminology-895)
  - [General Principles of Software Validation; Final Guidance for Industry and FDA Staff](https://www.fda.gov/media/73141/download)
- **EMA**
  - [Notice to sponsors on validation and qualification of computerised systems used in clinical trials](https://www.ema.europa.eu/en/documents/regulatory-procedural-guideline/notice-sponsors-validation-qualification-computerised-systems-used-clinical-trials_en.pdf)
  - [Q&A: Good clinical practice (GCP)](https://www.ema.europa.eu/en/human-regulatory/research-development/compliance/good-clinical-practice/qa-good-clinical-practice-gcp)

---

# Important Definitions

<blockquote>
"The term <b>qualification</b> is used in this notice to describe <b>verification</b> of system functionality. The term <b>validation</b> is used to describe the process of establishing and documenting that the specified requirements of a computerised system can be consistently fulfilled from design until decommissioning of the system or transition to a new system (ICH E6(R2), section 1.65), i.e. it operates to defined specifications and defined procedures (SOPs) by a trained user."
.right[-- <cite>[Notice to sponsors on validation and qualification of computerised systems used in clinical trials, EMA, 07-Apr-2020](https://www.ema.europa.eu/en/documents/regulatory-procedural-guideline/notice-sponsors-validation-qualification-computerised-systems-used-clinical-trials_en.pdf)</cite>]
</blockquote>

---

# Important Definitions

Some simplified definitions that we'll stick with today:

- **Verification.** Confirmation, generally for single-use pieces of code, that the code and output appear correct; performed by the author and optionally, depending on risk, by a peer reviewer.
- **Qualification.** Documented evidence that a system satisfies its requirements.
- **Validation.** As for qualification, but the system requirements should include tighter controls in order to ensure regulations are met<sup>^</sup>. In a GxP setting the requirements would typically include tight access control (including documentation of user training), an audit trail and additional security measures to ensure compliance.
- The relationship is essentially hierarchical: Verification < Qualification < Validation

.footnote[
^: Thus, software cannot come 'pre-validated'
]

---

# Examples

- We might validate
  - a controlled workflow environment, e.g. a Statistical Computing Environment (SCE)
  - a GxP application, e.g. a Shiny app
- We might qualify
  - a software installation, e.g. SAS/R/Python
  - a 'multiple use' utility, e.g. a SAS macro / R function
- We might verify
  - the results of an analysis by double programming
  - that calling a function/macro with specific parameters gives the expected result (unit testing)
- If R (or SAS) is used as part of a validated application (e.g. a Shiny app) then we qualify the R installation and validate the application
- We cannot validate the use of R (or SAS) for 'single-use' scripts that use packages/functions in combination
  - But we may wish to take steps to reduce end-user risk

<!---------------------------------------------------------------------------->

---
layout: false
class: inverse, middle, center

# White Paper: A Risk-based Approach for Assessing R package Accuracy within a Validated Infrastructure

**Andy Nicholls**, Statistics Director, Head of Statistical Data Sciences, GSK

**Paulo R. Bargo**, Director Scientific Computing, Statistics & Decision Sciences, Janssen R&D

**John Sims**, Director, Analytical Systems Architect & Data Science - Pfizer Vaccine Research

<!---------------------------------------------------------------------------->

<!-- --- -->
<!-- # Trust / Reliability -->
<!-- How can we establish trust in a resource? -->
<!-- - A vendor audit -->
<!-- - Prior experience -->
<!-- - Expansive testing -->
<!-- - ... -->

---

# Regulatory perspectives

## FDA

<blockquote>
"We recommend that you base your approach on a justified and <b>documented risk assessment</b> and a determination of the potential of the system to affect product quality and safety, and record integrity"
.right[-- <cite>[Guidance for Industry Part 11, Electronic Records; Electronic Signatures — Scope and Application](https://www.fda.gov/media/75414/download)</cite>]
</blockquote>

## EMA

<blockquote>
"The sponsor may rely on qualification documentation provided by the vendor, if the qualification activities performed by the vendor have been assessed as adequate.
However, the sponsor may also have to perform additional qualification (and validation) activities based on a <b>documented risk assessment.</b>"
.right[-- <cite>[Notice to sponsors on validation and qualification of computerised systems used in clinical trials, EMA, 07-Apr-2020](https://www.ema.europa.eu/en/documents/regulatory-procedural-guideline/notice-sponsors-validation-qualification-computerised-systems-used-clinical-trials_en.pdf)</cite>]
</blockquote>

- Note: The above documents describe computerized systems and neither addresses programming languages directly

<!-- --- -->
<!-- # What is a Risk-based Approach? -->
<!-- - All validation is risk-based! -->
<!-- - A prioritisation of our testing strategy based on what we already know -->
<!-- - How well is the package developed / tested? -->
<!-- - How much additional exposure has it had within the user community? -->
<!-- - We generally approach commercial software this way already! -->

---

# Validation, R and R Packages

The FDA provide a clear definition of validation in the [Glossary of Computer System Software Development Terminology](https://www.fda.gov/inspections-compliance-enforcement-and-criminal-investigations/inspection-guides/glossary-computer-system-software-development-terminology-895). This can be broken down into three core components:

1. Accuracy
2. Reproducibility
3. Traceability

The focus of the white paper is accuracy.

---
background-image: url("images/German-et-al.png")
background-size: 25%
background-position: 90% 16%

# What do we mean by 'R'?

- **Core R (Base+Recommended)**
  - Low risk
  - See [R: Regulatory Compliance and Validation Issues...](https://www.r-project.org/doc/R-FDA.pdf).
- **Contributed**
  - Variable risk
  - Many different authors
  - Varying SDLCs
  - Varying levels of popularity
  - Potentially lots of unknowns

<!-- <img src="images/German-et-al.png" alt="source: German-et-al" height="120px" > -->

We propose a risk-based approach to establish accuracy/validity for contributed packages

.footnote[
*Source: German, D.M. & Adams, Bram & Hassan, Ahmed E. (2013). The Evolution of the R Software Ecosystem. Proceedings of the Euromicro Conference on Software Maintenance and Reengineering, CSMR. 243-252. 10.1109/CSMR.2013.33.*
]

---

# Classifying R Packages within an Installation

- Within an installation, two classifications of R packages are proposed:
  1. **Intended for Use**. These will be loaded directly by a user during an R session.
  2. **Imports**. These packages are required to be installed in order to use the Intended for Use packages. They are comparable to a system dependency, or the ‘back-end’ code supporting a user interface.
- A risk-based approach to validation should focus on the way that components of the system are **intended to be used**.

---

# An R Package Risk Assessment Framework

We propose to assess the risk of contributed R packages based on four criteria:

- Purpose
- Maintenance Good Practice (Software Development Life Cycle, SDLC)
- Community Usage
- Testing (Also part of an SDLC)

---

# Purpose

- What will the package be used for?
- Is it...
  1. Statistical?
  2. Non-statistical?
- Hence, how difficult is it for an end user to identify errors?

> "Tests should be written for all statistical modelling functions within an Intended for Use Statistical Package, regardless of the risk assessment."

- To be challenged later...

---

# Maintenance / Testing / Community Usage

- Development and **maintenance** best practices are essential for any software
  - How are bugs managed?
  - What is the documentation like?
  - Is the source code on GitHub?
- **Testing** is a crucial part of an SDLC and another specific maintenance best practice metric
- Increased exposure to users (**community usage**) helps to reduce risk
- More on these coming up...

---

# Proposed Workflow

<img src="images/Assessing-package-accuracy.png" alt="source: Assessing Package Accuracy">

---
layout: false
class: inverse, middle, center

# BREAK
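
---

# Sketch: Classifying Installed Packages

As a rough illustration of the two classes proposed on the "Classifying R Packages within an Installation" slide, the base R sketch below splits a library into Intended for Use packages and their Imports. The helper `classify_packages()` and its arguments are hypothetical names introduced here, not part of any R Validation Hub tool. It assumes the Intended for Use set is already known, and it treats recursive strong dependencies (Depends, Imports, LinkingTo) as the Imports class, which is a simplification rather than a definition taken from the white paper.

```r
# Hypothetical helper (illustrative only): given the packages a user is
# expected to load directly, list the packages that are installed purely
# to support them.
classify_packages <- function(intended, db = installed.packages()) {
  # Recursively collect strong dependencies from the installed library
  deps <- tools::package_dependencies(
    intended,
    db = db,
    which = c("Depends", "Imports", "LinkingTo"),
    recursive = TRUE
  )
  # Anything required but not itself Intended for Use is classed as Imports
  imports <- setdiff(as.character(unlist(deps, use.names = FALSE)), intended)
  list(intended_for_use = intended, imports = sort(imports))
}

# Example: treat the base package 'stats' as the only Intended for Use package
res <- classify_packages("stats")
str(res)
```

In practice the `intended` vector would come from the organisation's own list of approved packages, and the resulting split could then guide where testing effort is concentrated.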