Network Update for Programmers, Biostatisticians and clinical data enthusiasts #68
Artificial Intelligence
Applications of Large Models in Medicine. This paper explores the impact of large-scale models, particularly Medical Large Models (MedLMs), on healthcare advancements. These models include Large Language Models (LLMs), Vision Models, 3D Large Models, and Multimodal Models, all of which are transforming disease prediction, diagnostics, personalized treatment planning, and drug discovery. The integration of graph neural networks in medical knowledge graphs enhances understanding of complex biomedical relationships, while Large Graph Models (LGMs) are improving drug discovery. Vision-Language Models (VLMs) and 3D models are revolutionizing medical image analysis, anatomical modeling, and prosthetic design. Despite challenges, these technologies are significantly improving diagnostic accuracy and enabling personalized healthcare. The paper provides an overview of the current state and future potential of large models in medicine, highlighting their importance in advancing global health. Link
Gary Monk - Novo Nordisk is using Claude AI to drastically reduce the time it takes to draft clinical study reports, from 15 weeks to just 10 minutes, with only three writers overseeing the process instead of 50+. Although no writers have been laid off, fewer are being hired, allowing the company to reallocate savings to other departments.
Several other pharma companies are also utilizing AI to drive operational efficiencies:
While these advancements are impressive, there's a reminder that efficiency gains alone are meaningless without true effectiveness: AI use cases in pharma often prioritize the former, a balance still worth considering. Link to the original post
Paul Agapow - In a recent post, Paul shared a very interesting 32-page presentation on AI and the industry. Some of the takeaways:
AI projects must align with the company’s focus (e.g., drug development, patient screening) and be mindful of how different teams operate. Finally, they suggested that while large-impact AI opportunities exist (e.g., clinical trials), starting with smaller, more contained projects (e.g., regulatory documentation) may be more strategic for initial success. Link to the original post
Rebecca D. Jones-Taha, PhD MBA - Genome modeling and design across all domains of life with Evo 2. Check out yesterday’s announcement where the Arc Institute has developed the largest AI model for biology – which can predict, with 90% accuracy, which genetic changes in human DNA are likely to cause diseases like breast cancer (among other things). Link to the original post
Statistical Programming
Bartosz Jabłoński - At the upcoming #PharmaSUG 2025 conference, Bartosz will be presenting on the use of SAS Packages in (Any)/the Pharma Industry – Opportunities, Possibilities, and Benefits
Sharing complex SAS® code—especially across different systems—can be challenging due to dependencies like macros, formats, and datasets. This article introduces SAS Packages as a solution to simplify code organization, deployment, and sharing. It explains how to create and use these packages to bundle all necessary components, making code distribution more reliable and efficient. The article also highlights the benefits and opportunities SAS Packages offer, particularly within the pharmaceutical industry. Link
Lisa Lyons - Going Under the Hood: How Does the Macro Processor Really Work? Did you ever wonder what really goes on behind the scenes of the macro processor, or how it works with other parts of the SAS® system? With the knowledge of the macro processing sequence of events, programmers can better understand how macro processing works. Since most macros are generic ones, this particular knowledge can save valuable development, debugging, and validation time. This paper will help you to understand the timing of macro processing and its relationship with SAS language processing. Link
Stefan Thoma - Please welcome {autoslider.core} to the pharmaverse!
Creating clinical study slides just got easier!
Our latest blog post introduces {autoslider.core}, an R package designed to automate PowerPoint slide generation for clinical reporting.
🔹 Save Time – No more manual slide creation.
🔹 Reduce Errors – Stop copy-pasting outputs and values.
🔹 Enhance Efficiency – Generate & update slide decks effortlessly.
Originally developed at Roche and now open-sourced in the pharmaverse, {autoslider.core} helps streamline the process of producing study outputs for presentations. Link
José Francisco Román Brito - Posts about good programming practices and software engineering principles. To truly master SQL, developers must focus on:
Mastering SQL involves not just knowing advanced functions, but ensuring the code is efficient, secure, maintainable, and collaborative. Link to the original post
Péter Ferenczy - AI can be a great tool for quick coding help, but it sometimes struggles to adapt when its initial solutions fail. While working in RStudio with Vim keybindings, Peter faced a simple task—incrementing numbers using Vim—but AI repeatedly offered ineffective solutions, even after feedback. The AI failed to reassess its strategy, getting stuck in a loop. A breakthrough came after switching to a different AI and providing the full context of previous failures, leading to a successful solution. Key takeaway: If AI gets stuck, reset the context—restart the chat, reframe the question, or try a different AI to get fresh results. Original post - Link
Athenkosi Nkonyeni - Representation of Numeric Dates in Dataset-JSON
Handling numeric date values across different systems has long been a challenge due to varying date epochs and floating-point precision issues. These discrepancies can lead to incorrect data interpretation, affecting interoperability in regulatory submissions and other critical data exchanges.
Dataset-JSON v1.1 addresses these challenges by adopting the ISO 8601 standard for numeric dates and introducing the targetDataType metadata attribute. These enhancements eliminate the need for external documentation, prevent precision loss, and ensure seamless data exchange between different programming environments and systems. Link to the original post
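The epoch problem described above can be made concrete with a small sketch. This is an illustration, not part of the Dataset-JSON specification: SAS counts numeric dates from 1960-01-01 while R's Date class (like Unix) counts from 1970-01-01, so the same calendar day has different numeric values in each system, and an ISO 8601 string removes the ambiguity.

```python
from datetime import date, timedelta

# Different systems count days from different epochs:
SAS_EPOCH = date(1960, 1, 1)    # SAS numeric dates
UNIX_EPOCH = date(1970, 1, 1)   # R's Date class / Unix

def sas_days_to_iso8601(days: int) -> str:
    """Render a SAS numeric date as an unambiguous ISO 8601 string."""
    return (SAS_EPOCH + timedelta(days=days)).isoformat()

def unix_days_to_iso8601(days: int) -> str:
    """Render an R/Unix numeric date as an ISO 8601 string."""
    return (UNIX_EPOCH + timedelta(days=days)).isoformat()

# The same calendar day has different numeric values in each system,
# but an identical ISO 8601 representation:
print(sas_days_to_iso8601(23742))   # 2025-01-01
print(unix_days_to_iso8601(20089))  # 2025-01-01
```

Exchanging the string form rather than the raw number is exactly what sidesteps the epoch mismatch and any floating-point rounding of day counts.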
R Consortium - R Submissions Working Group: Pilot 5 Launch and more!
The R Consortium Submission Working Group is excited to announce a new Pilot 5 that aims to deliver an R-based Submission to the FDA using Dataset-JSON. This post also includes plans for additional launches for 2025/2026, some news on Pilot 4 (containers and webassembly) and a few other goodies! Link to the original post
Pharmaverse - Working with Clinical Trial Data? There’s a Pharmaverse Package for That
Looking for R packages to manage clinical trial data? Pharmaverse has tools for every stage from data collection to submission! Link to the original post
Biostatistics
Tim Morris - Missing baseline data when analysing change-from-baseline. This post discusses the handling of missing baseline data in randomized controlled trials (RCTs) when analyzing change-from-baseline outcomes. Tim examines the implications of adjusting for baseline data and the use of mean imputation methods for missing baseline values.
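A minimal sketch of the idea under discussion, using hypothetical simulated data (not Tim's own example): because randomisation makes baseline independent of treatment assignment, mean-imputing missing baselines before an ANCOVA on change-from-baseline still recovers the treatment effect.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200

# Hypothetical RCT: baseline y0, follow-up y1, 1:1 randomised treatment,
# with a true treatment effect of -5.
treat = rng.integers(0, 2, n)
y0 = rng.normal(50, 10, n)
y1 = 0.7 * y0 + rng.normal(15, 8, n) - 5.0 * treat

# Make ~20% of baselines missing completely at random.
missing = rng.random(n) < 0.2
y0_obs = y0.copy()
y0_obs[missing] = np.nan

# Mean imputation: replace missing baselines with the observed mean.
y0_imp = np.where(missing, np.nanmean(y0_obs), y0_obs)

# ANCOVA on change-from-baseline: (y1 - y0*) ~ treatment + y0*
change = y1 - y0_imp
X = np.column_stack([np.ones(n), treat, y0_imp])
beta, *_ = np.linalg.lstsq(X, change, rcond=None)
print(f"estimated treatment effect: {beta[1]:.2f}")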
Sofia S. Villar - Exact statistical tests using integer programming: Leveraging an overlooked approach for maximizing power for differences between binomial proportions. Traditional methods for testing differences in binomial proportions—like the Wald and Fisher’s exact tests—often struggle with type I error control, especially in small samples. Regulators prefer conservative methods, but this can compromise statistical power. This work extends a 1969 approach to develop a new family of exact tests that maximize power while strictly controlling type I error. The method uses integer programming to define optimal rejection regions, offering both theoretical guarantees and empirical improvements over standard tests. It is especially robust when optimized for average power across alternatives and can be customized using weighted priors. The study also demonstrates the value of applying combinatorial optimization in statistical methodology. Link
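The type I error problem motivating this work can be demonstrated by brute-force enumeration (this sketch checks the size of the ordinary unpooled Wald test for small samples; it is not the paper's integer-programming method, which instead searches for an optimal rejection region):

```python
from math import comb

def binom_pmf(k, n, p):
    """Exact binomial probability mass."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def wald_rejects(x1, x2, n, z=1.959964):
    """Two-sided unpooled Wald z-test of H0: p1 == p2 at nominal 5%."""
    p1, p2 = x1 / n, x2 / n
    var = (p1 * (1 - p1) + p2 * (1 - p2)) / n
    if var == 0.0:
        return p1 != p2  # degenerate outcomes such as (0, n)
    return abs(p1 - p2) / var**0.5 > z

def exact_size(n, p):
    """Exact type I error at common success probability p (H0 true)."""
    return sum(
        binom_pmf(x1, n, p) * binom_pmf(x2, n, p)
        for x1 in range(n + 1)
        for x2 in range(n + 1)
        if wald_rejects(x1, x2, n)
    )

# Worst-case exact size over a grid of null probabilities:
n = 10
worst = max(exact_size(n, p) for p in [i / 100 for i in range(1, 100)])
print(f"worst-case exact size at n={n} per arm: {worst:.3f}")
```

At n = 10 per arm the worst-case size clearly exceeds the nominal 5%, which is the inflation that exact-test constructions are designed to rule out by enumerating outcomes in exactly this way.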
Thomas Debray, PhD - A new paper led by Orestis Efthimiou Measuring the Performance of Prediction Models to Personalize Treatment Choice, introduces a framework for evaluating models that estimate individualized treatment effects. The framework focuses on three key dimensions:
The framework is demonstrated using simulations and a real-world depression trial, with all methods available in the R package predieval (on CRAN). These concepts will also feature in an upcoming book by the authors. Link
Federico R. on a recently published paper - Success and Futility Criteria for Accelerated Approval of Oncology Drugs. When a single pivotal trial is used to test an early endpoint for Accelerated Approval (AA) and a late endpoint for Regular Approval (RA), the overall type-1 error rate is typically controlled by splitting the overall alpha level between the two tests using some multiple testing procedure. The drawback of this approach is that the late endpoint for RA is tested at a more stringent level unless the early endpoint for AA shows statistically significant results. In this recent paper, Dong Xi and Jiangtao Gou propose a smart way to test the late endpoint for RA at the full alpha level by using a futility boundary for the early endpoint for AA. Although this additional futility rule somewhat reduces the power of the test of the late endpoint for RA, it essentially formalizes the likely decision to stop the trial if the early endpoint shows very disappointing results. Conversely, classical approaches based on alpha splitting ignore the fact that the tests occur at different times, and that the second test is unlikely to be conducted if the first shows very poor results. Link
Cesar T. - A Tipping Point Method to Evaluate Sensitivity to Potential Violations in Missing Data Assumptions. Some current/former colleagues and I had our manuscript published, regarding our tipping point analysis method for the evaluation of missing data assumptions. The approach is novel in that it is easy to implement and does not require any data imputation to yield asymptotically valid statistical inference. Furthermore, given the minimal underlying assumptions, our approach is readily applicable to a variety of data types such as continuous data, binary data, ordinal data, count data, and rate data. Link
Marcel Wolbers - Using shrinkage methods to estimate treatment effects in overlapping subgroups in randomized clinical trials with a time-to-event endpoint. Is the overall treatment effect of a RCT typically a more reliable estimate of the treatment effect in the various subgroups examined than the observed effects in individual subgroups? Are shrinkage estimators even more accurate? Our work proposes a new shrinkage estimator for overlapping subgroups and time-to-event outcomes. It also addresses the above questions. Link
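To make the shrinkage idea tangible, here is a generic empirical-Bayes sketch (not the paper's proposed estimator, which handles overlapping subgroups and time-to-event outcomes): subgroup estimates are pulled toward the overall effect, with noisier subgroups pulled harder. All numbers are hypothetical.

```python
import numpy as np

# Hypothetical subgroup log-hazard-ratio estimates and standard errors.
est = np.array([-0.60, 0.10, -0.45, 0.20])
se  = np.array([ 0.20, 0.25,  0.30, 0.35])

# Inverse-variance weighted overall effect.
w = 1 / se**2
overall = np.sum(w * est) / np.sum(w)

# Method-of-moments between-subgroup variance (DerSimonian-Laird style),
# truncated at zero.
k = len(est)
q = np.sum(w * (est - overall) ** 2)
tau2 = max(0.0, (q - (k - 1)) / (np.sum(w) - np.sum(w**2) / np.sum(w)))

# Shrinkage factor per subgroup: the larger the standard error, the
# stronger the pull toward the overall estimate.
shrink = tau2 / (tau2 + se**2)
shrunk = overall + shrink * (est - overall)
print("overall:", round(overall, 3))
print("shrunk :", np.round(shrunk, 3))
```

Each shrunken estimate lands between the raw subgroup estimate and the overall effect, which is the sense in which shrinkage trades a little bias for a large reduction in variance.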
Denis Talbot - Guidelines and Best Practices for the Use of Targeted Maximum Likelihood and Machine Learning When Estimating Causal Effects of Exposures on Time-To-Event Outcomes. Our tutorial on the use of targeted maximum likelihood with machine learning for estimating causal effects with time-to-event outcomes is now published. Link
Ping Gao - Adaptive two-stage seamless sequential design for clinical trials. Typically, drug development involves phase 2 and phase 3 trials, which are often conducted sequentially: the phase 2 trial is conducted and analyzed first, and the phase 3 trial is then planned and conducted. Each trial involves multiple activities such as protocol development, regulatory approval, institutional review board (IRB) approval, budget negotiations with vendors, and site negotiations and initiation, each of which can be time consuming. The resulting gap between the phase 2 and phase 3 trials (for phase 2 analysis and phase 3 planning) can be months long. A seamless combination requires only one protocol development, one regulatory approval, and one IRB approval, which saves time and resources; it also eliminates the time gap between the phase 2 and phase 3 trials. Gao and Li (2024) proposed an adaptive sequential design for seamless phase 2/3 combination. The article (open access) can be downloaded at Link
Comparison of Bayesian methods for extrapolation of treatment effects: a large scale simulation study. This paper explores Bayesian methods for borrowing treatment effects—rather than control arm data—in clinical trials where sample sizes are limited. Through an extensive simulation study, the authors assess frequentist operating characteristics (e.g., probability of success, bias, precision, coverage) of various approaches. Findings highlight that the Conditional Power Prior and Robust Mixture Prior offer stronger overall performance, while test-then-pool and p-value-based power prior methods are less effective. The study helps clarify the strengths and limitations of these methods in confirmatory trial settings. Tristan Fauvel, Julien Tanniou, Pascal Godbillot, Billy AMZAL — Link
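The Robust Mixture Prior highlighted above has a convenient conjugate form for a normal mean, sketched below with illustrative numbers (not taken from the paper): the posterior is again a mixture, and the informative component's weight adapts to how well the new data agree with the historical information.

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu, var):
    return exp(-(x - mu) ** 2 / (2 * var)) / sqrt(2 * pi * var)

def mixture_posterior(ybar, se2, comps):
    """Posterior of a normal mean under a mixture-of-normals prior.

    comps: list of (weight, prior_mean, prior_var). Each component is
    conjugate, so the posterior is again a mixture of normals whose
    weights are reweighted by each component's marginal likelihood.
    """
    post = []
    for w, m, v in comps:
        marg = w * normal_pdf(ybar, m, v + se2)  # prior predictive of ybar
        pv = 1 / (1 / v + 1 / se2)               # component posterior variance
        pm = pv * (m / v + ybar / se2)           # component posterior mean
        post.append((marg, pm, pv))
    total = sum(p[0] for p in post)
    return [(w / total, pm, pv) for w, pm, pv in post]

# Robust mixture prior: 80% informative (historical effect 0.5),
# 20% weakly informative "robustifying" component.
prior = [(0.8, 0.5, 0.05), (0.2, 0.0, 10.0)]

# If new data agree with history, the informative component dominates;
# if they conflict, weight shifts to the vague component:
agree = mixture_posterior(ybar=0.45, se2=0.04, comps=prior)
conflict = mixture_posterior(ybar=-1.5, se2=0.04, comps=prior)
print("agree    -> informative weight:", round(agree[0][0], 3))
print("conflict -> informative weight:", round(conflict[0][0], 3))
```

This self-discounting behaviour under prior-data conflict is why robust mixtures fared well in the simulation study relative to naive pooling.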
Sarwar Mozumder - A new paper, Time-to-Event Estimands in Oncology: What's Censoring Got To Do, Got To Do With It?, has been accepted and will soon be published in Statistics in Biopharmaceutical Research. We intended this to be a short, accessible paper for applied researchers trying to navigate estimands for survival endpoints. In particular, we clarify the role of censoring and when and how it should be defined as part of estimation so that the correct estimand is targeted. Link
On the distinction between the per-protocol effect and the effect of the treatment strategy. This paper argues that in randomized trials, the per-protocol effect (the effect of receiving treatment according to the assigned strategy) is not necessarily the same as the effect of the treatment strategy itself. The authors explore a causal structure showing that these two effects and their corresponding identifying observed data functionals are different, though both require information on assignment for accurate identification. They emphasize that the per-protocol effect is not always an observational analog of the treatment strategy effect, and that in some cases, identification of these effects requires data on treatment assignment. Additionally, they suggest that making assumptions such as the exclusion-restriction assumption (where assignment only affects the outcome through treatment) is necessary for identifying these effects in observational studies. Link to the original post by Ryan Batten, PhD(c)
Success and Futility Criteria for Accelerated Approval of Oncology Drugs. Project FrontRunner aims to promote the development of cancer drugs for advanced or metastatic disease by utilizing regulatory strategies like the accelerated approval pathway. The FDA's draft guideline suggests a one-trial approach that combines accelerated and regular approval in a single trial, ensuring efficiency. This article introduces a method to control Type I error in this one-trial approach by implementing success and futility boundaries for p-values.
In this framework:
This approach is designed to maintain Type I error control and is flexible enough to allow clinical teams to adapt success and futility thresholds based on clinical and regulatory needs, while still ensuring robust statistical integrity. Link
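A toy simulation can illustrate why the futility rule preserves Type I error for the late endpoint even at the full alpha level. The correlation and thresholds below are assumptions for illustration, not values from the paper: a false regular-approval claim requires both passing the early-endpoint futility bound and a significant late endpoint, and that joint probability cannot exceed alpha.

```python
import numpy as np

rng = np.random.default_rng(7)
alpha = 0.025        # one-sided full alpha for regular approval (RA)
rho = 0.6            # assumed correlation of early/late test statistics
n_sim = 200_000

# Simulate correlated (z_early, z_late) under the global null.
cov = np.array([[1.0, rho], [rho, 1.0]])
z = rng.multivariate_normal([0.0, 0.0], cov, size=n_sim)

z_alpha = 1.959964   # z threshold for one-sided p <= 0.025
z_fut = 0.0          # futility: continue only if early-endpoint p <= 0.5

continue_past_aa = z[:, 0] >= z_fut
claim_ra = continue_past_aa & (z[:, 1] >= z_alpha)
rate = claim_ra.mean()
print(f"empirical RA type I error: {rate:.4f} (bound: {alpha})")
```

The empirical rate stays below 0.025 because the futility screen can only remove, never add, false positives for the late endpoint; alpha-splitting approaches give up part of this alpha unnecessarily.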
Real World Evidence & Real World Data
Zhaohui Su - The Bar Is High: Evaluating Fit-for-Use Oncology Real-World Data for Regulatory Decision Making. Excited to share insights on the evolving landscape of real-world data (#RWD) and real-world evidence (#RWE) in oncology, emphasizing data quality's significance for regulatory decision-making. Key points include:
- The rapid growth and diverse applications of RWD in oncology.
- Challenges in evaluating data quality and the necessity of multidisciplinary expertise.
- The critical role of transparency in study design and data source assessment.
- Leveraging technological advancements and AI to enhance data reliability.
Deborah Layton - The series "External Control Arms for Single-Arm Studies: Methodological Considerations and Applications" has been successfully completed and published in Frontiers in Drug Safety and Regulation. Co-edited by Laura Hester, Asieh Golozar, and the author, the collection includes five impactful publications, collectively achieving over 11,000 views and 2,000 downloads. The editorial highlights the growing importance of external comparator studies and emphasizes the need for continued methodological refinement—addressing terminology, trial emulation, hybrid approaches, and bias mitigation—to strengthen their role in regulatory and payer decision-making. Link to the original post
Use of Open Claims vs Closed Claims in Health Outcomes Research. Interesting read that underscores the advantage of open claims in RWE, especially in rare disease and newly launched treatments. The study comparing open vs. closed data found:
- 10-65x larger sample sizes in open claims data
- Near real-time updates (compared to a 6-8 month lag in closed claims)
- Broader patient coverage—capturing care across different insurers
- Comparable accuracy when de-duplication and validation techniques are applied
Events & Webinars
Stephen Mc Cawille - How Can the SAS Macro Language Enhance LLM Integration in SAS® 9.4 for Clinical Programming?
For anyone interested, I'll be presenting a webinar on June 3rd as part of the SAS Ask the Expert webinar series. The session will cover how to use the SAS macro language to connect to LLMs like ChatGPT or Claude within the SAS 9.4 environment using tools such as PROC HTTP and JSON payloads, all embedded within a macro. We'll also explore how to utilise the SAS macro language to streamline input prompts and fine-tune contextually relevant responses to support clinical programming tasks. Hope to see you there! Link
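For readers who want a feel for the JSON payload the session will build with PROC HTTP, here is a sketch in Python of the widely used chat-completions payload shape. The model name and prompt are placeholders, no real service is called, and the actual SAS macro code is what the webinar covers:

```python
import json

# Hypothetical prompt for a clinical-programming task; the payload shape
# follows the common chat-completions convention (model, messages).
payload = {
    "model": "example-model",   # placeholder, not a real model name
    "messages": [
        {"role": "system",
         "content": "You are an assistant for SAS clinical programming."},
        {"role": "user",
         "content": "Draft a PROC FREQ template for AE counts by SOC."},
    ],
    "temperature": 0.2,
}

# This serialized string is the request body a SAS macro would hand to
# PROC HTTP (via IN=) along with Authorization headers; the JSON response
# can then be read back with the JSON libname engine.
body = json.dumps(payload)
print(body[:60], "...")
```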
📆3rd June 2025 - 10 a.m. ET - Online
Ethics and Innovative Clinical Trial Designs
Join us for a full-day workshop in Cambridge, focused on ethical and methodological issues surrounding adaptive clinical trial designs. This event will bring together leading statisticians, ethicists, industry representatives, policymakers (including the WHO), regulators (including the FDA), and patient advocacy groups to discuss the evolving landscape of adaptive clinical trial designs and their ethical implications. Link
📆24th April 2025 - University of Cambridge, Newnham College
Robert Grant - Introduction to Bayesian Analysis using Stan. If you want to learn Bayesian statistical modelling, I have two courses coming up this year with the Royal Statistical Society that are open to all, affordable and aimed at beginners.
We will use Stan, which is not only very stable and very fast but, importantly for beginners, has a clear and explicit language for specifying your models. We will start from basics and build up over two days. Once you've learnt the principles, you can easily translate them to other software.
📆On 29-30 April, in London, you can join me in person! There are only a few places left, so get in there while you can.
📆On 23-24 September, we do the same thing online, with a global reach. We've had attendees as far away as Australia before! Link
2025 CAUSALab Methods Series with Jonathan Bartlett - Webinar from February 18, 2025
As part of the 2025 CAUSALab Methods Series at Karolinska Institutet, Jonathan Bartlett, Professor in Medical Statistics at London School of Hygiene & Tropical Medicine, presented "G-formula for causal inference using synthetic multiple imputation". Link
Statistical Methods for Combined Accuracy and Precision Approaches for Validation - Presentation from March 2, 2025 by Thomas de Marchin
I shared insights on how combined Accuracy and Precision can elevate analytical method validation and how SmartSTATS\Enoval can transform your report generation process, saving time and ensuring compliance with industry standards. Link to the original post
Peyman Eshghi - PHUSE CSS 2025 The teal framework, an open-source tool for interactive data analysis and visualization, has gained significant traction since its release on CRAN. The Teal Enhancement Working Group is excited to announce it will host sessions at PHUSE CSS to promote teal's adoption and foster collaboration across various sectors, including EMA. The aim is to share experiences and identify areas for improvement, making teal more adaptable and efficient for data visualization in the pharma industry. The sessions are open to those interested, but with limited capacity (around 100), early registration is recommended. The event will take place in
📆Utrecht, Netherlands, on May 20-21. Link
Bayesian Biostatistics 2025 - the program is taking shape
Nicky Best, Beyond the Classical Type I Error: Bayesian Metrics for Bayesian Designs Using Informative Priors
Andy Grieve, Predictive and Pre-Posterior Distributions in the Planning of Clinical Trials
Virgilio Gómez Rubio, Approximate Bayesian inference for the analysis of population health data
Harrison Quick, The Intersection of Informative Priors and Differential Privacy in Bayesian Spatial Biostatistics
Nicky Welton, Multi-level Network Meta-Regression (ML-NMR) for population adjustment in Health Technology Assessment
📆22 October 2025, from 09:00 AM to 12:30 PM CET Link