Network Update for Programmers, Biostatisticians and clinical data enthusiasts #68


Artificial Intelligence

Applications of Large Models in Medicine. This paper explores the impact of large-scale models, particularly Medical Large Models (MedLMs), on healthcare advancements. These models include Large Language Models (LLMs), Vision Models, 3D Large Models, and Multimodal Models, all of which are transforming disease prediction, diagnostics, personalized treatment planning, and drug discovery. The integration of graph neural networks in medical knowledge graphs enhances understanding of complex biomedical relationships, while Large Graph Models (LGMs) are improving drug discovery. Vision-Language Models (VLMs) and 3D models are revolutionizing medical image analysis, anatomical modeling, and prosthetic design. Despite challenges, these technologies are significantly improving diagnostic accuracy and enabling personalized healthcare. The paper provides an overview of the current state and future potential of large models in medicine, highlighting their importance in advancing global health. Link


Gary Monk - Novo Nordisk is using Claude AI to drastically reduce the time it takes to draft clinical study reports, from 15 weeks to just 10 minutes, with only three writers overseeing the process instead of 50+. Although no writers have been laid off, fewer are being hired, allowing the company to reallocate savings to other departments.

Several other pharma companies are also utilizing AI to drive operational efficiencies:

  • Sanofi developed the plai app with AILY LABS for decision-making across departments.
  • Sanofi partnered with OpenAI and Formation Bio to create Muse, an AI tool for patient recruitment in clinical trials.
  • Bristol Myers Squibb uses AI for clinical data narration and document review, with strong ethical AI guidelines and privacy measures.
  • Gilead Sciences expanded its AI partnership with Cognizant to enhance operations and accelerate drug development.
  • Moderna partnered with OpenAI to provide ChatGPT Enterprise for its employees, aiming to create an AI-centric culture.
  • Pfizer developed a generative AI platform, ‘Charlie,’ for content creation, editing, and collaboration support.

While these advancements are impressive, there’s a reminder that efficiency gains alone are meaningless without true effectiveness. AI use cases in pharma often prioritize efficiency over effectiveness, a balance still worth considering. Link to the original post


Paul Agapow - In a recent post, he shared a very interesting 32-page presentation on AI and the industry. Some of the takeaways:

  • AI for Clinical Data: AI models can instantly translate clinical data and search large volumes of documents, reducing information attrition and avoiding delays in data conversion.
  • AIML in Clinical Trials: AI and machine learning can enhance clinical trials by improving forecasting, modeling patient burden and trial complexity, and identifying patient subgroups or treatment variances.
  • Clinical Trial Simulation: As trials become more complex, simulation is a flexible, vital strategy that can capture dependencies, though there are operational challenges in its application.
  • Causal AI: Causal AI will be crucial in understanding mechanisms in biomedicine, especially for complex patient populations and rare diseases where controlled experiments are not possible.

AI projects must align with the company’s focus (e.g., drug development, patient screening) and be mindful of how different teams operate. Finally, they suggested that while large-impact AI opportunities exist (e.g., clinical trials), starting with smaller, more contained projects (e.g., regulatory documentation) may be more strategic for initial success. Link to the original post


Rebecca D. Jones-Taha, PhD MBA - Genome modeling and design across all domains of life with Evo 2. Check out yesterday’s announcement where the Arc Institute has developed the largest AI model for biology – which can predict, with 90% accuracy, which genetic changes in human DNA are likely to cause diseases like breast cancer (among other things). Link to the original post

Statistical Programming


Bartosz Jabłoński - At the upcoming #PharmaSUG 2025 conference, he will be presenting on the use of SAS Packages in (Any)/the Pharma Industry – Opportunities, Possibilities, and Benefits

Sharing complex SAS® code—especially across different systems—can be challenging due to dependencies like macros, formats, and datasets. This article introduces SAS Packages as a solution to simplify code organization, deployment, and sharing. It explains how to create and use these packages to bundle all necessary components, making code distribution more reliable and efficient. The article also highlights the benefits and opportunities SAS Packages offer, particularly within the pharmaceutical industry. Link


Lisa Lyons - Going Under the Hood: How Does the Macro Processor Really Work? Did you ever wonder what really goes on behind the scenes of the macro processor, or how it works with other parts of the SAS® system? With the knowledge of the macro processing sequence of events, programmers can better understand how macro processing works. Since most macros are generic ones, this particular knowledge can save valuable development, debugging, and validation time. This paper will help you to understand the timing of macro processing and its relationship with SAS language processing. Link


Stefan Thoma - Please welcome {autoslider.core} to the pharmaverse!

Creating clinical study slides just got easier!

Our latest blog post introduces {autoslider.core}, an R package designed to automate PowerPoint slide generation for clinical reporting.

🔹 Save Time – No more manual slide creation.

🔹 Reduce Errors – Stop copy-pasting outputs and values.

🔹 Enhance Efficiency – Generate & update slide decks effortlessly.

Originally developed at Roche and now open-sourced in the pharmaverse, {autoslider.core} helps streamline the process of producing study outputs for presentations. Link


José Francisco Román Brito posts about good programming practices and software engineering principles. To truly master SQL, developers must focus on:

  1. Code Quality: Write clean, readable, and maintainable SQL. Use descriptive names, document queries, and avoid complex, nested subqueries.
  2. Optimization and Performance: Understand query performance impacts, use tools like EXPLAIN, and avoid inefficient queries that harm the database.
  3. Software Engineering Principles: Apply principles like DRY (Don't Repeat Yourself) and KISS (Keep It Simple, Stupid) to design scalable, sustainable solutions.
  4. Security and Ethics: Protect against SQL injection, comply with data privacy regulations, and use data ethically.
  5. Teamwork: Write SQL that is easy for teammates to understand and collaborate on, ensuring business requirements are met.

Mastering SQL involves not just knowing advanced functions, but ensuring the code is efficient, secure, maintainable, and collaborative. Link to the original post
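The injection and readability points above can be sketched in a few lines. The example below uses Python's sqlite3 with a hypothetical subjects table; the schema and names are purely illustrative, not from the original post:

```python
import sqlite3

# In-memory demo database (hypothetical clinical-style schema)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subjects (subject_id TEXT, site TEXT)")
conn.executemany("INSERT INTO subjects VALUES (?, ?)",
                 [("S001", "Berlin"), ("S002", "Warsaw"), ("S003", "Berlin")])

def subjects_at_site(site: str) -> list[str]:
    """Parameter binding (?) keeps user input out of the SQL text,
    preventing injection; descriptive names keep the query readable."""
    rows = conn.execute(
        "SELECT subject_id FROM subjects WHERE site = ? ORDER BY subject_id",
        (site,),  # bound parameter, never string-concatenated into the query
    )
    return [subject_id for (subject_id,) in rows]

print(subjects_at_site("Berlin"))        # ['S001', 'S003']
print(subjects_at_site("x' OR '1'='1"))  # [] – injection attempt is inert
```

The same binding mechanism exists in every mainstream SQL client library, so the habit transfers directly.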


Péter Ferenczy - AI can be a great tool for quick coding help, but it sometimes struggles to adapt when its initial solutions fail. While working in RStudio with Vim keybindings, Peter faced a simple task—incrementing numbers using Vim—but AI repeatedly offered ineffective solutions, even after feedback. The AI failed to reassess its strategy, getting stuck in a loop. A breakthrough came after switching to a different AI and providing the full context of previous failures, leading to a successful solution. Key takeaway: If AI gets stuck, reset the context—restart the chat, reframe the question, or try a different AI to get fresh results. Original post - Link


Athenkosi Nkonyeni - Representation of Numeric Dates in Dataset-JSON

Handling numeric date values across different systems has long been a challenge due to varying date epochs and floating-point precision issues. These discrepancies can lead to incorrect data interpretation, affecting interoperability in regulatory submissions and other critical data exchanges.

Dataset-JSON v1.1 addresses these challenges by adopting the ISO 8601 standard for numeric dates and introducing the targetDataType metadata attribute. These enhancements eliminate the need for external documentation, prevent precision loss, and ensure seamless data exchange between different programming environments and systems. Link to the original post
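A minimal illustration of the epoch problem that Dataset-JSON v1.1 addresses, assuming the well-known SAS epoch (days since 1960-01-01) and Unix epoch (seconds since 1970-01-01): the same calendar day has entirely different numeric representations, which ISO 8601 strings avoid.

```python
from datetime import date, datetime, timedelta, timezone

SAS_EPOCH = date(1960, 1, 1)                             # SAS: days since 1960-01-01
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)   # Unix: seconds since 1970-01-01

def sas_date_to_iso(days: int) -> str:
    """Convert a SAS numeric date (days since 1960-01-01) to ISO 8601."""
    return (SAS_EPOCH + timedelta(days=days)).isoformat()

def unix_seconds_to_iso(seconds: float) -> str:
    """Convert a Unix timestamp (seconds since 1970-01-01 UTC) to an ISO 8601 date."""
    return (UNIX_EPOCH + timedelta(seconds=seconds)).date().isoformat()

# The same day, 1 January 1970, is day 3653 in SAS and second 0 in Unix:
print(sas_date_to_iso(3653))   # 1970-01-01
print(unix_seconds_to_iso(0))  # 1970-01-01
```

Exchanging the unambiguous string "1970-01-01" instead of the raw numbers 3653 or 0 removes the need for out-of-band documentation of which epoch was used.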


R Consortium - R Submissions Working Group: Pilot 5 Launch and more!

The R Consortium Submission Working Group is excited to announce a new Pilot 5 that aims to deliver an R-based Submission to the FDA using Dataset-JSON. This post also includes plans for additional launches for 2025/2026, some news on Pilot 4 (containers and webassembly) and a few other goodies! Link to the original post


Pharmaverse - Working with Clinical Trial Data? There’s a Pharmaverse Package for That

Looking for R packages to manage clinical trial data? Pharmaverse has tools for every stage from data collection to submission! Link to the original post

Biostatistics


Tim Morris - Missing baseline data when analysing change-from-baseline. This post discusses the handling of missing baseline data in randomized controlled trials (RCTs) when analyzing change-from-baseline outcomes. Tim examines the implications of adjusting for baseline data and the use of mean imputation methods for missing baseline values.
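As a rough sketch of the idea (not Tim's code), mean imputation of missing baselines followed by a baseline-adjusted analysis can be simulated in a few lines; all data-generating values here are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
arm = rng.integers(0, 2, n)                          # randomized treatment indicator
baseline = rng.normal(50, 10, n)                     # baseline measurement
outcome = baseline + 5 * arm + rng.normal(0, 5, n)   # true treatment effect = 5

# Make ~20% of baselines missing completely at random
b_obs = baseline.copy()
b_obs[rng.random(n) < 0.2] = np.nan

# Mean imputation: replace missing baselines with the observed mean.
# In an RCT, baseline is independent of the randomized arm, which is
# why this simple fill-in does not bias the treatment-effect estimate.
b_imp = np.where(np.isnan(b_obs), np.nanmean(b_obs), b_obs)

# ANCOVA-style fit: outcome ~ intercept + arm + imputed baseline
X = np.column_stack([np.ones(n), arm, b_imp])
beta, *_ = np.linalg.lstsq(X, outcome, rcond=None)
print(f"estimated treatment effect: {beta[1]:.2f}")  # close to the true effect of 5
```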


Sofia S. Villar - Exact statistical tests using integer programming: Leveraging an overlooked approach for maximizing power for differences between binomial proportions. Traditional methods for testing differences in binomial proportions—like the Wald and Fisher’s exact tests—often struggle with type I error control, especially in small samples. Regulators prefer conservative methods, but this can compromise statistical power. This work extends a 1969 approach to develop a new family of exact tests that maximize power while strictly controlling type I error. The method uses integer programming to define optimal rejection regions, offering both theoretical guarantees and empirical improvements over standard tests. It is especially robust when optimized for average power across alternatives and can be customized using weighted priors. The study also demonstrates the value of applying combinatorial optimization in statistical methodology. Link
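The core computation behind such exact tests — evaluating the exact size of a candidate rejection region by maximizing the rejection probability over the common nuisance proportion — can be sketched as a brute-force illustration (this is not the paper's integer-programming method, just the size calculation it optimizes over):

```python
from math import comb

def exact_size(n1, n2, reject, grid=50):
    """Exact size of a rejection region for H0: p1 = p2 with two independent
    binomials: maximize the rejection probability over the common nuisance
    parameter p (here on a coarse grid for illustration)."""
    def rej_prob(p):
        return sum(
            comb(n1, x) * p**x * (1 - p)**(n1 - x)
            * comb(n2, y) * p**y * (1 - p)**(n2 - y)
            for x in range(n1 + 1) for y in range(n2 + 1)
            if reject(x, y)
        )
    return max(rej_prob(k / grid) for k in range(grid + 1))

# Toy rejection region: reject when the observed proportions differ a lot
n1 = n2 = 10
region = lambda x, y: abs(x / n1 - y / n2) >= 0.6
print(round(exact_size(n1, n2, region), 4))
```

The optimization in the paper effectively searches over such regions for the one with maximal power subject to this exact size staying below alpha.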


Thomas Debray, PhD - A new paper led by Orestis Efthimiou Measuring the Performance of Prediction Models to Personalize Treatment Choice, introduces a framework for evaluating models that estimate individualized treatment effects. The framework focuses on three key dimensions:

  1. Discrimination for Benefit – How well the model identifies who benefits from treatment.
  2. Calibration for Benefit – How accurately the model estimates treatment effects for individuals.
  3. Decision Accuracy – Whether the model correctly selects patients whose treatment benefit exceeds a predefined threshold.

The framework is demonstrated using simulations and a real-world depression trial, with all methods available in the R package predieval (on CRAN). These concepts will also feature in an upcoming book by the authors. Link


Federico R. on a recently published paper - Success and Futility Criteria for Accelerated Approval of Oncology Drugs. When a single pivotal trial is used to test an early endpoint for Accelerated Approval (AA) and a late endpoint for Regular Approval (RA), the overall type-1 error rate is typically controlled by splitting the overall alpha level between the two tests using some multiple testing procedure. The drawback of this approach is that the late endpoint for RA is tested at a more stringent level unless the early endpoint for AA shows statistically significant results. In this recent paper, Dong Xi and Jiangtao Gou propose a smart way to test the late endpoint for RA at the full alpha level by using a futility boundary for the early endpoint for AA. Although this additional futility rule somewhat reduces the power of the test of the late endpoint for RA, in a sense it simply formalizes the likely decision to stop the trial when the early endpoint shows very disappointing results. Conversely, classical approaches based on alpha splitting ignore the fact that the tests occur at different times and that the second test is unlikely to be conducted if the first shows very poor results. Link


Cesar T. - A Tipping Point Method to Evaluate Sensitivity to Potential Violations in Missing Data Assumptions. Some current/former colleagues and I had our manuscript published, regarding our tipping point analysis method for the evaluation of missing data assumptions. The approach is novel in that it is easy to implement and does not require any data imputation to yield asymptotically valid statistical inference. Furthermore, given the minimal underlying assumptions, our approach is readily applicable to a variety of data types such as continuous data, binary data, ordinal data, count data, and rate data. Link


Marcel Wolbers - Using shrinkage methods to estimate treatment effects in overlapping subgroups in randomized clinical trials with a time-to-event endpoint. Is the overall treatment effect of a RCT typically a more reliable estimate of the treatment effect in the various subgroups examined than the observed effects in individual subgroups? Are shrinkage estimators even more accurate? Our work proposes a new shrinkage estimator for overlapping subgroups and time-to-event outcomes. It also addresses the above questions. Link

Denis Talbot - Guidelines and Best Practices for the Use of Targeted Maximum Likelihood and Machine Learning When Estimating Causal Effects of Exposures on Time-To-Event Outcomes. Our tutorial on the use of targeted maximum likelihood with machine learning for estimating causal effects with time-to-event outcomes is now published. Link


Ping Gao - Adaptive two-stage seamless sequential design for clinical trials. Typically, drug development involves phase 2 and phase 3 trials, which are often conducted sequentially: the phase 2 trial is conducted and analyzed first, and only then is the phase 3 trial planned and conducted. Each trial involves multiple activities, such as protocol development, regulatory approval, institutional review board (IRB) approval, budget negotiations with vendors, and site negotiations and initiations, each of which can be time consuming. There is also a time gap between the phase 2 and phase 3 trials (for phase 2 analysis and phase 3 planning) that can be months long. A seamless combination requires only one protocol development, one regulatory approval, and one IRB approval, which saves time and resources, and it also eliminates the time gap between the phase 2 and phase 3 trials. Gao and Li (2024) proposed an adaptive sequential design for seamless phase 2/3 combination. The article (open access) can be downloaded at Link


Comparison of Bayesian methods for extrapolation of treatment effects: a large scale simulation study. This paper explores Bayesian methods for borrowing treatment effects—rather than control arm data—in clinical trials where sample sizes are limited. Through an extensive simulation study, the authors assess frequentist operating characteristics (e.g., probability of success, bias, precision, coverage) of various approaches. Findings highlight that the Conditional Power Prior and Robust Mixture Prior offer stronger overall performance, while test-then-pool and p-value-based power prior methods are less effective. The study helps clarify the strengths and limitations of these methods in confirmatory trial settings. Tristan Fauvel, Julien Tanniou, Pascal Godbillot, Billy AMZAL Link


Sarwar Mozumder - A new paper, Time-to-Event Estimands in Oncology: What's Censoring Got To Do, Got To Do With It?, has been accepted and will soon be published in Statistics in Biopharmaceutical Research. We intended this to be a short, accessible paper for applied researchers trying to navigate estimands for survival endpoints. In particular, we clarify the role of censoring and when and how it should be defined as part of estimation in order for the correct estimand to be targeted. Link


On the distinction between the per-protocol effect and the effect of the treatment strategy. This paper argues that in randomized trials, the per-protocol effect (the effect of receiving treatment according to the assigned strategy) is not necessarily the same as the effect of the treatment strategy itself. The authors explore a causal structure showing that these two effects and their corresponding identifying observed data functionals are different, though both require information on assignment for accurate identification. They emphasize that the per-protocol effect is not always an observational analog of the treatment strategy effect, and that in some cases, identification of these effects requires data on treatment assignment. Additionally, they suggest that making assumptions such as the exclusion-restriction assumption (where assignment only affects the outcome through treatment) is necessary for identifying these effects in observational studies. Link to the original post by Ryan Batten, PhD(c)


Success and Futility Criteria for Accelerated Approval of Oncology Drugs. Project FrontRunner aims to promote the development of cancer drugs for advanced or metastatic disease by utilizing regulatory strategies like the accelerated approval pathway. The FDA's draft guideline suggests a one-trial approach that combines accelerated and regular approval in a single trial, ensuring efficiency. This article introduces a method to control Type I error in this one-trial approach by implementing success and futility boundaries for p-values.

In this framework:

  • Success: Allows for accelerated approval.
  • RA (Regular Approval): Considered if the trial passes the regular approval endpoint without penalty.
  • Futility: If the trial is deemed futile, it is stopped early.

This approach is designed to maintain Type I error control and is flexible enough to allow clinical teams to adapt success and futility thresholds based on clinical and regulatory needs, while still ensuring robust statistical integrity. Link
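The decision logic of the one-trial framework might be sketched as below; the boundary values and return strings are hypothetical, chosen purely for illustration and not taken from the paper:

```python
def one_trial_decision(p_aa: float, p_ra: float,
                       success_bound: float = 0.025,
                       futility_bound: float = 0.5) -> str:
    """Illustrative decision rule for a one-trial AA/RA design
    (hypothetical boundaries, not the paper's calibrated values).

    - Early endpoint significant -> accelerated approval; trial continues.
    - Early endpoint beyond the futility bound -> stop early; the RA
      endpoint is never tested, which is what permits testing it at the
      full alpha level whenever the trial does continue.
    """
    if p_aa <= success_bound:
        decision = "accelerated approval; continue to regular-approval test"
    elif p_aa > futility_bound:
        return "stop for futility (RA endpoint not tested)"
    else:
        decision = "no accelerated approval; continue to regular-approval test"
    return decision + ("; RA success" if p_ra <= 0.025 else "; RA not met")

print(one_trial_decision(0.01, 0.02))  # AA granted, RA endpoint met
print(one_trial_decision(0.8, 0.02))   # stopped for futility, RA never tested
```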


Real World Evidence & Real World Data


Zhaohui Su - The Bar Is High: Evaluating Fit-for-Use Oncology Real-World Data for Regulatory Decision Making. Excited to share insights on the evolving landscape of real-world data (#RWD) and real-world evidence (#RWE) in oncology, emphasizing data quality's significance for regulatory decision-making. Key points include:

- The rapid growth and diverse applications of RWD in oncology.

- Challenges in evaluating data quality and the necessity of multidisciplinary expertise.

- The critical role of transparency in study design and data source assessment.

- Leveraging technological advancements and AI to enhance data reliability.


External Control Arms for Single-Arm Studies: Methodological Considerations and Applications

Deborah Layton - The series "External Control Arms for Single-Arm Studies: Methodological Considerations and Applications" has been successfully completed and published in Frontiers in Drug Safety and Regulation. Co-edited by Laura Hester, Asieh Golozar, and the author, the collection includes five impactful publications, collectively achieving over 11,000 views and 2,000 downloads. The editorial highlights the growing importance of external comparator studies and emphasizes the need for continued methodological refinement—addressing terminology, trial emulation, hybrid approaches, and bias mitigation—to strengthen their role in regulatory and payer decision-making. Link to the original post


Use of Open Claims vs Closed Claims in Health Outcomes Research. Interesting read that underscores the advantage of open claims in RWE, especially in rare disease and newly launched treatments. The study comparing open vs. closed data found:

- 10-65x larger sample sizes in open claims data

- Near real-time updates (compared to a 6-8 month lag in closed claims)

- Broader patient coverage—capturing care across different insurers

- Comparable accuracy when de-duplication and validation techniques are applied

Link

Events & Webinars


Stephen Mc Cawille - How Can the SAS Macro Language Enhance LLM Integration in SAS® 9.4 for Clinical Programming?

For anyone interested, I'll be presenting a webinar on June 3rd as part of the SAS Ask the Expert webinar series. The session will cover how to use the SAS macro language to connect to LLMs like ChatGPT or Claude within the SAS 9.4 environment using tools such as PROC HTTP and JSON payloads, all embedded within a macro. We'll also explore how to utilise the SAS macro language to streamline input prompts and fine-tune contextually relevant responses to support clinical programming tasks. Hope to see you there! Link

📆3rd June 2025 - 10 a.m. ET - Online


Ethics and Innovative Clinical Trial Designs

Join us for a full-day workshop in Cambridge, focused on ethical and methodological issues surrounding adaptive clinical trial designs. This event will bring together leading statisticians, ethicists, industry representatives, policymakers (including the WHO), regulators (including the FDA), and patient advocacy groups to discuss the evolving landscape of adaptive clinical trial designs and their ethical implications. Link

📆24th April 2025 - Cambridge University, Newnham College


Robert Grant - Introduction to Bayesian Analysis using Stan. If you want to learn Bayesian statistical modelling, I have two courses coming up this year with the Royal Statistical Society that are open to all, affordable and aimed at beginners.

We will use Stan, which is not only very stable and very fast, but importantly for beginners, has a clear and explicit language for specifying your models. We will start from basics and build up over two days. Once you've learnt the principles, you can easily translate them to other software.

📆On 29-30 April, in London, you can join me in person! There are only a few places left, so get in there while you can.

📆On 23-24 September, we do the same thing online, with a global reach. We've had attendees as far away as Australia before! Link


2025 CAUSALab Methods Series with Jonathan Bartlett - Webinar from February 18, 2025

As part of the 2025 CAUSALab Methods Series at Karolinska Institutet, Jonathan Bartlett, Professor in Medical Statistics at the London School of Hygiene & Tropical Medicine, presented "G-formula for causal inference using synthetic multiple imputation". Link


Statistical Methods for Combined Accuracy and Precision Approaches for Validation - Presentation from March 2, 2025 by Thomas de Marchin

I shared insights on how combined Accuracy and Precision can elevate analytical method validation and how SmartSTATS\Enoval can transform your report generation process, saving time and ensuring compliance with industry standards. Link to the original post


Peyman Eshghi - PHUSE CSS 2025. The teal framework, an open-source tool for interactive data analysis and visualization, has gained significant traction since its release on CRAN. The Teal Enhancement Working Group is excited to announce it will host sessions at PHUSE CSS to promote teal's adoption and foster collaboration across various sectors, including the EMA. The aim is to share experiences and identify areas for improvement, making teal more adaptable and efficient for data visualization in the pharma industry. The sessions are open to those interested, but capacity is limited (around 100), so early registration is recommended.

📆Utrecht, Netherlands, on May 20-21. Link


Bayesian Biostatistics 2025 - the program is taking shape

Nicky Best, Beyond the Classical Type I Error: Bayesian Metrics for Bayesian Designs Using Informative Priors

Andy Grieve, Predictive and Pre-Posterior Distributions in the Planning of Clinical Trials

Virgilio Gómez Rubio, Approximate Bayesian inference for the analysis of population health data

Harrison Quick, The Intersection of Informative Priors and Differential Privacy in Bayesian Spatial Biostatistics

Nicky Welton, Multi-level Network Meta-Regression (ML-NMR) for population adjustment in Health Technology Assessment

📆22 October 2025, 09:00 AM - 12:30 PM CET Link




