A New Covariate Design for Prediction Modeling
Our recently published paper in Scientific Reports (Nature Portfolio) introduces a method for designing covariates to enhance prediction modeling. Related work, titled "Optimal Duration for Steroid Therapy in Inflammatory Bowel Disease: A Causal Analysis using the UK Biobank," was presented at the AMIA Summit in Boston in March 2024.
The studies focused on inflammatory bowel disease (IBD) and aimed to identify durations of medication use that balance effectiveness against risk. To achieve this, we developed a detailed approach to extract and design covariates from prescription data, allowing for a more nuanced analysis of medication effects.
Extracting Prescription Data: Methodology in Detail
We extracted prescription records for 14 IBD-related medications, yielding 30,768 prescriptions for 4,534 subjects. Each prescription entry had a start date but no recorded end date. To address this, we drew on clinical expertise to standardize daily doses based on IBD indications and best practices. For example:
To determine the end date for each prescription, we combined the calculated monthly usage with the quantity field available in the data. For example, a prescription starting on December 8, 2008, with a quantity of 40 tablets prescribed once daily would have an estimated end date of January 17, 2009, covering a duration of 40 days.
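The end-date arithmetic in this example can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the study: the function name `estimate_end_date` and the parameter `units_per_day` are hypothetical, and rounding partial days up is an assumption.

```python
from datetime import date, timedelta
import math


def estimate_end_date(start: date, quantity: float, units_per_day: float) -> date:
    """Estimate when a prescription runs out.

    `units_per_day` stands in for the standardized daily dose
    (hypothetical name; rounding partial days up is an assumption).
    """
    if units_per_day <= 0:
        raise ValueError("units_per_day must be positive")
    duration_days = math.ceil(quantity / units_per_day)
    return start + timedelta(days=duration_days)


# The example from the text: 40 tablets, once daily, starting December 8, 2008.
end = estimate_end_date(date(2008, 12, 8), quantity=40, units_per_day=1)
print(end)  # 2009-01-17
```

Under the same assumptions, a twice-daily regimen would halve the estimated duration, which is why the standardized dose matters as much as the quantity field.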
Designing Covariates from Prescription Data
We designed a total of 112 unique covariates capturing prescription duration. These included:
In addition, prescriptions were limited to tablet or injection forms, excluding other types such as drops, inhalers, or powders. For steroids, we specifically defined “steroid dependence” as any instance of prednisolone use for 12 weeks or longer within a subject’s entire data history.
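One way to turn the steroid dependence definition into a covariate is sketched below in Python. It rests on assumptions the text does not settle: it reads "12 weeks or longer" as a continuous course, treats back-to-back or overlapping prescriptions as one course, and uses hypothetical names (`longest_continuous_days`, `is_steroid_dependent`, `max_gap_days`). It is an illustration, not the study's implementation.

```python
from datetime import date, timedelta


def longest_continuous_days(intervals, max_gap_days=0):
    """Length in days of the longest continuous course.

    `intervals` is a list of (start, end) dates for one subject, with
    `end` being the estimated end date. Prescriptions that overlap, or
    that start within `max_gap_days` of the previous end, are merged
    into one course (an assumption made for this sketch).
    """
    longest = 0
    course_start = course_end = None
    for start, end in sorted(intervals):
        if course_end is not None and start <= course_end + timedelta(days=max_gap_days):
            course_end = max(course_end, end)  # extend the current course
        else:
            course_start, course_end = start, end  # begin a new course
        longest = max(longest, (course_end - course_start).days)
    return longest


def is_steroid_dependent(intervals, threshold_weeks=12):
    """Flag a subject whose longest course reaches the threshold."""
    return longest_continuous_days(intervals) >= threshold_weeks * 7


# Two back-to-back prednisolone prescriptions: 40 days, then 60 days.
courses = [
    (date(2008, 12, 8), date(2009, 1, 17)),
    (date(2009, 1, 17), date(2009, 3, 18)),
]
print(longest_continuous_days(courses))  # 100
print(is_steroid_dependent(courses))     # True
```

Keeping the threshold as a parameter makes it straightforward to compute the same flag at other cutoffs, such as 7 or 8 weeks, rather than hard-coding 12.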
Application: Prednisolone and IBD Management
Using these covariates, we analyzed the association between prednisolone duration and surgical outcomes in IBD patients. The results indicated that the commonly used 12-week threshold for long-term steroid use might underestimate the risk of adverse outcomes, such as subsequent gastrointestinal surgeries. A shorter duration of 7–8 weeks was identified as potentially safer in our data.
Broader Implications
This method of extracting and designing covariates is generalizable beyond healthcare. By applying similar principles, prediction modeling can be enhanced in various fields where temporal data is available.