A New Covariate Design for Prediction Modeling

Our recently published paper in Scientific Reports (Nature Portfolio) introduces a method for designing covariates to enhance prediction modeling. Related work, titled "Optimal Duration for Steroid Therapy in Inflammatory Bowel Disease: A Causal Analysis using the UK Biobank," was presented at the AMIA Summit in Boston in March 2024.

The studies focused on inflammatory bowel disease (IBD) and aimed to identify medication durations that balance effectiveness against risk. To achieve this, we developed a detailed approach for extracting and designing covariates from prescription data, allowing a more nuanced analysis of medication effects.

Extracting Prescription Data: Methodology in Detail

We extracted data related to prescriptions for 14 IBD-related medications. The dataset contained 30,768 prescriptions for 4,534 subjects associated with these medications. Each prescription entry had a start date but no recorded end date. To address this, we incorporated clinical expertise to standardize daily doses based on IBD indications and best practices. For example:

  • A once-daily prescription was normalized to 30 tablets per month.
  • Medications prescribed every 6–12 hours or 3–4 times per day were normalized to 120 tablets per month.

To determine the end date for each prescription, we combined the calculated monthly usage with the quantity field available in the data. For example, a prescription starting on December 8, 2008, with a quantity of 40 tablets prescribed once daily would have an estimated end date of January 17, 2009, covering a duration of 40 days.
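
The end-date calculation above can be illustrated with a minimal Python sketch. This is not the code used in the study: the function name, the frequency labels, the 30-day month, and the rounding to whole days are all assumptions made for illustration.

```python
from datetime import date, timedelta

# Normalized monthly usage from the text; the dictionary keys are hypothetical labels.
TABLETS_PER_MONTH = {
    "once_daily": 30,
    "three_to_four_daily": 120,
}

DAYS_PER_MONTH = 30  # assumption: a "month" is treated as 30 days


def estimate_end_date(start: date, quantity: int, frequency: str) -> date:
    """Estimate a prescription's end date from its start date and quantity."""
    tablets_per_day = TABLETS_PER_MONTH[frequency] / DAYS_PER_MONTH
    # Assumption: fractional durations are rounded to the nearest whole day.
    duration_days = round(quantity / tablets_per_day)
    return start + timedelta(days=duration_days)


# The worked example from the text: 40 tablets, once daily, starting December 8, 2008.
print(estimate_end_date(date(2008, 12, 8), 40, "once_daily"))  # 2009-01-17
```

The same 40 tablets under the 120-tablets-per-month normalization would last about 10 days, which shows how strongly the estimated duration depends on the dosing assumption.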

Designing Covariates from Prescription Data

We designed a total of 112 unique covariates capturing prescription duration. These included:

  • Duration Variables: Ranging from 1 day to 12 weeks or more, extracted for each medication.
  • Binary Variables: Indicating a history of medication use for each subject.
  • Classification of Usage: Prescriptions were classified into long-term use (12 weeks or longer) versus short-term use.
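
As a rough sketch of how such variables could be derived once each prescription has an estimated duration, consider the Python example below. It is an illustration only: the function name and covariate names are hypothetical, the input records are made up, and summarizing each medication by its longest single prescription is an assumption rather than the study's exact definition.

```python
from collections import defaultdict

LONG_TERM_DAYS = 12 * 7  # 12 weeks, the long-term threshold described in the text


def build_covariates(prescriptions):
    """Build per-subject covariates.

    prescriptions: iterable of (subject_id, medication, duration_days) tuples.
    Returns a dict mapping subject_id to a dict of covariate values.
    """
    # Assumption: each medication is summarized by its longest single prescription.
    longest = defaultdict(int)
    for subject_id, medication, duration_days in prescriptions:
        key = (subject_id, medication)
        longest[key] = max(longest[key], duration_days)

    covariates = defaultdict(dict)
    for (subject_id, medication), days in longest.items():
        row = covariates[subject_id]
        row[f"{medication}_ever_used"] = 1                      # binary history of use
        row[f"{medication}_max_duration_days"] = days           # duration variable
        row[f"{medication}_long_term"] = int(days >= LONG_TERM_DAYS)  # usage class
    return dict(covariates)


# Illustrative records only, not data from the study.
example = [
    ("s1", "prednisolone", 40),
    ("s1", "prednisolone", 90),
    ("s2", "mesalazine", 10),
]
covariates = build_covariates(example)
print(covariates["s1"]["prednisolone_long_term"])  # 1
```

In a full design matrix, subjects with no record for a medication would receive 0 for the corresponding binary variable.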

In addition, prescriptions were limited to tablet or injection forms, excluding other types such as drops, inhalers, or powders. For steroids, we specifically defined “steroid dependence” as any instance of prednisolone use for 12 weeks or longer within a subject’s entire data history.
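
The form restriction and the steroid dependence definition can be sketched in Python as follows. The field names and function name are hypothetical, and the sketch assumes that an "instance" means a single prescription whose estimated duration reaches 12 weeks; whether consecutive prescriptions should be merged first is not specified here.

```python
ALLOWED_FORMS = {"tablet", "injection"}  # other forms are excluded, per the text
DEPENDENCE_DAYS = 12 * 7                 # 12 weeks


def steroid_dependent_subjects(prescriptions):
    """Return the set of subject IDs flagged as steroid dependent.

    prescriptions: iterable of dicts with the hypothetical keys
    'subject_id', 'medication', 'form', and 'duration_days'.
    """
    return {
        p["subject_id"]
        for p in prescriptions
        if p["form"] in ALLOWED_FORMS
        and p["medication"] == "prednisolone"
        # Assumption: one qualifying prescription anywhere in the history suffices.
        and p["duration_days"] >= DEPENDENCE_DAYS
    }


# Illustrative records only, not data from the study.
records = [
    {"subject_id": "s1", "medication": "prednisolone", "form": "tablet", "duration_days": 84},
    {"subject_id": "s2", "medication": "prednisolone", "form": "tablet", "duration_days": 56},
    {"subject_id": "s3", "medication": "prednisolone", "form": "drops", "duration_days": 120},
]
print(steroid_dependent_subjects(records))  # {'s1'}
```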

Application: Prednisolone and IBD Management

Using these covariates, we analyzed the association between prednisolone duration and surgical outcomes in IBD patients. The results indicated that the commonly used 12-week threshold for long-term steroid use might underestimate the risk of adverse outcomes, such as subsequent gastrointestinal surgeries. A shorter duration of 7–8 weeks was identified as potentially safer in our data.

Broader Implications

This method of extracting and designing covariates is generalizable beyond healthcare. By applying similar principles, prediction modeling can be enhanced in various fields where temporal data is available.

More articles by Uri Kartoun, PhD, FAMIA