Visualize Collider Bias with me

It’s 2020. You are a doctor. COVID19 has just picked up. You are going to face a flood of patients today.

In the first hour, you see 50 COVID19 patients. You take their details and learn that all of them are smokers. You start to think that smoking makes COVID19 infection more likely.

In the second hour, you see another 50 COVID19 patients. You take their details and learn that none of them are smokers. Now the link between COVID19 and smoking looks weaker: your COVID19 patients are split equally between smokers and non-smokers.

In the third hour, you see another 50 patients. They do not have COVID19; they are smokers with other respiratory diseases. Now see how one percentage changes. Of the 100 smokers you have seen, only 50 have COVID19, so the percentage of smokers with COVID19 drops to 50%. All 50 non-smokers you have seen have COVID19, so their percentage stays at 100%! It looks as if non-smokers are more susceptible to COVID19 infection!
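To check the arithmetic, here is a minimal Python sketch that tallies the three hours of patients. The counts come from the story; the names `seen` and `covid_rate` are mine, for illustration only.

```python
# Patients seen in the three hours of the story: (smoker, has_covid, count)
seen = [
    (True, True, 50),    # hour 1: smokers with COVID19
    (False, True, 50),   # hour 2: non-smokers with COVID19
    (True, False, 50),   # hour 3: smokers with other respiratory diseases
]

def covid_rate(groups, smoker):
    """Share of seen patients with the given smoking status who have COVID19."""
    total = sum(n for s, _, n in groups if s == smoker)
    covid = sum(n for s, c, n in groups if s == smoker and c)
    return covid / total

smoker_rate = covid_rate(seen, smoker=True)
non_smoker_rate = covid_rate(seen, smoker=False)
print(f"smokers: {smoker_rate:.0%}, non-smokers: {non_smoker_rate:.0%}")
# prints "smokers: 50%, non-smokers: 100%"
```

There is no row for non-smokers without COVID19, because nobody like that walked into the hospital.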

I’m not kidding you. This is what many researchers mistakenly found [2]. The mistake stems from the fact that you, as a doctor, never see one group of people at all: those who neither have COVID19 nor smoke. They have no reason to come to your hospital. So your estimate of the percentage of non-smokers who got COVID19 is an overestimate.
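If you want to see the missing group at work, here is a rough simulation sketch. Everything in it is an assumption made up for illustration: the three probabilities, the population size, and the rule that people come to the hospital only if they have COVID19 or (for smokers) another respiratory disease. COVID19 risk is deliberately set to be the same for smokers and non-smokers.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical probabilities, chosen only for illustration
P_SMOKER = 0.3         # share of smokers in the population
P_COVID = 0.1          # COVID19 risk, identical for smokers and non-smokers
P_OTHER_DISEASE = 0.2  # a smoker's risk of another respiratory disease

population = []
for _ in range(100_000):
    smoker = rng.random() < P_SMOKER
    covid = rng.random() < P_COVID
    other = smoker and rng.random() < P_OTHER_DISEASE
    in_hospital = covid or other  # the doctor only ever sees these people
    population.append((smoker, covid, in_hospital))

def covid_rate(people, smoker):
    """Share of people with the given smoking status who have COVID19."""
    group = [p for p in people if p[0] == smoker]
    return sum(p[1] for p in group) / len(group)

hospital = [p for p in population if p[2]]

pop_smokers = covid_rate(population, True)
pop_non_smokers = covid_rate(population, False)
hosp_smokers = covid_rate(hospital, True)
hosp_non_smokers = covid_rate(hospital, False)

print(f"whole population: smokers {pop_smokers:.1%}, non-smokers {pop_non_smokers:.1%}")
print(f"hospital only:    smokers {hosp_smokers:.1%}, non-smokers {hosp_non_smokers:.1%}")
```

In the whole population the two rates come out close to each other, as they should, since the risk was set equal. Inside the hospital, every non-smoker has COVID19 while only a fraction of smokers do, so non-smokers look far more susceptible. The healthy non-smokers who stayed home are what the hospital data is missing.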

Now replace yourself, the doctor, with a statistical model. You are a model with a bias: collider bias.

Why is it called collider bias? That, and the more technical details of detecting and avoiding it, are widely written about. I wrote this blog only to build an intuition for it. Hope it helped!

References:

  1. I liked this video for its explanation of Collider Bias: (Berkson’s paradox) | How ‘censored’ data leads to flawed conclusions | YouTube
  2. Paradoxical findings on smoking in reduced risk of severe COVID-19 | International Journal of Epidemiology | Oxford Academic
