Visualize Collider Bias with me
It’s 2020. You are a doctor. COVID19 has just picked up. You are going to face a flood of patients today.
In the first hour, you see 50 COVID19 patients. You take their details and learn that all of them are smokers. You start thinking that smoking increases the risk of COVID19 infection.
In the second hour, you see another 50 COVID19 patients. You take their details and learn that none of them are smokers. Now the link between COVID19 and smoking looks much weaker: among your COVID19 patients, smokers and non-smokers are split 50/50.
In the third hour, you see another 50 patients. These are not COVID19 patients; they are smokers with other respiratory diseases. Now see how one percentage changed. Of the 100 smokers you have seen, only 50 have COVID19, so the percentage of smokers who got COVID drops to 50%. Of the 50 non-smokers you have seen, all 50 have COVID19, so the percentage of non-smokers who got COVID still stands at 100%! Conclusion: non-smokers are more susceptible to COVID19 infection!
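If you want to check the arithmetic yourself, here is a minimal Python sketch of the doctor's tally after three hours. The names `patients` and `covid_share` are my own, made up for illustration; the counts are the ones from the story.

```python
from collections import Counter

# Each patient is recorded as (is_smoker, has_covid).
patients = (
    [(True, True)] * 50      # hour 1: smokers with COVID19
    + [(False, True)] * 50   # hour 2: non-smokers with COVID19
    + [(True, False)] * 50   # hour 3: smokers with other respiratory diseases
)
counts = Counter(patients)


def covid_share(counts, smoker):
    """Share of the patients seen in one smoking group who have COVID19."""
    with_covid = counts[(smoker, True)]
    total = with_covid + counts[(smoker, False)]
    return with_covid / total


print(f"Smokers with COVID19:     {covid_share(counts, True):.0%}")   # 50%
print(f"Non-smokers with COVID19: {covid_share(counts, False):.0%}")  # 100%
```

Notice that the tally has no `(False, False)` entries at all. Keep an eye on that empty cell.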
I’m not kidding you. This is what many researchers mistakenly found [2]. The mistake stems from the fact that you, as a doctor, completely miss one set of people: the non-COVID, non-smoking category. They have no reason to come to your hospital at all. Because they never show up in your count, your percentage of non-smokers who got COVID is an over-estimate.
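To see the missing group at work, here is a small simulation sketch in Python. Everything numeric in it is an assumption I made up for illustration, not data from any study: 30% of people smoke, 10% catch COVID19 regardless of smoking, and the chance of visiting the hospital is 80% with COVID19, 30% for smokers without it, and 1% for everyone else. The function names `simulate` and `covid_rate` are hypothetical too.

```python
import random


def simulate(n=100_000, seed=0):
    """Count (is_smoker, has_covid) groups in everyone and in hospital visitors."""
    rng = random.Random(seed)
    keys = [(s, c) for s in (True, False) for c in (True, False)]
    population = dict.fromkeys(keys, 0)
    hospital = dict.fromkeys(keys, 0)
    for _ in range(n):
        smoker = rng.random() < 0.30   # assumed smoking rate
        covid = rng.random() < 0.10    # assumed COVID19 rate, independent of smoking
        population[(smoker, covid)] += 1
        # Assumed selection rule: who ends up in front of the doctor.
        if covid:
            p_visit = 0.80
        elif smoker:
            p_visit = 0.30   # other respiratory diseases
        else:
            p_visit = 0.01   # healthy non-smokers rarely come
        if rng.random() < p_visit:
            hospital[(smoker, covid)] += 1
    return population, hospital


def covid_rate(counts, smoker):
    """Share of one smoking group that has COVID19."""
    with_covid = counts[(smoker, True)]
    return with_covid / (with_covid + counts[(smoker, False)])


population, hospital = simulate()
for name, counts in (("everyone", population), ("hospital only", hospital)):
    print(f"{name}: smokers {covid_rate(counts, True):.1%}, "
          f"non-smokers {covid_rate(counts, False):.1%}")
```

Under these made-up probabilities, both groups should come out near 10% when you count everyone, while inside the hospital the non-smokers should show a far higher COVID19 rate than the smokers (roughly 90% against roughly 23%), even though smoking has no effect on infection in the simulation.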
Now replace yourself, the doctor, with a statistical model. You are a model with a bias: collider bias.
Why is it called collider bias? That, and the more technical details of detecting and avoiding it, are widely written about. I wrote this blog only to build an intuition for it. Hope it helped!
References: