Understanding association between two categorical variables: Contingency Table

Let's take the example of an e-commerce website where we want to understand customer behavior through spending habits and purchase frequency.

In other words, we want to know: "To what extent does the frequency of purchases influence the total amount customers spend, and what spending patterns emerge across the different frequency segments?"

A contingency table can help us answer this question.

  • Let's first understand what a contingency table is. A contingency table is a simple way to organize and display data to show the relationship between two or more categorical variables. We can think of it as a grid in which each cell counts how many times a particular combination of categories occurs.
  • Suppose our dummy data is in the format below. We use an F-score and an M-score to represent each customer's purchase frequency and amount spent, binned into the range [1, 5], with 1 being the lowest value and 5 the highest.

Figure 1 : Sample Customer Data After Preprocessing.

  • We can create our contingency table using the following Python code.

Figure 2 : Code to create contingency table
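
The code in Figure 2 is not reproduced here, so the following is only a minimal sketch of how such a table could be built with the Python standard library. The `customers` list, the `SCORES` range, and the `contingency_table` helper are hypothetical names and dummy values chosen for illustration; they are not the article's data. With pandas, `pandas.crosstab` would produce a comparable table.

```python
from collections import Counter

# Hypothetical (F-score, M-score) pairs, one per customer. Not the article's data.
customers = [(1, 1), (1, 2), (2, 2), (3, 3), (3, 3), (4, 5), (5, 4), (5, 5)]
SCORES = range(1, 6)  # both scores are binned into 1..5


def contingency_table(pairs):
    """Return rows of counts: rows are F-scores, columns are M-scores."""
    counts = Counter(pairs)  # combinations that never occur count as 0
    return [[counts[(f, m)] for m in SCORES] for f in SCORES]


table = contingency_table(customers)

# Print the grid with row totals, column totals, and the grand total.
print("F\\M  " + " ".join(f"{m:>3}" for m in SCORES) + "  Total")
for f, row in zip(SCORES, table):
    print(f"{f:<4} " + " ".join(f"{c:>3}" for c in row) + f"  {sum(row):>5}")
col_totals = [sum(col) for col in zip(*table)]
print("Total" + " ".join(f"{c:>3}" for c in col_totals) + f"  {sum(col_totals):>5}")
```

Every F-score/M-score combination gets a cell, even when no customer falls into it, so the grid always has the same shape.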

  • Let's walk through the dummy contingency table created for this example.

Figure 3 : Contingency Table

  • In the contingency table above, each cell is a combination of an F-score and an M-score. Columns represent the M-score (monetary score) and rows represent the F-score (frequency score). The totals of each row and each column are in the margins, and the grand total is in the bottom-right corner.
  • From this, we can say that we have 140 customers in total. Looking at the cell in the 3rd row and 3rd column, we see that 7 customers have a frequency score of 3 and a monetary score of 3.
  • We can analyze the contingency table further to find the marginal and conditional distributions. Both are types of frequency distribution.
  • Marginal distribution: the frequency distribution of one categorical variable without regard to the other variables. These distributions appear in the margins of the contingency table.
  • For example, the marginal distribution of F-scores, ignoring M-scores, is the following:

F-score 1 : 20

F-score 2 : 30

F-score 3 : 30

F-score 4 : 30

F-score 5 : 30
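
As a rough illustration of the marginal distributions described above, the sketch below counts customers per F-score and per M-score while ignoring the other variable. The `customers` pairs are hypothetical dummy values, not the 140 customers in the article's table.

```python
from collections import Counter

# Hypothetical (F-score, M-score) pairs, one per customer. Not the article's data.
customers = [(1, 1), (1, 2), (2, 2), (3, 3), (3, 3), (4, 5), (5, 4), (5, 5)]

# Marginal distribution of F-score: customers per F-score, ignoring M-score.
f_marginal = Counter(f for f, _ in customers)
# Marginal distribution of M-score: customers per M-score, ignoring F-score.
m_marginal = Counter(m for _, m in customers)

grand_total = len(customers)
for f in sorted(f_marginal):
    share = f_marginal[f] / grand_total
    print(f"F-score {f} : {f_marginal[f]} ({share:.0%})")
```

Both marginals add up to the grand total, which is a quick sanity check on any contingency table.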

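  • To answer the question we started with, we use percentages in the contingency table. We can calculate row percentages or column percentages, depending on the question we ask. Either one gives a conditional distribution, that is, the distribution of one variable within a single category of the other. Here, we want to understand whether the frequency of purchase influences the amount spent. We take column percentages, so each column sums to 100% and shows how customers with a given M-score are spread across the F-scores. (Row percentages would instead show how each frequency segment is spread across the M-scores.) This gives the following table.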

Figure 4 : Column percentage of contingency table
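
A column percentage is a cell count divided by its column total. The sketch below shows one possible way to compute it, assuming hypothetical `customers` pairs and a hypothetical `column_percentages` helper; it does not reproduce the numbers in Figure 4.

```python
from collections import Counter

# Hypothetical (F-score, M-score) pairs, one per customer. Not the article's data.
customers = [(1, 1), (1, 2), (2, 2), (3, 3), (3, 3), (4, 5), (5, 4), (5, 5)]
SCORES = range(1, 6)  # both scores are binned into 1..5

cell_counts = Counter(customers)                # count per (F, M) cell
col_totals = Counter(m for _, m in customers)   # count per M-score column


def column_percentages(m):
    """Distribution of F-scores (in %) among customers with M-score m."""
    total = col_totals[m]
    if total == 0:
        return {}  # no customers in this column, so no distribution
    return {f: 100 * cell_counts[(f, m)] / total for f in SCORES}


for m in SCORES:
    print(f"M-score {m}:", column_percentages(m))
```

Dividing by the row total instead of the column total would give row percentages.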

  • From Figure 4, we can see that among customers with an M-score of 1, 16% have an F-score of 1, 28% an F-score of 2, 20% an F-score of 3, 24% an F-score of 4, and 12% an F-score of 5.
  • We can also write Python code to measure the correlation between the two variables, as follows:

Figure 5 : Correlation coefficient

  • The correlation coefficient ranges from -1 to 1.
  • Correlation is positive when an increase in one variable tends to go with an increase in the other. A coefficient close to 1 means the variables have a strong positive correlation.
  • Correlation is negative when an increase in one variable tends to go with a decrease in the other. A coefficient close to -1 means the variables have a strong negative correlation.
  • A coefficient in the range [-0.3, 0.3] is commonly read as weak or no correlation.
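
The code in Figure 5 is not reproduced here, and the article does not say which coefficient it computes. As an assumption, the sketch below uses the Pearson correlation coefficient, written out by hand, and applies the reading rules listed above. The `f_scores` and `m_scores` lists and the `pearson` and `describe` helpers are hypothetical.

```python
from math import sqrt

# Hypothetical F-scores and M-scores for the same customers. Not the article's data.
f_scores = [1, 1, 2, 3, 3, 4, 5, 5]
m_scores = [1, 2, 2, 3, 3, 5, 4, 5]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / (spread_x * spread_y)


def describe(r, weak=0.3):
    """Label a coefficient using the [-0.3, 0.3] rule of thumb."""
    if abs(r) <= weak:
        return "weak or no correlation"
    return "positive correlation" if r > 0 else "negative correlation"


r = pearson(f_scores, m_scores)
print(f"r = {r:.2f}: {describe(r)}")
```

Because F-scores and M-scores are ordered bins rather than raw measurements, a rank-based coefficient such as Spearman's may be a reasonable alternative.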

Therefore, using these concepts, we can understand whether two variables are associated with each other or independent of each other.



