
Create a correlation Matrix using Python

Last Updated : 01 May, 2025

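A correlation matrix is a table that shows how different variables are related to each other. Each cell in the table displays a number, the correlation coefficient, which tells us how strongly two variables are related. The coefficient ranges from -1 to 1: values near 1 or -1 indicate a strong positive or negative linear relationship, and values near 0 indicate little or no linear relationship. A correlation matrix helps in quickly spotting patterns, understanding relationships and making better decisions based on data.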
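To make the number in each cell concrete, here is a minimal sketch of the Pearson correlation coefficient, the measure that np.corrcoef() and the default corr() compute. The helper name pearson_r and the sample values are made up for illustration only.

```python
import numpy as np

def pearson_r(a, b):
    """Pearson correlation of two equal-length sequences (illustrative helper)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    da = a - a.mean()  # deviations from the mean of a
    db = b - b.mean()  # deviations from the mean of b
    # Sum of co-deviations divided by the product of the deviation norms
    return (da * db).sum() / np.sqrt((da ** 2).sum() * (db ** 2).sum())

# Made-up sample values, not taken from any real dataset
a = [1, 2, 3, 4, 5]
b = [2, 4, 5, 4, 5]

r = pearson_r(a, b)
print(r)
print(np.corrcoef(a, b)[0, 1])  # NumPy's value for the same pair
```

Both printed values should agree up to floating-point rounding, which is a quick way to check that the hand-written formula matches the library result.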

A correlation matrix can be created using two libraries:

1. Using NumPy Library

NumPy provides a simple way to create a correlation matrix. We can use the np.corrcoef() function to find the correlation between two or more variables.

Example: A daily sales and temperature record is kept by an ice cream store. To determine the relationship between sales and temperature, we can utilize the NumPy library, where x is sales in dollars and y is the daily temperature.

Python
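import numpy as np

# Daily ice cream sales in dollars
x = [215, 325, 185, 332, 406, 522, 412,
     614, 544, 421, 445, 408]
# Daily temperature for the same days
y = [14.2, 16.4, 11.9, 15.2, 18.5, 22.1,
     19.4, 25.1, 23.4, 18.1, 22.6, 17.2]

# 2x2 matrix: the diagonal is 1, the off-diagonal is corr(x, y)
matrix = np.corrcoef(x, y)
print(matrix)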

Output:

[[1.         0.95750662]
 [0.95750662 1.        ]]
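Since np.corrcoef() also accepts a single 2D array, the same data can be passed in one call. The sketch below reuses the sales and temperature values from the example; the variable names are chosen here for illustration. It shows the two layouts: one variable per row (NumPy's default) and one variable per column with rowvar=False.

```python
import numpy as np

sales = [215, 325, 185, 332, 406, 522, 412,
         614, 544, 421, 445, 408]
temperature = [14.2, 16.4, 11.9, 15.2, 18.5, 22.1,
               19.4, 25.1, 23.4, 18.1, 22.6, 17.2]

# Each row is one variable (the default layout, rowvar=True)
by_rows = np.array([sales, temperature])
m_rows = np.corrcoef(by_rows)

# Each column is one variable, as in a typical table of observations
by_cols = by_rows.T
m_cols = np.corrcoef(by_cols, rowvar=False)

print(m_rows)
print(np.allclose(m_rows, m_cols))  # both layouts describe the same data
```

Adding more rows (or columns, with rowvar=False) yields a larger square matrix with one row and one column per variable.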

2. Using Pandas Library

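Pandas DataFrames have a built-in corr() method that computes the pairwise correlation between all numeric columns, using the Pearson coefficient by default. The result is itself a DataFrame labeled with the column names, which makes the relationships between variables easy to read.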

Example: Let's create a simple DataFrame with three variables and calculate its correlation matrix.

Python
import pandas as pd
data = {
    'x': [45, 37, 42, 35, 39],
    'y': [38, 31, 26, 28, 33],
    'z': [10, 15, 17, 21, 12]
}
dataframe = pd.DataFrame(data, columns=['x', 'y', 'z'])
print("Dataframe is : ")
print(dataframe)
matrix = dataframe.corr()
print("Correlation matrix is : ")
print(matrix)

Output:

[Image: printed DataFrame followed by its 3x3 correlation matrix. Caption: Using Pandas]
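Because the matrix returned by corr() is an ordinary DataFrame, a single coefficient can be read by label. The following sketch rebuilds the same x, y, z DataFrame and compares the matrix entry with Series.corr() called on the two columns directly; the variable names r_from_matrix and r_direct are chosen here for illustration.

```python
import pandas as pd

dataframe = pd.DataFrame({
    'x': [45, 37, 42, 35, 39],
    'y': [38, 31, 26, 28, 33],
    'z': [10, 15, 17, 21, 12]
})
matrix = dataframe.corr()

# Look up one cell of the matrix by row and column label
r_from_matrix = matrix.loc['x', 'y']

# Compute the same coefficient for just two columns
r_direct = dataframe['x'].corr(dataframe['y'])

print(r_from_matrix)
print(r_direct)
```

The two values should match up to floating-point rounding. Series.corr() is convenient when only one pair of variables is of interest and the full matrix is not needed.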

Example with Real Dataset (Iris Dataset)

In this example we will use the Iris dataset and find the correlation between its features.

  • dataset = datasets.load_iris(): Loads the Iris dataset from sklearn, which contains measurements of flower features: sepal length and width, and petal length and width.
  • dataframe["target"] = dataset.target: Adds a target column to the DataFrame. It holds the species of each iris flower, encoded as the integers 0, 1 and 2, so corr() treats it as a numeric column.

Python
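from sklearn import datasets
import pandas as pd

# Load the Iris data and put the four features into a DataFrame
dataset = datasets.load_iris()
dataframe = pd.DataFrame(data=dataset.data, columns=dataset.feature_names)
# Append the species label (encoded as 0, 1, 2) as a numeric column
dataframe["target"] = dataset.target

# Pairwise correlation of all five columns
matrix = dataframe.corr()
print(matrix)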

Output:

[Image: correlation matrix of the Iris features and the target column. Caption: Using IRIS dataset]
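With more than a few columns, scanning the full matrix by eye gets harder, so it can help to list each pair of variables once, ordered by the strength of the correlation. The sketch below is one possible way to do that; the helper name ranked_pairs is hypothetical, and it is demonstrated on the small x, y, z DataFrame from the earlier example so that it runs without scikit-learn. The same function could be applied to the Iris DataFrame.

```python
import numpy as np
import pandas as pd

def ranked_pairs(df):
    """Return each column pair once, sorted by absolute correlation (illustrative helper)."""
    corr = df.corr()
    # True only above the diagonal, so each pair appears once and self-pairs are skipped
    upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(upper).stack().dropna()
    # Strongest relationships first, regardless of sign
    return pairs.sort_values(key=lambda s: s.abs(), ascending=False)

dataframe = pd.DataFrame({
    'x': [45, 37, 42, 35, 39],
    'y': [38, 31, 26, 28, 33],
    'z': [10, 15, 17, 21, 12]
})

ranked = ranked_pairs(dataframe)
print(ranked)
```

Sorting by absolute value keeps strong negative correlations near the top, which a plain descending sort would push to the bottom.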

With libraries like NumPy and Pandas, creating a correlation matrix in Python takes only a few lines of code and helps reveal relationships between the variables in a dataset.
