R Squared | Coefficient of Determination
Last Updated: 25 Sep, 2024
R Squared | Coefficient of Determination: R-squared is a statistical measure used in regression analysis. In regression, we deal with dependent and independent variables, where a change in an independent variable is expected to cause a change in the dependent variable.
The R-squared coefficient represents the proportion of variation in the dependent variable (y) that is accounted for by the regression line, compared to the variation explained by the mean of y. Essentially, it measures how much more accurately the regression line predicts each point's value compared to simply using the average value of y.
In this article, we shall discuss R squared and its formula in detail. We will also learn about the interpretation of r squared, adjusted r squared, beta R squared, etc.
What is R-squared?
The R-squared formula, or coefficient of determination, is used to explain how much a dependent variable varies when the independent variable is varied. In other words, it explains the extent of variance of one variable with respect to the other.
R-squared Meaning
R-squared, also known as the coefficient of determination, is a statistical measure that represents the proportion of the variance for a dependent variable that's explained by one or more independent variables in a regression model. In simpler terms, it shows how well the data fit a regression line or curve.
The coefficient of determination which is represented by R2 is determined using the following formula:
R2 = 1 – (RSS/TSS)
Where,
- R2 represents the required R Squared value,
- RSS represents the residual sum of squares, and
- TSS represents the total sum of squares.
If we are not provided with the residual sum of squares (RSS), it can be calculated as follows:
\bold{\mathrm{RSS} = \Sigma_{i=1}^n(y_i-\hat{y}_i)^2}
Where,
- yi is the ith observation, and
- \bold{\hat{y}_i} is the estimated value of yi.
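As a quick illustration, the RSS/TSS formula above can be computed directly in a few lines of Python. The observed and predicted values below are made-up sample data, not taken from this article:

```python
# A minimal sketch: computing R-squared as 1 - RSS/TSS.
# The data points and fitted values below are illustrative only.
y = [2.0, 3.5, 4.1, 5.8, 7.2]       # observed values y_i
y_hat = [2.2, 3.1, 4.5, 5.5, 7.0]   # predicted values from some fitted model

y_mean = sum(y) / len(y)
rss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # residual sum of squares
tss = sum((yi - y_mean) ** 2 for yi in y)              # total sum of squares

r_squared = 1 - rss / tss
print(round(r_squared, 4))  # close to 1, since predictions track the data well
```

Here most of the variation in y is captured by the predictions, so R² comes out near 1; if the predictions were no better than the mean of y, RSS would approach TSS and R² would approach 0.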
The coefficient of determination can also be calculated using another formula which is given by:
R2 = r2
Where r represents the correlation coefficient and is calculated using the following formula:
\bold{r = \frac{n\Sigma(xy)-\Sigma x \Sigma y}{\sqrt{[n\Sigma x^2 - (\Sigma x)^2][n\Sigma y^2 - (\Sigma y)^2]}}}
Where,
- n is the total observations,
- x is the first variable, and
- y is the second variable.
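The correlation-coefficient formula above can be sketched in plain Python as well; the x and y lists here are illustrative sample data:

```python
from math import sqrt

# Pearson's r from the raw-sums formula, then R^2 = r^2.
# The x and y lists are made-up sample data for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)
sum_y2 = sum(yi ** 2 for yi in y)

r = (n * sum_xy - sum_x * sum_y) / sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
r_squared = r ** 2
print(round(r, 4), round(r_squared, 4))
```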
R-Squared Value Interpretation
The R-squared value tells us how well a regression model predicts the dependent variable. An R-squared of 20% means that the model accounts for 20% of the variability in the dependent variable, while the remaining 80% is unexplained. A higher R-squared generally indicates a better fit, but a very large value can also point to problems with the model, such as overfitting. Conversely, a low R-squared can sometimes be obtained even for a well-specified model. We therefore need to consider other diagnostics alongside R-squared when judging a regression model.
What is Adjusted R Squared?
The R-squared value never decreases when more independent variables are added to a model, even if those variables add no real explanatory power. We therefore need to adjust R-squared to compensate for the added variables. Adjusted R-squared penalizes predictors that do not improve the fit, which makes the measure more resistant to overfitting.
Adjusted R-square formula is given as follows:
\bold{R^2_{adj}=1-\frac{(1-R^2)(N-1)}{N-p-1}}
Where,
- R2 is the Normal R square value,
- N is the Size of sample, and
- p is the no. of predictors.
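A minimal Python sketch of this formula; the R² value, sample size, and predictor count below are illustrative numbers:

```python
# Adjusted R-squared: penalize R^2 for the number of predictors p
# fitted on a sample of size n.
def adjusted_r_squared(r2, n, p):
    """Return adjusted R^2 for a model with p predictors and n samples."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# e.g. R^2 = 0.85 from a model with 3 predictors and 30 observations
print(round(adjusted_r_squared(0.85, 30, 3), 4))
```

Note that the adjusted value is always at most the plain R², and the gap widens as more predictors are added relative to the sample size.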
Beta R-Square
R-squared and adjusted R-squared measure what proportion of a variable's variability is explained by a model, whereas beta R-square is used to measure how large the variation in the variable's value is.
R-Squared vs Adjusted R-Squared
The key differences between R-Squared and Adjusted R-Squared, are listed as follows:
Parameter | R-squared | Adjusted R-squared
---|---|---
Meaning | It considers all the independent variables when calculating the coefficient of determination for a dependent variable. | It considers only those independent variables that significantly affect the value of the dependent variable.
Use | It is used for simple linear regression. | It is used for both simple and multiple linear regression.
Range of Value | Its value ranges from 0 to 1 and cannot be negative. | Its value depends on the significance of the independent variables and may be negative when R-squared is very close to zero.
Advantages and Disadvantages of the R Squared Value
There are various advantages and disadvantages of the r-squared value, some of these advantages and disadvantages are listed as follows:
Advantages of the R Squared Value
The coefficient of determination or R square has the following advantages:
- It helps to predict the value of one variable according to the other.
- It helps to check the accuracy of making predictions from a given data model.
- It helps to predict the degree of association among various variables.
- The coefficient of determination lies in the range [0,1].
- If the coefficient of determination is 0, then the variables are independent, and the value of a variable cannot be predicted at all from the value of the second variable.
- If the coefficient of determination is 1, then the variables are completely dependent and the value of a variable can be accurately predicted from the value of the second variable.
- Any other value tells the extent of the determination of the value of the variable. The higher the value, the higher is the accuracy of the determination.
Disadvantages of the R Squared Value
The coefficient of determination has the following disadvantages:
- It does not, by itself, indicate how well the model fits the data, and it does not account for bias the model may exhibit.
- It is also not useful to explain the reliability of the model.
- The value of R square can be low even for a very good model.
R Squared Solved Examples
Problem 1: Calculate the coefficient of determination from the following data:
Solution:
To calculate the coefficient of determination from the above data, we need to calculate ∑x, ∑y, ∑xy, ∑x², ∑y², (∑x)², and (∑y)².
X | Y | XY | X² | Y²
---|---|---|---|---
1.2 | 0 | 0 | 1.44 | 0
1 | 5 | 5 | 1 | 25
2 | 2 | 4 | 4 | 4
3 | 0 | 0 | 9 | 0
∑x = 7.2 | ∑y = 7 | ∑xy = 9 | ∑x² = 15.44 | ∑y² = 29

(∑x)² = 51.84 and (∑y)² = 49 and n = 4
Using r = \frac{n\Sigma(xy)-\Sigma x \Sigma y}{\sqrt{[n\Sigma x^2 - (\Sigma x)^2][n\Sigma y^2 - (\Sigma y)^2]}}
⇒ r = \frac{4(9)-(7.2*7)}{\sqrt{[4(15.44) - 51.84][4(29) - 49]}}
⇒ r = \frac{36-50.4}{\sqrt{[61.76 - 51.84][116 - 49]}}
⇒ r = \frac{-14.4}{\sqrt{[9.92][67]}}
⇒ r = \frac{-14.4}{\sqrt{664.64}}
⇒ r = \frac{-14.4}{25.78}
⇒ r = -0.5586
Thus, R^2 = r^2 = (-0.5586)^2
⇒ R^2 ≈ 0.312 = 31.2 \%
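As a sanity check, the same computation can be reproduced in Python with the data from the table above:

```python
from math import sqrt

# Re-checking Problem 1 numerically with the same data as the table above.
x = [1.2, 1, 2, 3]
y = [0, 5, 2, 0]
n = len(x)

num = n * sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y)
den = sqrt((n * sum(xi ** 2 for xi in x) - sum(x) ** 2)
           * (n * sum(yi ** 2 for yi in y) - sum(y) ** 2))
r = num / den
print(round(r, 4), round(r ** 2, 4))  # r is negative; squaring makes R^2 positive
```

Note that r itself is negative here (the numerator 36 - 50.4 is negative), but R² = r² is positive regardless of the sign of the correlation.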
Problem 2: Calculate the coefficient of determination from the following data:
Solution:
To calculate the coefficient of determination from the above data, we need to calculate ∑x, ∑y, ∑xy, ∑x², ∑y², (∑x)², and (∑y)².
X | Y | XY | X² | Y²
---|---|---|---|---
1 | 1 | 1 | 1 | 1
2 | 2 | 4 | 4 | 4
3 | 3 | 9 | 9 | 9
∑x = 6 | ∑y = 6 | ∑xy = 14 | ∑x² = 14 | ∑y² = 14

(∑x)² = 36 and (∑y)² = 36 and n = 3
Using r = \frac{n\Sigma(xy)-\Sigma x \Sigma y}{\sqrt{[n\Sigma x^2 - (\Sigma x)^2][n\Sigma y^2 - (\Sigma y)^2]}}
⇒ r = \frac{3(14)-(6*6)}{\sqrt{[3(14) - 36][3(14) - 36]}}
⇒ r = \frac{42-36}{\sqrt{[6][6]}}
⇒ r = \frac{6}{\sqrt{36}}
⇒ r = \frac{6}{6}
⇒ r = 1
Thus, R^2 = r^2 = (1)^2
⇒ R^2 = 1 = 100 \%
Problem 3: Calculate the coefficient of determination from the following data:
Solution:
To calculate the coefficient of determination from the above data, we need to calculate ∑x, ∑y, ∑xy, ∑x², ∑y², (∑x)², and (∑y)².
X | Y | XY | X² | Y²
---|---|---|---|---
1 | 1 | 1 | 1 | 1
2 | 4 | 8 | 4 | 16
3 | 6 | 18 | 9 | 36
∑x = 6 | ∑y = 11 | ∑xy = 27 | ∑x² = 14 | ∑y² = 53

(∑x)² = 36 and (∑y)² = 121 and n = 3
Using r = \frac{n\Sigma(xy)-\Sigma x \Sigma y}{\sqrt{[n\Sigma x^2 - (\Sigma x)^2][n\Sigma y^2 - (\Sigma y)^2]}}
⇒ r = \frac{3(27)-(6*11)}{\sqrt{[3(14) - 36][3(53) - 121]}}
⇒ r = \frac{81-66}{\sqrt{[42-36][159-121]}}
⇒ r = \frac{15}{\sqrt{6*38}}
⇒ r = \frac{15}{15.1}
⇒ r = 0.993
Thus, R^2 = r^2 = (0.993)^2
\Rightarrow R^2 = 0.9867 = 98.67 \%
Problem 4: Calculate the coefficient of determination if RSS = 1.5 and TSS = 1.9.
Solution:
Given RSS = 1.5, TSS = 1.9
Using R2 = 1 - (RSS/TSS)
⇒ R2 = 1 - (1.5/1.9)
⇒ R2 = 1 - 0.7895
⇒ R2 = 0.2105 ≈ 21%
Problem 5: Calculate the coefficient of determination if RSS = 1.479 and TSS = 1.89734.
Solution:
Given RSS = 1.479, TSS = 1.89734
Using R2 = 1 - (RSS/TSS)
R2 = 1 - (1.479/1.89734)
⇒ R2 = 1 - 0.7795
⇒ R2 = 0.22 ≈ 22%