CORRELATION AND REGRESSION: Quantitative Analysis, R-11 (SS-3)
 Covariance:
o Measures the linear relationship between two variables.
o Its value is not very meaningful on its own: it ranges from negative to positive infinity and is presented in squared units (e.g., %², $²).
 Correlation:
o Standardized measure of the linear relationship between two variables.
o Its value has no measurement unit and ranges from -1 (perfectly negatively correlated) to +1
(perfectly positively correlated).
o Limitations include the impact of outliers, potential for spurious correlation, and non-linear
relationships.
 Interpreting a scatter plot:
o A collection of points on a graph where each point represents the value of two variables.
o If correlation equals +1, the points lie exactly on an upward-sloping line; the opposite holds when correlation equals −1.
 Hypothesis Testing for statistical significance:
o Tests whether the population correlation between two variables is equal to zero
(two-tailed test with n−2 degrees of freedom at a given confidence level).
o Test structure: H0: ρ = 0 vs. Ha: ρ ≠ 0
o Test statistic: (assuming normal distribution) t = r × √(n−2) / √(1−r²)
o Decision rule: reject H0 if |t| > tc (the critical two-tailed t-value)
o Interpretation:
If the null cannot be rejected, we conclude that the correlation between variables X and Y is not significantly different from zero at the given significance level (e.g., 5%).
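As an illustration, the test statistic above can be computed directly (a minimal sketch in Python; the sample values r = 0.5 and n = 30 are hypothetical):

```python
import math

def corr_t_stat(r, n):
    # t-statistic for H0: population correlation = 0, with n - 2 degrees of freedom
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Hypothetical example: sample correlation r = 0.5 from n = 30 observations
t = corr_t_stat(0.5, 30)  # roughly 3.06; compare with the two-tailed critical t-value
```

With |t| above the 5% critical value of about 2.048 (df = 28), the null of zero correlation would be rejected.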
 Simple Linear Regression:
o Purpose: To explain the variation in a dependent variable in terms of the variation in a single
independent variable.
Dependent variable = explained, endogenous, or predicted variable.
Independent variable = explanatory, exogenous, predicting variable.
o Assumptions: (mostly related to the residual, or disturbance/error, term (ε))
 A linear relationship exists between the dependent and the independent variable.
 The independent variable is uncorrelated with the residuals.
 The expected value of the residual term is zero.
 The variance of the residual term is constant for all observations. (otherwise, the data is
heteroskedastic)
 The residual term is independently distributed; that is, the residual for one observation is
not correlated with the residual of another. (otherwise, the data exhibits autocorrelation)
 The residual term is normally distributed.
o Model construction:
 The linear equation (regression line or line of best fit) is the line which minimizes the Sum of
Squared Errors (SSE), that’s why simple linear regression is often called Ordinary Least
Squares (OLS) regression and the estimated values are called least squares estimates.
 Slope coefficient: describes the change in Y for a one-unit change in X.
(the stock’s β, or systematic risk level, when X = market excess returns and Y = stock excess returns)
 Intercept term: the line’s intersection with the Y axis (value of Y at X = 0).
(ex-post α, or excess risk-adjusted return relative to a market benchmark, when X = market excess returns and Y = stock excess returns)
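The least-squares estimates above follow directly from the OLS formulas (slope = Cov(X, Y)/Var(X), intercept = Ȳ − slope × X̄); a minimal sketch with hypothetical data:

```python
def ols_fit(x, y):
    # Least-squares estimates: slope b1 = Cov(X, Y) / Var(X), intercept b0 = Ybar - b1 * Xbar
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1

# Points lying exactly on Y = 1 + 2X recover intercept 1 and slope 2
b0, b1 = ols_fit([1, 2, 3, 4], [3, 5, 7, 9])
```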
o Importance of the regression model in explaining the dependent variable:
Requires determining the statistical significance of the regression (slope) coefficient through:
 Confidence Interval:
 Structure: estimated b1 ± (tc × standard error of b1)
tc is the critical two-tailed t-value for a given confidence level with n−2 df.
 Decision rule & interpretation:
If the confidence interval doesn’t include zero, we can conclude that the slope
coefficient is significantly different from zero.
 Hypothesis Testing:
 Test structure: H0: b1 = 0 vs. Ha: b1 ≠ 0
 Test statistic: (assuming normal distribution)
t = (estimated b1 − hypothesized b1) / standard error of b1
 Decision rule: reject H0 if |t| > tc
 Interpretation:
If the null cannot be rejected, we conclude that the slope coefficient is not significantly
different from the hypothesized value of b1 (zero in this case) at the given
significance level (e.g., 5%).
 F-Test: to be discussed later at the end of this reading
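The slope t-test and its matching confidence interval can be sketched together (the slope estimate 0.64, standard error 0.26, and critical t of 2.048 are hypothetical):

```python
def slope_inference(b1_hat, se_b1, tc, b1_null=0.0):
    # t-statistic for H0: b1 = b1_null, and the confidence interval b1_hat +/- tc * se_b1
    t = (b1_hat - b1_null) / se_b1
    ci = (b1_hat - tc * se_b1, b1_hat + tc * se_b1)
    return t, ci

# Hypothetical estimates: slope 0.64, standard error 0.26, tc = 2.048 (5%, df = 28)
t, ci = slope_inference(0.64, 0.26, 2.048)
```

Note the equivalence: the interval excludes zero exactly when |t| exceeds tc.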
o Standard Error of Estimate (SEE):
 Also known as the standard error of the residual or standard error of the regression, SEE measures
the degree of variability of the actual Y-values relative to the estimated Y-values from a
regression equation; it estimates the standard deviation of the error term (σε).
 The higher the correlation, the smaller the Standard Error, the better the fit.
o Coefficient of determination (R²):
 The percentage of the total variation in the dependent variable explained by the
independent variable.
 For simple linear regression, R² = ρ²
o Confidence interval for predicted values:
 Structure: predicted Y ± (tc × sf)
tc is the critical two-tailed t-value for a given confidence level with n−2 df.
sf is the standard error of forecast. (Calculating sf is highly improbable in the exam.)
 Interpretation:
Given a forecasted value of X, we can be (e.g., 95%) confident that Y will be between predicted Y − (tc × sf) and predicted Y + (tc × sf).
o Analysis of Variance (ANOVA):
Total variation = Explained variation + Unexplained Variation
 Total variation = Total Sum of Squares (SST) = Σ(Yi − mean of Y)²
 Explained variation = Regression Sum of Squares (RSS) = Σ(predicted Yi − mean of Y)²
 Unexplained variation = Sum of Squared Errors (SSE) = Σ(Yi − predicted Yi)²
 If we denote the number of independent variables as k, then, regression df = k
= 1 for simple linear regression, error df = n-k-1 = n-2 for the same.
 MSR is the mean regression sum of squares and MSE is the mean squared error.
 R² = Explained variation (RSS) / Total variation (SST)
 Standard Error of Estimate (SEE) = √(SSE / (n−2)) = √MSE
 Variance of Y = SST / (n-1)
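The ANOVA decomposition above can be verified numerically (a minimal sketch; the y values and OLS fitted values below are hypothetical, with SST = RSS + SSE holding exactly for OLS fits):

```python
import math

def anova_table(y, y_hat):
    # ANOVA decomposition for simple regression: SST = RSS + SSE (exact for OLS fitted values)
    n = len(y)
    ybar = sum(y) / n
    sst = sum((yi - ybar) ** 2 for yi in y)                     # total variation
    rss = sum((fi - ybar) ** 2 for fi in y_hat)                 # explained variation
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))       # unexplained variation
    r2 = rss / sst                                              # coefficient of determination
    see = math.sqrt(sse / (n - 2))                              # standard error of estimate
    return sst, rss, sse, r2, see

# OLS fit of y = [2, 4, 5, 7] on x = [1, 2, 3, 4] yields the fitted values below
sst, rss, sse, r2, see = anova_table([2, 4, 5, 7], [2.1, 3.7, 5.3, 6.9])
```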
o The F-Statistic: (more useful with multiple regression)
Assesses how well a set of independent variables, as a group, explains the variation in the
dependent variable at a desired level of significance. In other words, whether at least one of
the independent variables explains a significant portion of the variation of the dependent
variable. The F-test is a one-tailed test.
 Test structure: H0: b1 = 0 vs. Ha: b1 ≠ 0 (for simple linear regression)
 Test statistic: F = MSR / MSE = (RSS / k) / (SSE / (n−k−1))
 Fc is the critical F-value at a given level of significance and the following df:
df (numerator) = k = 1
df (denominator) = n−k−1 = n−2
 Decision rule: reject H0 if F > Fc
 Interpretation:
If the null cannot be rejected, we conclude that the slope coefficient is not significantly
different from zero at the given significance level (e.g., 5%).
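The F-statistic follows directly from the ANOVA sums of squares; a minimal sketch (the RSS and SSE inputs below are hypothetical):

```python
def f_statistic(rss, sse, n, k=1):
    # F = MSR / MSE = (RSS / k) / (SSE / (n - k - 1)); df = k and n - k - 1
    msr = rss / k
    mse = sse / (n - k - 1)
    return msr / mse

# Hypothetical decomposition: RSS = 12.8, SSE = 0.2, n = 4 observations, k = 1
f = f_statistic(12.8, 0.2, 4)
```

For simple linear regression, this F equals the square of the slope t-statistic.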
o Limitations of regression analysis:
 Linear relationships can change over time (parameter instability)
 Its usefulness is limited if other market participants are aware of and act on it.
 If the assumptions of the model don’t hold, the interpretation of the results will not be valid.
Major reasons for model invalidity include heteroskedasticity (non-constant variance of
error terms) and autocorrelation (error terms are not independent).
MULTIPLE REGRESSION & ISSUES IN REGRESSION ANALYSIS: Quantitative Analysis, R-12 (SS-3)
 Multiple regression is a regression analysis with more than one independent variable.
o Slope coefficient: describes the change in Y for a one-unit change in Xk, holding the other independent
variables constant.
o Intercept term: the line’s intersection with the Y axis (value of Y when all Xs = 0).
 Hypothesis Testing (t-tests).
 p-values: the smallest level of significance for which the null hypothesis can be rejected. So, if the p-value <
significance level, the null hypothesis can be rejected. Otherwise, the null hypothesis cannot be rejected.
 Confidence interval for a regression coefficient.
 If an independent variable is found to be statistically insignificant (its coefficient is not different from zero
at a given confidence level), the whole model needs to be re-estimated, as the coefficients of the other,
significant variables will likely change.
 Assumptions: same as univariate regression in addition to that there is no exact linear relation between any
two or more independent variables. (otherwise, Multicollinearity)
 The F-Statistic
 R²: the percentage of variation in the dependent variable that is collectively explained by all of the
independent variables.
o Multiple R: the correlation between the actual and forecasted values of Y. Multiple R is the square root
of R². For simple regression, the correlation between the dependent and independent variables is
the same as multiple R, with the same sign as the slope coefficient.
 Adjusted R²: R² increases as more independent variables are added to the model, regardless of their
explanatory power; this problem is called overestimating the regression. To overcome it, R²
should be adjusted for the number of independent variables as per the following formula:
Adjusted R² = 1 − [(n−1) / (n−k−1)] × (1 − R²)
o Adjusted R² ≤ R²
o Adding a new variable to the model will increase R², while it may increase or decrease adjusted R²
o Adjusted R² may be less than zero if R² is low enough
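The adjustment formula above is easy to check numerically (a minimal sketch; the R² = 0.80, n = 30, k = 4 inputs are hypothetical):

```python
def adjusted_r2(r2, n, k):
    # Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1); penalizes extra regressors
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical: R^2 = 0.80 with n = 30 observations and k = 4 independent variables
adj = adjusted_r2(0.80, 30, 4)  # slightly below the unadjusted 0.80
```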
 Dummy variables:
o Usually used to quantify the impact of qualitative binary events (on or off). Dummy variables are
assigned values of 1 or 0 for on or off status.
o Whenever we need to distinguish between n classes we must use n-1 dummy variables. Otherwise,
the multiple regression assumption of no exact linear relationship between independent variables
would be violated.
The omitted class should be thought of as the reference point which is represented by the intercept.
o Testing the statistical significance of a dummy’s slope coefficient is equivalent to testing whether the
average value of the dependent variable for that class is different from that of the omitted class (the intercept).
 Issues in regression analysis:
o Heteroskedasticity:
 Definition:
Occurs when the variance of the residuals is not the same across all observations. There are
two types:
 Unconditional Heteroskedasticity:
Not related to the level of the independent variables. Not a major problem.
 Conditional Heteroskedasticity:
Related to the level of the independent variables. Significant problem.
 Effect:
The standard errors are unreliable (affecting t-tests and the F-test), while the coefficients are not
affected. Standard errors that are too small are the main concern, as they might lead to Type I
errors: rejecting a true null hypothesis that a coefficient is not significant.
 Detection:
 Examine the scatter plot of the residuals against the independent variables.
 Breusch-Pagan (BP) test: conduct a second regression of the squared
residuals (from the first regression) against the independent variables and test
whether the independent variables significantly contribute to the explanation of the
squared residuals.
The test statistic has a chi-square (χ²) distribution with k degrees of freedom and
is calculated as:
BP = n × R²resid, where R²resid is the R² of the second regression.
This is a one-tailed test, as the concern is only a test statistic that is too large.
If test statistic > chi-square critical value ⟹ reject the null hypothesis and
conclude that a conditional heteroskedasticity problem is present.
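The BP statistic itself is a one-liner once the second regression's R² is in hand (a minimal sketch; n = 100 and R²resid = 0.08 are hypothetical inputs):

```python
def breusch_pagan(n, r2_resid):
    # BP test statistic = n * R^2 of the second regression (squared residuals on the Xs)
    return n * r2_resid

# Hypothetical: n = 100 observations, R^2 of the residual regression = 0.08
bp = breusch_pagan(100, 0.08)  # compare with the chi-square critical value (3.841 at 5%, df = 1)
```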
 Correction:
 Use robust standard errors (White-corrected standard errors or heteroskedasticity-
consistent standard errors) which are usually higher than the original standard
errors.
 Use generalized least squares, which modifies the original equation in an attempt to
eliminate heteroskedasticity.
o Serial Correlation (Autocorrelation):
 Definition:
Occurs when the residual terms are correlated with one another. It’s a common problem
with time series data. There are two types:
 Positive serial correlation:
When a positive error in one period increases the probability of observing a positive
error in the next period.
 Negative serial correlation:
When a positive error in one period increases the probability of observing a
negative error in the next period.
 Effect:
The tendency of the data to cluster together underestimates the coefficient standard errors,
leading to type I errors.
 Detection:
 Examine the scatter plot of the residuals against time.
 Durbin-Watson (DW) statistic (the calculation is impractical for the exam)
If the sample size is very large ⟹ DW ≈ 2(1 − r), where r is the correlation coefficient
between residuals from one period and those from the previous period.
⟹ DW = 2 if r = 0 (no serial correlation)
⟹ DW > 2 if r < 0 (negative serial correlation)
⟹ DW < 2 if r > 0 (positive serial correlation)
For the Durbin-Watson test, there are upper and lower DW values depending on the
level of significance, number of observations, degrees of freedom (number of
variables k)
o Test Structure: H0: No positive serial correlation
o Decision Rule: reject H0 if DW < DL; the test is inconclusive if DL ≤ DW ≤ DU;
fail to reject H0 if DW > DU.
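Although the calculation is impractical for the exam, the DW statistic is just squared successive residual differences over squared residuals; a minimal sketch with hypothetical residuals:

```python
def durbin_watson(resid):
    # DW = sum of squared successive differences / sum of squared residuals
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    den = sum(e ** 2 for e in resid)
    return num / den

# Perfectly alternating residuals (strong negative serial correlation) push DW above 2
dw = durbin_watson([1, -1, 1, -1])
```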
 Correction:
 Use Hansen method to provide Hansen-White standard errors, which also could be
used to correct for conditional heteroskedasticity. The general rule for use of
adjusted standard errors is:
o If the problem is serial correlation only ⟹ Hansen Method
o If the problem is conditional heteroskedasticity only ⟹ White-corrected
o If the problem is both ⟹ Hansen Method
 Improve the model specification, by including a seasonal term to reflect the time
series nature of the data. This can be tricky.
o Multicollinearity:
 Definition:
Occurs when linear combinations of independent variables are highly correlated. For k>2,
high correlation between individual independent variables (>0.7) suggests the possibility of
multicollinearity, but low correlation doesn’t necessarily indicate no multicollinearity.
 Effect:
Slope coefficients tend to be unreliable, and standard errors are artificially inflated. Hence,
there is a greater probability of Type II error.
 Detection:
While the F-test is statistically significant and R² is high, the t-tests indicate no significance of
the individual coefficients.
 Correction:
Use statistical procedures, like stepwise regression, which systematically remove variables
from the regression until multicollinearity is minimized.
 Model misspecification:
o Categories:
I. The functional form can be misspecified:
1. Important variables are omitted.
2. Variables should be transformed.
(e.g., the dependent variable is linearly related to the natural log of the independent
variable, or B/S items should be standardized by dividing by Total Assets, and P&L and CF items by Sales.
Common mistakes include squaring or taking the square root of the variable.)
3. Data is improperly pooled.
(By pooling sub-periods that exhibit structural change).
II. Explanatory variables are correlated with the error terms in time series analysis:
1. A lagged dependent variable is used as an independent variable.
2. A function of the dependent variable is used as an independent variable
(“forecasting the past”, e.g., using end-of-month market cap to predict returns during
the month).
3. Independent variables are measured with error.
(e.g., using free float as a proxy for corporate governance quality, or actual inflation as
a proxy for expected inflation).
III. Other time-series misspecifications that result in nonstationarity.
o Effect:
Regression coefficients are often biased and/or inconsistent ⟹ unreliable hypothesis testing and
inaccurate predictions.
 Qualitative (dummy) dependent variables:
o Probit and logit models:
 Estimate the probability that the event occurs (i.e. probability of default or merger).
 The maximum likelihood is used to estimate coefficients.
 A probit model is based on the normal distribution, while a logit model is based on a logistic
distribution.
o Discriminant models:
 Results in a linear function, similar to an ordinary regression, which generates an overall
score for an observation. The scores can then be used to rank or classify observations.
 Example: use financial ratios to get a score that places a company in a bankrupt or not
bankrupt class.
 Similar to probit and logit models but make different assumptions regarding the
independent variables.
TIME-SERIES ANALYSIS: Quantitative Analysis, R-13 (SS-3)
 A time series is a set of observations over successive periods of time.
 Linear trend model: yt = b0 + b1(t) + εt (the data plot on a straight line)
 Log-linear trend model: ln(yt) = b0 + b1(t) + εt (the data plot on a curve)
The model defines y as an exponential function of time; taking the natural log of both sides transforms
the equation from an exponential to a linear function.
For financial time series that display exponential growth, a log-linear model provides a better fit for the data
and, thus, increases the model’s predictive power.
 When a variable grows at a constant rate (e.g., financial data such as company sales), a log-linear model is most
appropriate. When it grows by a constant amount (e.g., inflation), a linear trend model is most appropriate.
 Limitation of trend models:
When time-series residuals exhibit serial correlation, as evidenced by the DW test, we need to use an autoregressive
(AR) model instead. This is done by regressing the dependent variable against one or more lagged values of itself, on
condition that the time-series being modeled is covariance stationary.
 Conditions for covariance stationary:
I. Constant and finite expected value (mean-reverting level).
II. Constant and finite variance (homoscedastic).
III. Constant and finite covariance between values at any given lag (the covariance of the time series
with leading or lagged values of itself is constant).
 An AR model of order p, AR(p), is expressed as:
xt = b0 + b1(xt−1) + b2(xt−2) + … + bp(xt−p) + εt
(where p is the number of lagged values included as independent variables)
 Forecasting with an autoregressive model:
Applying the chain rule of forecasting, it’s necessary to calculate a one-step-ahead forecast before a two-
step-ahead forecast. This implies that:
o Multi-period forecasts are more uncertain than single-period forecasts.
o Sample size = number of observations − order of the AR model (p).
 Detection and correction for autocorrelation in autoregressive models:
The DW test used with trend models is not appropriate with AR models, instead, the following steps are
followed to detect autocorrelation and make sure the AR model is correctly specified:
I. Estimate the AR(1) model.
II. Calculate the autocorrelation of the model’s residuals.
III. t-test whether the autocorrelations are significantly different from zero, with df = T−2:
t = (residual autocorrelation at lag k) / (1 / √T), where T is the number of observations.
IV. If any of the autocorrelations is significantly different from zero, the model is not specified correctly.
Add more lags to the model and repeat step II until all autocorrelations are insignificant.
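The lag-k autocorrelation and its t-statistic in step III can be sketched as follows (a minimal illustration; the alternating residual series is hypothetical):

```python
import math

def residual_autocorr_t(resid, lag):
    # t-statistic for H0: residual autocorrelation at this lag is zero (standard error = 1/sqrt(T))
    T = len(resid)
    mean = sum(resid) / T
    num = sum((resid[t] - mean) * (resid[t - lag] - mean) for t in range(lag, T))
    den = sum((e - mean) ** 2 for e in resid)
    rho = num / den              # sample autocorrelation at the given lag
    return rho * math.sqrt(T)    # rho / (1 / sqrt(T))

# Alternating residuals show strong negative lag-1 autocorrelation
t1 = residual_autocorr_t([1, -1, 1, -1, 1, -1, 1, -1], 1)
```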
 Mean reversion and random walk:
For a time-series to be covariance stationary it must have a constant and finite mean-reverting level, which
is the value the time-series tends to move toward. Once this level is reached, the series is expected to stay there (xt+1 = xt):
xt = b0 + b1(xt) ⟹ mean-reverting level = b0 / (1 − b1)
For b1 = 1 (called a unit root) the model doesn’t have a finite mean-reverting level, and thus is not covariance
stationary. This happens when the model follows a random walk process, which is classified as:
o Random walk without a drift: xt = xt−1 + εt
o Random walk with a drift: xt = b0 + xt−1 + εt
 Unit root detection:
As testing whether b1 = 1 cannot be performed directly, use the Dickey-Fuller (DF) test by transforming the AR(1)
model into a simple regression, subtracting xt−1 from both sides as follows:
xt − xt−1 = b0 + (b1 − 1)(xt−1) + εt ⟹ xt − xt−1 = b0 + g(xt−1) + εt, where g = b1 − 1
Then test whether the transformed coefficient g is different from zero using a modified t-test. With
H0: g = 0, if we fail to reject, we conclude that the series has a unit root.
 Unit root correction:
Use first differencing to transform the data into a covariance stationary time series. This is
done by constructing the following AR(1) model:
yt = b0 + b1(yt−1) + εt, where yt = xt − xt−1 and b0 = b1 = 0
⟹ mean-reverting level = b0 / (1 − b1) = 0 / 1 = 0 (a finite value)
If the data has a linear trend, first difference the data. If the data has an exponential trend, first difference
the natural log of the data.
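First differencing and the mean-reverting level can be sketched in a few lines (the sample series is hypothetical):

```python
def first_difference(x):
    # y_t = x_t - x_(t-1): turns a random walk into a covariance stationary series
    return [x[t] - x[t - 1] for t in range(1, len(x))]

def mean_reverting_level(b0, b1):
    # AR(1) mean-reverting level = b0 / (1 - b1); undefined when b1 = 1 (unit root)
    return b0 / (1 - b1)

diffs = first_difference([1, 3, 6, 10])   # successive changes of the series
level = mean_reverting_level(2.0, 0.5)    # AR(1) with b0 = 2, b1 = 0.5 reverts to 4
```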
 Seasonality detection:
A pattern that tends to repeat from year to year. Not accounting for seasonality, when it’s present, will
make the AR model misspecified and unreliable for forecasting purposes.
Seasonality can be detected by observing that the residual autocorrelation for the month or quarter from
previous year (month 12 or quarter 4) is significantly different from zero.
 Seasonality correction:
Add an additional lag corresponding to the same period last year to the original model.
 In-sample forecasts are made within the range of data used to estimate the model. Out-of-sample forecasts
are made outside the sample period, to assess the predictive power of the model.
Given two models, to assess which one is better, apply root mean squared error (RMSE) criterion (the
square root of the average of the squared errors) on out-of-sample data. The model with the lowest RMSE is
the most accurate.
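The RMSE criterion above is straightforward to compute (a minimal sketch; the actual and forecast series are hypothetical):

```python
import math

def rmse(actual, forecast):
    # Root mean squared error: sqrt of the average squared forecast error
    n = len(actual)
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / n)

# Applied to out-of-sample data, the model with the lower RMSE is the more accurate one
err = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```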
 As financial and economic environments are dynamic and frequently subject to structural shifts, there is a
tradeoff between the increased statistical reliability when using long time-series periods, and the increased
stability of the estimates when using shorter periods.
 Autoregressive Conditional Heteroskedasticity (ARCH) model:
ARCH exists if the variance of the residuals in one period is dependent on the variance of the residuals in a
previous period. The ARCH(1) model is expressed as:
εt² = a0 + a1(εt−1²) + ut
If the coefficient a1 is statistically different from zero, it can be positive (the variance increases over time)
or negative (the variance decreases over time), indicating that the error terms exhibit heteroskedasticity. In
either case, the time-series is ARCH(1) and, according to our need, we can either:
o Correct the model using procedures that correct for heteroskedasticity, such as generalized least
squares.
o Predict the variance of the residuals in future periods.
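The ARCH(1) regression above is just OLS on the squared residuals and their first lag; a minimal sketch (the residual series is hypothetical, constructed so its squares follow εt² = 1 + 2·εt−1² exactly):

```python
import math

def arch1_fit(resid):
    # Regress squared residuals on their own first lag: e_t^2 = a0 + a1 * e_(t-1)^2 + u_t
    sq = [e ** 2 for e in resid]
    x, y = sq[:-1], sq[1:]
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    a1 = sxy / sxx
    a0 = ybar - a1 * xbar
    return a0, a1

# Residuals whose squares follow e_t^2 = 1 + 2 * e_(t-1)^2 exactly
a0, a1 = arch1_fit([1.0, math.sqrt(3), math.sqrt(7), math.sqrt(15)])
```

A statistically significant a1 would indicate ARCH(1) effects; here the forecast of next period's residual variance would be a0 + a1 × (the latest squared residual).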
 Considerations for using two time-series variables in a linear regression:
Test each series for covariance stationarity (by detecting the presence of autocorrelation or a unit root), with the following
possibilities determining whether the data can be used:
1. Both time-series are covariance stationary ⟹ Yes
2. Only one variable is covariance stationary ⟹ No
3. Both time-series are not covariance stationary:
3.1. The two series are cointegrated ⟹ Yes
3.2. The two series are not cointegrated ⟹ No
 Cointegration:
Means that the two time-series are economically linked or follow the same trend, and that relationship is not
expected to change.
To test for cointegration, regress one variable on the other using the following model:
yt = b0 + b1(xt) + εt
The residuals are tested for a unit root using the DF test with critical t-values calculated by Engle and Granger
(i.e., the DF-EG test). If the test rejects the null hypothesis of a unit root, we conclude that the error terms generated
by the two time series are covariance stationary and the two series are cointegrated.
INDIA QUIZ FOR SCHOOLS | THE QUIZ CLUB OF PSGCAS | AUGUST 2024
Quiz Club of PSG College of Arts & Science
 
How to Manage Manual Reordering Rule in Odoo 18 Inventory
How to Manage Manual Reordering Rule in Odoo 18 InventoryHow to Manage Manual Reordering Rule in Odoo 18 Inventory
How to Manage Manual Reordering Rule in Odoo 18 Inventory
Celine George
 
Peer Assessment_ Unit 2 Skills Development for Live Performance - for Libby.docx
Peer Assessment_ Unit 2 Skills Development for Live Performance - for Libby.docxPeer Assessment_ Unit 2 Skills Development for Live Performance - for Libby.docx
Peer Assessment_ Unit 2 Skills Development for Live Performance - for Libby.docx
19lburrell
 
Aerospace Engineering Homework Help Guide – Expert Support for Academic Success
Aerospace Engineering Homework Help Guide – Expert Support for Academic SuccessAerospace Engineering Homework Help Guide – Expert Support for Academic Success
Aerospace Engineering Homework Help Guide – Expert Support for Academic Success
online college homework help
 
ANTI-VIRAL DRUGS unit 3 Pharmacology 3.pptx
ANTI-VIRAL DRUGS unit 3 Pharmacology 3.pptxANTI-VIRAL DRUGS unit 3 Pharmacology 3.pptx
ANTI-VIRAL DRUGS unit 3 Pharmacology 3.pptx
Mayuri Chavan
 
Chemotherapy of Malignancy -Anticancer.pptx
Chemotherapy of Malignancy -Anticancer.pptxChemotherapy of Malignancy -Anticancer.pptx
Chemotherapy of Malignancy -Anticancer.pptx
Mayuri Chavan
 
How to Manage Cross Selling in Odoo 18 Sales
How to Manage Cross Selling in Odoo 18 SalesHow to Manage Cross Selling in Odoo 18 Sales
How to Manage Cross Selling in Odoo 18 Sales
Celine George
 
How to Change Sequence Number in Odoo 18 Sale Order
How to Change Sequence Number in Odoo 18 Sale OrderHow to Change Sequence Number in Odoo 18 Sale Order
How to Change Sequence Number in Odoo 18 Sale Order
Celine George
 
Final Evaluation.docx...........................
Final Evaluation.docx...........................Final Evaluation.docx...........................
Final Evaluation.docx...........................
l1bbyburrell
 
Conditions for Boltzmann Law – Biophysics Lecture Slide
Conditions for Boltzmann Law – Biophysics Lecture SlideConditions for Boltzmann Law – Biophysics Lecture Slide
Conditions for Boltzmann Law – Biophysics Lecture Slide
PKLI-Institute of Nursing and Allied Health Sciences Lahore , Pakistan.
 
UPSA JUDGEMENT.pdfCopyright Infringement: High Court Rules against UPSA: A Wa...
UPSA JUDGEMENT.pdfCopyright Infringement: High Court Rules against UPSA: A Wa...UPSA JUDGEMENT.pdfCopyright Infringement: High Court Rules against UPSA: A Wa...
UPSA JUDGEMENT.pdfCopyright Infringement: High Court Rules against UPSA: A Wa...
businessweekghana
 
libbys peer assesment.docx..............
libbys peer assesment.docx..............libbys peer assesment.docx..............
libbys peer assesment.docx..............
19lburrell
 
2025 The Senior Landscape and SET plan preparations.pptx
2025 The Senior Landscape and SET plan preparations.pptx2025 The Senior Landscape and SET plan preparations.pptx
2025 The Senior Landscape and SET plan preparations.pptx
mansk2
 
BÀI TẬP BỔ TRỢ TIẾNG ANH 9 THEO ĐƠN VỊ BÀI HỌC - GLOBAL SUCCESS - CẢ NĂM (TỪ...
BÀI TẬP BỔ TRỢ TIẾNG ANH 9 THEO ĐƠN VỊ BÀI HỌC - GLOBAL SUCCESS - CẢ NĂM (TỪ...BÀI TẬP BỔ TRỢ TIẾNG ANH 9 THEO ĐƠN VỊ BÀI HỌC - GLOBAL SUCCESS - CẢ NĂM (TỪ...
BÀI TẬP BỔ TRỢ TIẾNG ANH 9 THEO ĐƠN VỊ BÀI HỌC - GLOBAL SUCCESS - CẢ NĂM (TỪ...
Nguyen Thanh Tu Collection
 
How to Use Upgrade Code Command in Odoo 18
How to Use Upgrade Code Command in Odoo 18How to Use Upgrade Code Command in Odoo 18
How to Use Upgrade Code Command in Odoo 18
Celine George
 
Rebuilding the library community in a post-Twitter world
Rebuilding the library community in a post-Twitter worldRebuilding the library community in a post-Twitter world
Rebuilding the library community in a post-Twitter world
Ned Potter
 
PUBH1000 Slides - Module 12: Advocacy for Health
PUBH1000 Slides - Module 12: Advocacy for HealthPUBH1000 Slides - Module 12: Advocacy for Health
PUBH1000 Slides - Module 12: Advocacy for Health
JonathanHallett4
 
How to Manage Amounts in Local Currency in Odoo 18 Purchase
How to Manage Amounts in Local Currency in Odoo 18 PurchaseHow to Manage Amounts in Local Currency in Odoo 18 Purchase
How to Manage Amounts in Local Currency in Odoo 18 Purchase
Celine George
 
MICROBIAL GENETICS -tranformation and tranduction.pdf
MICROBIAL GENETICS -tranformation and tranduction.pdfMICROBIAL GENETICS -tranformation and tranduction.pdf
MICROBIAL GENETICS -tranformation and tranduction.pdf
DHARMENDRA SAHU
 
Search Matching Applicants in Odoo 18 - Odoo Slides
Search Matching Applicants in Odoo 18 - Odoo SlidesSearch Matching Applicants in Odoo 18 - Odoo Slides
Search Matching Applicants in Odoo 18 - Odoo Slides
Celine George
 
How to Manage Manual Reordering Rule in Odoo 18 Inventory
How to Manage Manual Reordering Rule in Odoo 18 InventoryHow to Manage Manual Reordering Rule in Odoo 18 Inventory
How to Manage Manual Reordering Rule in Odoo 18 Inventory
Celine George
 
Peer Assessment_ Unit 2 Skills Development for Live Performance - for Libby.docx
Peer Assessment_ Unit 2 Skills Development for Live Performance - for Libby.docxPeer Assessment_ Unit 2 Skills Development for Live Performance - for Libby.docx
Peer Assessment_ Unit 2 Skills Development for Live Performance - for Libby.docx
19lburrell
 
Aerospace Engineering Homework Help Guide – Expert Support for Academic Success
Aerospace Engineering Homework Help Guide – Expert Support for Academic SuccessAerospace Engineering Homework Help Guide – Expert Support for Academic Success
Aerospace Engineering Homework Help Guide – Expert Support for Academic Success
online college homework help
 
ANTI-VIRAL DRUGS unit 3 Pharmacology 3.pptx
ANTI-VIRAL DRUGS unit 3 Pharmacology 3.pptxANTI-VIRAL DRUGS unit 3 Pharmacology 3.pptx
ANTI-VIRAL DRUGS unit 3 Pharmacology 3.pptx
Mayuri Chavan
 
Chemotherapy of Malignancy -Anticancer.pptx
Chemotherapy of Malignancy -Anticancer.pptxChemotherapy of Malignancy -Anticancer.pptx
Chemotherapy of Malignancy -Anticancer.pptx
Mayuri Chavan
 
How to Manage Cross Selling in Odoo 18 Sales
How to Manage Cross Selling in Odoo 18 SalesHow to Manage Cross Selling in Odoo 18 Sales
How to Manage Cross Selling in Odoo 18 Sales
Celine George
 
How to Change Sequence Number in Odoo 18 Sale Order
How to Change Sequence Number in Odoo 18 Sale OrderHow to Change Sequence Number in Odoo 18 Sale Order
How to Change Sequence Number in Odoo 18 Sale Order
Celine George
 
Final Evaluation.docx...........................
Final Evaluation.docx...........................Final Evaluation.docx...........................
Final Evaluation.docx...........................
l1bbyburrell
 
UPSA JUDGEMENT.pdfCopyright Infringement: High Court Rules against UPSA: A Wa...
UPSA JUDGEMENT.pdfCopyright Infringement: High Court Rules against UPSA: A Wa...UPSA JUDGEMENT.pdfCopyright Infringement: High Court Rules against UPSA: A Wa...
UPSA JUDGEMENT.pdfCopyright Infringement: High Court Rules against UPSA: A Wa...
businessweekghana
 
libbys peer assesment.docx..............
libbys peer assesment.docx..............libbys peer assesment.docx..............
libbys peer assesment.docx..............
19lburrell
 
2025 The Senior Landscape and SET plan preparations.pptx
2025 The Senior Landscape and SET plan preparations.pptx2025 The Senior Landscape and SET plan preparations.pptx
2025 The Senior Landscape and SET plan preparations.pptx
mansk2
 
BÀI TẬP BỔ TRỢ TIẾNG ANH 9 THEO ĐƠN VỊ BÀI HỌC - GLOBAL SUCCESS - CẢ NĂM (TỪ...
BÀI TẬP BỔ TRỢ TIẾNG ANH 9 THEO ĐƠN VỊ BÀI HỌC - GLOBAL SUCCESS - CẢ NĂM (TỪ...BÀI TẬP BỔ TRỢ TIẾNG ANH 9 THEO ĐƠN VỊ BÀI HỌC - GLOBAL SUCCESS - CẢ NĂM (TỪ...
BÀI TẬP BỔ TRỢ TIẾNG ANH 9 THEO ĐƠN VỊ BÀI HỌC - GLOBAL SUCCESS - CẢ NĂM (TỪ...
Nguyen Thanh Tu Collection
 
How to Use Upgrade Code Command in Odoo 18
How to Use Upgrade Code Command in Odoo 18How to Use Upgrade Code Command in Odoo 18
How to Use Upgrade Code Command in Odoo 18
Celine George
 
Ad

Quantitative Methods - Level II - CFA Program

  • 1. CORRELATION AND REGRESSION (Quantitative Analysis, R-11, SS-3)
 Covariance:
o Measures the linear relationship between two variables.
o Its value is not very meaningful on its own, as it ranges from negative to positive infinity and is expressed in squared units (e.g. %², $²).
 Correlation:
o Standardized measure of the linear relationship between two variables.
o Unit-free; ranges from -1 (perfectly negatively correlated) to +1 (perfectly positively correlated).
o Limitations include the impact of outliers, the potential for spurious correlation, and non-linear relationships.
 Interpreting a scatter plot:
o A collection of points on a graph where each point represents the values of two variables.
o If correlation equals +1, the points lie exactly on an upward-sloping line; the opposite holds for correlation equal to -1.
 Hypothesis testing for statistical significance:
o Test whether the population correlation between the two variables is equal to zero (two-tailed test with n-2 degrees of freedom at a given confidence level).
o Test structure: H0: ρ = 0 versus Ha: ρ ≠ 0.
o Test statistic (assuming normality): t = r√(n-2) / √(1-r²).
o Decision rule: reject H0 if |t| > the critical two-tailed t-value.
o Interpretation: if the null cannot be rejected, we conclude that the correlation between variables X and Y is not significantly different from zero at the given significance level (e.g. 5%).
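A minimal Python sketch of this significance test; the `corr_t_stat` helper and the sample data are illustrative assumptions, not part of the reading:

```python
import math

def corr_t_stat(x, y):
    """Sample correlation r and its t-statistic, t = r*sqrt(n-2)/sqrt(1-r^2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    r = cov / (sx * sy)
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    return r, t

# Hypothetical paired observations of two variables.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 2.9, 4.2, 3.8, 5.5, 6.0]
r, t = corr_t_stat(x, y)
```

With six observations (df = 4), the computed t of roughly 7.3 exceeds the 5% two-tailed critical value of 2.776, so the null of zero correlation would be rejected here.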
  • 2.
 Simple Linear Regression:
o Purpose: to explain the variation in a dependent variable in terms of the variation in a single independent variable.
 Dependent variable = explained, endogenous, or predicted variable.
 Independent variable = explanatory, exogenous, or predicting variable.
o Assumptions (mostly related to the residual (disturbance or error) term, ε):
 A linear relationship exists between the dependent and the independent variable.
 The independent variable is uncorrelated with the residuals.
 The expected value of the residual term is zero.
 The variance of the residual term is constant for all observations (otherwise the data is heteroskedastic).
 The residual term is independently distributed; that is, the residual for one observation is not correlated with the residual of another (otherwise the data exhibits autocorrelation).
 The residual term is normally distributed.
o Model construction:
 The linear equation (regression line or line of best fit) is the line that minimizes the Sum of Squared Errors (SSE); hence simple linear regression is often called Ordinary Least Squares (OLS) regression and the estimated values are called least squares estimates.
 Slope coefficient: describes the change in Y for a one-unit change in X (the stock's β, or systematic risk level, when X = market excess returns and Y = stock excess returns).
 Intercept term: the line's intersection with the Y axis (value of Y at X = 0); the ex-post α, or excess risk-adjusted return relative to a market benchmark, when X = market excess returns and Y = stock excess returns.
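The least squares estimates above can be sketched directly from the standard closed-form solution, slope = Cov(X, Y)/Var(X) and intercept = Ȳ - slope·X̄; the `ols_fit` name and data are illustrative:

```python
def ols_fit(x, y):
    """OLS estimates: slope = Cov(X, Y) / Var(X), intercept = mean(Y) - slope*mean(X)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    var = sum((a - mx) ** 2 for a in x) / (n - 1)
    slope = cov / var
    intercept = my - slope * mx
    return intercept, slope

# Points lying exactly on y = 3 + 2x: OLS should recover the line exactly.
b0, b1 = ols_fit([0.0, 1.0, 2.0, 3.0], [3.0, 5.0, 7.0, 9.0])
```

When the points lie exactly on a line, SSE is zero and the fitted line reproduces it, which is a quick sanity check on any OLS implementation.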
  • 3.
o Importance of the regression model in explaining the dependent variable: requires determining the statistical significance of the regression (slope) coefficient through:
 Confidence interval:
 Structure: b̂1 ± tc·s(b̂1), where tc is the critical two-tailed t-value for a given confidence level with n-2 df and s(b̂1) is the standard error of the slope.
 Decision rule & interpretation: if the confidence interval does not include zero, we can conclude that the slope coefficient is significantly different from zero.
 Hypothesis testing:
 Test structure: H0: b1 = 0 versus Ha: b1 ≠ 0.
 Test statistic (assuming normality): t = (b̂1 - b1) / s(b̂1).
 Decision rule: reject H0 if |t| > tc.
 Interpretation: if the null cannot be rejected, we conclude that the slope coefficient is not significantly different from the hypothesized value of b1 (zero in this case) at the given significance level (e.g. 5%).
 F-test: discussed at the end of this reading.
o Standard Error of Estimate (SEE):
 Also known as the standard error of the residual or standard error of the regression; measures the degree of variability of the actual Y-values relative to the estimated Y-values from the regression equation (σ̂ε).
 The higher the correlation, the smaller the standard error and the better the fit.
o Coefficient of determination (R²):
 The percentage of the total variation in the dependent variable explained by the independent variable.
 For simple linear regression, R² = ρ².
o Confidence interval for predicted values:
 Structure: Ŷ ± tc·sf, where tc is the critical two-tailed t-value for a given confidence level with n-2 df, and sf is the standard error of the forecast (calculating sf is highly improbable in the exam).
  • 4.
 Interpretation: given a forecasted value of X, we can be (e.g. 95%) confident that Y will lie between Ŷ - tc·sf and Ŷ + tc·sf.
o Analysis of Variance (ANOVA): Total variation = Explained variation + Unexplained variation.
 Total variation = Total Sum of Squares: SST = Σ(Yi - Ȳ)².
 Explained variation = Regression Sum of Squares: RSS = Σ(Ŷi - Ȳ)².
 Unexplained variation = Sum of Squared Errors: SSE = Σ(Yi - Ŷi)².
 If we denote the number of independent variables as k, then regression df = k = 1 for simple linear regression, and error df = n - k - 1 = n - 2 for the same.
 MSR = RSS/k is the mean regression sum of squares and MSE = SSE/(n-k-1) is the mean squared error.
 R² = Explained variation (RSS) / Total variation (SST).
 Standard Error of Estimate: SEE = √MSE.
 Variance of Y = SST / (n - 1).
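The ANOVA decomposition can be verified numerically: for any OLS fit with an intercept, SST = RSS + SSE holds by construction. A sketch with hypothetical data (the `anova_simple` helper is illustrative):

```python
import math

def anova_simple(x, y):
    """ANOVA decomposition for a simple OLS regression: SST = RSS + SSE."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    intercept = my - slope * mx
    y_hat = [intercept + slope * a for a in x]
    sst = sum((b - my) ** 2 for b in y)                 # total variation
    rss = sum((f - my) ** 2 for f in y_hat)             # explained variation
    sse = sum((b - f) ** 2 for b, f in zip(y, y_hat))   # unexplained variation
    r2 = rss / sst
    see = math.sqrt(sse / (n - 2))                      # standard error of estimate
    return sst, rss, sse, r2, see

sst, rss, sse, r2, see = anova_simple(
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0], [2.1, 2.9, 4.2, 3.8, 5.5, 6.0])
```

For this sample, R² equals the squared correlation of the two series, consistent with the simple-regression identity R² = ρ².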
  • 5.
o The F-statistic (more useful with multiple regression): assesses how well a set of independent variables, as a group, explains the variation in the dependent variable at a desired level of significance; in other words, whether at least one of the independent variables explains a significant portion of the variation in the dependent variable. The F-test is a one-tailed test.
 Test structure: H0: b1 = 0 versus Ha: b1 ≠ 0 (for simple regression).
 Test statistic: F = MSR / MSE.
 Fc is the critical F-value at a given level of significance and the following df: df(numerator) = k = 1; df(denominator) = n - k - 1 = n - 2.
 Decision rule: reject H0 if F > Fc.
 Interpretation: if the null cannot be rejected, we conclude that the slope coefficient is not significantly different from zero at the given significance level (e.g. 5%).
o Limitations of regression analysis:
 Linear relationships can change over time (parameter instability).
 Its usefulness is limited if other market participants are aware of and act on it.
 If the model's assumptions don't hold, the interpretation of the results will not be valid. Major reasons for model invalidity include heteroskedasticity (non-constant variance of error terms) and autocorrelation (error terms are not independent).
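A one-line sketch of the F-statistic from ANOVA output; the RSS/SSE figures below are hypothetical, roughly matching a strong simple-regression fit:

```python
def f_statistic(rss, sse, k, n):
    """F = MSR / MSE, where MSR = RSS/k and MSE = SSE/(n-k-1)."""
    return (rss / k) / (sse / (n - k - 1))

# Hypothetical ANOVA output: RSS = 10.34, SSE = 0.77, n = 6 observations, k = 1.
f = f_statistic(10.34, 0.77, 1, 6)
```

For simple regression, F equals the square of the slope's t-statistic, which is a useful cross-check between the t-test and the F-test.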
  • 6. MULTIPLE REGRESSION & ISSUES IN REGRESSION ANALYSIS (Quantitative Analysis, R-12, SS-3)
 Multiple regression is regression analysis with more than one independent variable.
o Slope coefficient: describes the change in Y for a one-unit change in Xk, holding the other independent variables constant.
o Intercept term: the line's intersection with the Y axis (value of Y when all Xs = 0).
 Hypothesis testing (t-tests):
 p-value: the smallest level of significance for which the null hypothesis can be rejected. If the p-value < the significance level, the null hypothesis can be rejected; otherwise, it cannot.
 Confidence interval for a regression coefficient.
 If an independent variable proves statistically insignificant (its coefficient is not different from zero at a given confidence level), the whole model needs to be re-estimated, as the coefficients of the other, significant variables will likely change.
 Assumptions: same as univariate regression, plus the requirement that there is no exact linear relation between any two or more independent variables (otherwise, multicollinearity).
 The F-statistic.
 R²: the percentage of variation in the dependent variable that is collectively explained by all of the independent variables.
o Multiple R: the correlation between actual and forecasted values of Y; it is the square root of R². For simple regression, the correlation between the dependent and independent variables equals multiple R, with the same sign as the slope coefficient.
 Adjusted R²: R² increases as more independent variables are added to the model, regardless of their explanatory power; this problem is called overestimating the regression.
To overcome this problem, R² should be adjusted for the number of independent variables: Adjusted R² = 1 - (1 - R²)·(n - 1)/(n - k - 1).
o Adjusted R² ≤ R².
o Adding a new variable to the model always increases R² but may increase or decrease adjusted R².
o Adjusted R² may be less than zero if R² is low enough.
 Dummy variables:
o Usually used to quantify the impact of qualitative binary events (on or off); dummy variables are assigned values of 1 or 0 for on or off status.
o Whenever we need to distinguish between n classes we must use n-1 dummy variables; otherwise, the multiple regression assumption of no exact linear relationship between independent variables would be violated.
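The adjustment formula above translates directly into code; the inputs below are hypothetical:

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical fit: R^2 = 0.70 with n = 30 observations and k = 5 regressors.
a = adjusted_r2(0.70, 30, 5)
```

As the notes state, the adjusted value is never above R² and can go negative when R² is low relative to the number of regressors, e.g. adjusted_r2(0.05, 20, 5) is below zero.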
  • 7.
The omitted class should be thought of as the reference point, which is represented by the intercept.
o Testing the statistical significance of a dummy's slope coefficient is equivalent to testing whether that class's value differs from the omitted (reference) class captured by the intercept.
 Issues in regression analysis:
o Heteroskedasticity:
 Definition: occurs when the variance of the residuals is not the same across all observations. There are two types:
 Unconditional heteroskedasticity: not related to the level of the independent variables; not a major problem.
 Conditional heteroskedasticity: related to the level of the independent variables; a significant problem.
 Effect: the standard errors are unreliable (affecting t-tests and the F-test) while the coefficients are unaffected. A too-small standard error is the main concern, as it might lead to a Type I error: rejecting a true null hypothesis of an insignificant coefficient.
 Detection:
 Examine a scatter plot of the residuals against the independent variables.
 Breusch-Pagan (BP) test: conduct a second regression of the squared residuals (from the first regression) on the independent variables and test whether the independent variables significantly contribute to explaining the squared residuals. The test statistic has a chi-square (χ²) distribution with k degrees of freedom and is calculated as BP = n × R²(resid), where R²(resid) comes from the second regression. This is a one-tailed test, as the concern is having too large a statistic. If the test statistic > the chi-square critical value ⟹ reject the null hypothesis and conclude that a conditional heteroskedasticity problem is present.
 Correction:
 Use robust standard errors (White-corrected or heteroskedasticity-consistent standard errors), which are usually higher than the original standard errors.
 Use generalized least squares, which modifies the original equation in an attempt to eliminate heteroskedasticity.
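A sketch of the BP statistic for the one-regressor case, assuming the statistic n·R² from the auxiliary regression as described above; the residual series is constructed so its magnitude grows with x:

```python
def r_squared(x, y):
    """R^2 of a simple OLS regression of y on x (squared correlation here)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov * cov / (vx * vy)

def breusch_pagan(x, residuals):
    """BP statistic = n * R^2 from regressing squared residuals on the regressor."""
    sq = [e * e for e in residuals]
    return len(x) * r_squared(x, sq)

# Residual magnitude grows with x: conditional heteroskedasticity by construction.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
resid = [0.1, -0.2, 0.4, -0.5, 0.8, -1.0, 1.3, -1.6]
bp = breusch_pagan(x, resid)
```

Here BP exceeds the 5% chi-square critical value with 1 df (3.841), so the null of no conditional heteroskedasticity would be rejected.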
o Serial correlation (autocorrelation):
 Definition: occurs when the residual terms are correlated with one another; a common problem with time-series data. There are two types:
 Positive serial correlation: a positive error in one period increases the probability of observing a positive error in the next period.
 Negative serial correlation: a positive error in one period increases the probability of observing a negative error in the next period.
  • 8.
 Effect: the tendency of the data to cluster together leads to underestimated coefficient standard errors, and hence Type I errors.
 Detection:
 Examine a scatter plot of the residuals against time.
 Durbin-Watson (DW) statistic (the calculation is impractical for the exam). If the sample size is very large ⟹ DW ≈ 2(1 - r), where r is the correlation coefficient between residuals from one period and those from the previous period:
⟹ DW = 2 if r = 0 (no serial correlation)
⟹ DW > 2 if r < 0 (negative serial correlation)
⟹ DW < 2 if r > 0 (positive serial correlation)
For the Durbin-Watson test, there are upper (DU) and lower (DL) critical values depending on the significance level, the number of observations, and the number of variables k.
o Test structure: H0: no positive serial correlation.
o Decision rule: reject H0 if DW < DL; inconclusive if DL ≤ DW ≤ DU; fail to reject H0 if DW > DU.
 Correction:
 Use the Hansen method to obtain Hansen-White standard errors, which also correct for conditional heteroskedasticity. The general rule for the use of adjusted standard errors: if the problem is serial correlation only ⟹ Hansen method; if the problem is conditional heteroskedasticity only ⟹ White-corrected; if the problem is both ⟹ Hansen method.
 Improve the model specification, e.g. by including a seasonal term to reflect the time-series nature of the data; this can be tricky.
o Multicollinearity:
 Definition: occurs when linear combinations of the independent variables are highly correlated. For k > 2, high correlation between individual independent variables (> 0.7) suggests the possibility of multicollinearity, but low correlation does not necessarily indicate its absence.
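The DW statistic itself is simple to compute from a residual series, even though the exam does not require it; a sketch with constructed residuals illustrating both directions of serial correlation:

```python
def durbin_watson(residuals):
    """DW = sum_t (e_t - e_{t-1})^2 / sum_t e_t^2; roughly 2*(1 - r) in large samples."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den

# Alternating residuals: strong negative serial correlation, DW well above 2.
dw_neg = durbin_watson([1.0, -1.0, 1.0, -1.0, 1.0, -1.0])
# Slowly drifting residuals: positive serial correlation, DW well below 2.
dw_pos = durbin_watson([1.0, 0.8, 0.6, 0.4, 0.2, 0.0])
```

The two cases land on opposite sides of 2, matching the decision regions described above.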
  • 9.
 Effect: slope coefficients tend to be unreliable, and standard errors are artificially inflated; hence there is a greater probability of Type II error.
 Detection: the F-test is statistically significant and R² is high, yet the t-tests indicate no significance for the individual coefficients.
 Correction: use statistical procedures, such as stepwise regression, which systematically remove variables from the regression until multicollinearity is minimized.
 Model misspecification:
o Categories:
I. The functional form can be misspecified:
1. Important variables are omitted.
2. Variables should be transformed (e.g. when the dependent variable is linearly related to the natural log of a variable, or when standardizing balance-sheet items by dividing by total assets, or by sales for P&L and cash-flow items; common mistakes involve squaring or taking the square root of the variable).
3. Data is improperly pooled (by pooling sub-periods that exhibit structural change).
II. Explanatory variables are correlated with the error terms in time-series analysis:
1. A lagged dependent variable is used as an independent variable.
2. A function of the dependent variable is used as an independent variable ("forecasting the past", e.g. using end-of-month market cap to predict returns during that month).
3. Independent variables are measured with error (e.g. using free float as a proxy for corporate-governance quality, or actual inflation as a proxy for expected inflation).
III. Other time-series misspecifications that result in nonstationarity.
o Effect: regression coefficients are often biased and/or inconsistent ⟹ unreliable hypothesis testing and inaccurate predictions.
 Qualitative (dummy) dependent variables:
o Probit and logit models:
 Estimate the probability that an event occurs (e.g. the probability of default or merger).
 Maximum likelihood is used to estimate the coefficients.
 A probit model is based on the normal distribution, while a logit model is based on the logistic distribution.
o Discriminant models:
 Result in a linear function, similar to an ordinary regression, which generates an overall score for an observation; the scores can then be used to rank or classify observations.
 Example: use financial ratios to compute a score that places a company in a bankrupt or non-bankrupt class.
 Similar to probit and logit models but make different assumptions regarding the independent variables.
  • 10. TIME-SERIES ANALYSIS (Quantitative Analysis, R-13, SS-3)
 A time series is a set of observations over successive periods of time.
 Linear trend model (the data plot on a straight line): yt = b0 + b1·t + εt.
 Log-linear trend model (the data plot on a curve): the model defines y as an exponential function of time; by taking the natural log of both sides, we transform the equation from an exponential to a linear function, ln(yt) = b0 + b1·t + εt. For financial time series that display exponential growth, the log-linear model provides a better fit for the data and thus increases the model's predictive power.
 When a variable grows at a constant rate (e.g. financial data and company sales), a log-linear model is most appropriate; when it grows by a constant amount (e.g. inflation), a linear trend model is most appropriate.
 Limitation of trend models: when the time-series residuals exhibit serial correlation, as evidenced by the DW test, we need to use an autoregressive (AR) model instead. This is done by regressing the dependent variable against one or more lagged values of itself, on the condition that the time series being modeled is covariance stationary.
 Conditions for covariance stationarity:
I. Constant and finite expected value (mean-reverting level).
II. Constant and finite variance (homoscedastic).
III. Constant and finite covariance between values at any given lag (the covariance of the time series with leading or lagged values of itself is constant).
 An AR model of order p, AR(p), is expressed as: xt = b0 + b1·x(t-1) + … + bp·x(t-p) + εt (where p is the number of lagged values included as independent variables).
 Forecasting with an autoregressive model: applying the chain rule of forecasting, it is necessary to calculate a one-step-ahead forecast before a two-step-ahead forecast. This implies that:
o Multi-period forecasts are more uncertain than single-period forecasts.
o Sample size = number of observations - AR order.
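The chain rule of forecasting can be sketched for an AR(1) model: each forecast feeds back in as the lagged value for the next step. The fitted coefficients below are hypothetical:

```python
def ar1_forecast(b0, b1, x_last, steps):
    """Chain-rule forecasts from x_t = b0 + b1*x_{t-1}: each step reuses
    the previous forecast as the lagged value."""
    out = []
    x = x_last
    for _ in range(steps):
        x = b0 + b1 * x
        out.append(x)
    return out

# Hypothetical fitted model x_t = 1.0 + 0.5*x_{t-1}, last observation 3.0.
# Mean-reverting level = b0 / (1 - b1) = 2.0.
forecasts = ar1_forecast(1.0, 0.5, 3.0, 4)
```

The forecasts step toward the mean-reverting level of 2.0, illustrating why a covariance stationary AR(1) series (|b1| < 1) pulls back to that level.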
 Detection of and correction for autocorrelation in autoregressive models: the DW test used with trend models is not appropriate for AR models; instead, the following steps are followed to detect autocorrelation and make sure the AR model is correctly specified:
I. Estimate the AR(1) model.
II. Calculate the autocorrelations of the model's residuals.
III. t-test whether the autocorrelations are significantly different from zero, with df = T-2 and standard error 1/√T, where T is the number of observations (t = ρ̂ / (1/√T)).
IV. If any of the autocorrelations is significantly different from zero, the model is not correctly specified; add more lags to the model and repeat from step II until all residual autocorrelations are insignificant.
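Step III can be sketched as follows, using the standard error 1/√T stated above; the alternating residual series is a constructed example of strong negative lag-1 autocorrelation:

```python
import math

def autocorr_t(residuals, lag):
    """Lag-k residual autocorrelation and its t-statistic, with SE = 1/sqrt(T)."""
    T = len(residuals)
    m = sum(residuals) / T
    den = sum((e - m) ** 2 for e in residuals)
    num = sum((residuals[t] - m) * (residuals[t - lag] - m)
              for t in range(lag, T))
    rho = num / den
    t_stat = rho / (1 / math.sqrt(T))
    return rho, t_stat

# Alternating residuals: lag-1 autocorrelation is strongly negative.
resid = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
rho, t_stat = autocorr_t(resid, 1)
```

Here |t| exceeds roughly 2, so the lag-1 autocorrelation is significant and, per step IV, an additional lag would be added to the model.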
  • 11.
 Mean reversion and random walk:
For a time series to be covariance stationary it must have a constant and finite mean-reverting level, which is the value the series tends to move toward. Once this level is reached, xt = x(t-1) is expected, so for an AR(1) model xt = b0 + b1·x(t-1): mean-reverting level = b0 / (1 - b1).
For b1 = 1 (called a unit root) the model does not have a finite mean-reverting level and thus is not covariance stationary. This happens when the model follows a random walk process, which is classified as:
o Random walk without a drift: xt = x(t-1) + εt.
o Random walk with a drift: xt = b0 + x(t-1) + εt.
 Unit root detection: as testing whether b1 = 1 cannot be performed directly, use the Dickey-Fuller (DF) test by transforming the AR(1) model into a simple regression, subtracting x(t-1) from both sides:
xt - x(t-1) = b0 + (b1 - 1)·x(t-1) + εt ⟹ Δxt = b0 + g·x(t-1) + εt, where g = b1 - 1.
Then test whether the transformed coefficient g is different from zero using a modified t-test, with H0: g = 0. If we fail to reject, we conclude that the series has a unit root.
 Unit root correction: use first differencing to transform the data into a covariance stationary time series. This is done by constructing an AR(1) model on yt = xt - x(t-1): yt = b0 + b1·y(t-1) + εt, where b0 = b1 = 0 for a pure random walk ⟹ mean-reverting level = 0 / (1 - 0) = 0 (a finite value).
If the data has a linear trend, first difference the data; if the data has an exponential trend, first difference the natural log of the data.
 Seasonality detection: a pattern that tends to repeat from year to year. Not accounting for seasonality, when present, makes the AR model misspecified and unreliable for forecasting purposes. Seasonality can be detected by observing that the residual autocorrelation for the month or quarter from the previous year (lag 12 for monthly data, lag 4 for quarterly data) is significantly different from zero.
 Seasonality correction: add an additional lag corresponding to the same period in the previous year to the original model.
 In-sample forecasts are made within the range of data used to estimate the model.
Out-of-sample forecasts are made outside the sample period, to assess the predictive power of the model. Given two models, to assess which one is better, apply the root mean squared error (RMSE) criterion (the square root of the average of the squared errors) on out-of-sample data; the model with the lowest RMSE is the most accurate.
 As financial and economic environments are dynamic and frequently subject to structural shifts, there is a trade-off between the increased statistical reliability of using long time-series periods and the increased stability of the estimates from using shorter periods.
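The RMSE criterion is a one-liner; the actual values and the two models' out-of-sample forecasts below are hypothetical:

```python
import math

def rmse(actual, predicted):
    """Root mean squared error: sqrt(mean of squared forecast errors)."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

actual  = [10.0, 12.0, 11.0, 13.0]   # hypothetical out-of-sample observations
model_a = [10.5, 11.5, 11.5, 12.5]   # forecast errors of 0.5 each
model_b = [11.0, 13.0, 10.0, 14.0]   # forecast errors of 1.0 each
```

Model A has the lower RMSE (0.5 versus 1.0) and would be judged the more accurate forecaster.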
  • 12.
 Autoregressive Conditional Heteroskedasticity (ARCH) model:
ARCH exists if the variance of the residuals in one period is dependent on the variance of the residuals in a previous period. The ARCH(1) model regresses the squared residuals on their own lag: ε̂t² = a0 + a1·ε̂(t-1)² + ut.
If the coefficient a1 is statistically different from zero, the error terms exhibit heteroskedasticity: a positive a1 means the variance increases over time, a negative a1 means it decreases. In either case the time series is ARCH(1) and, according to our need, we can either:
o Correct the model using procedures that correct for heteroskedasticity, such as generalized least squares, or
o Predict the variance of the residuals in future periods.
 Considerations for using two time-series variables in a linear regression: test each series for covariance stationarity (by detecting the presence of autocorrelation or a unit root), with the following possibilities for whether the data can be used:
1. Both time series are covariance stationary ⟹ Yes.
2. Only one variable is covariance stationary ⟹ No.
3. Neither time series is covariance stationary:
3.1. The two series are cointegrated ⟹ Yes.
3.2. The two series are not cointegrated ⟹ No.
 Cointegration: means that the two time series are economically linked or follow the same trend, and that the relationship is not expected to change. To test for cointegration, regress one variable on the other, yt = b0 + b1·xt + εt, then test the residuals for a unit root using the DF test with critical t-values calculated by Engle and Granger (the DF-EG test). If the test rejects the null hypothesis of a unit root, we conclude that the error terms generated by the two time series are covariance stationary and that the two series are cointegrated.
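The ARCH(1) auxiliary regression can be sketched as a simple OLS of squared residuals on their own lag; the residual series below is constructed (hypothetically) so that large errors follow large errors:

```python
def arch1_fit(residuals):
    """OLS fit of e_t^2 = a0 + a1*e_{t-1}^2; a significant a1 signals ARCH(1)."""
    sq = [e * e for e in residuals]
    x, y = sq[:-1], sq[1:]          # lagged squared residuals vs. current
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    a1 = (sum((a - mx) * (b - my) for a, b in zip(x, y))
          / sum((a - mx) ** 2 for a in x))
    a0 = my - a1 * mx
    return a0, a1

# Residual magnitude persists in clusters: volatility clustering by construction.
resid = [0.1, 0.2, 1.5, 1.4, 1.6, 0.2, 0.1, 0.2, 1.3, 1.5]
a0, a1 = arch1_fit(resid)
```

A positive fitted a1, as here, is the pattern that a formal t-test on a1 would examine before concluding the series is ARCH(1).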