Introduction to Statistical Modeling
November 2 & 5, 2018
FANR 6750
Richard Chandler and Bob Cooper
Looking ahead
Linear models
Generalized linear models
Model selection and multi-model inference
Outline
1 Motivation
2 Linear models
3 Example
4 Matrix notation
Motivation
Why do we need this part of the course?
• We have been modeling all along
• Good experimental design + ANOVA is usually the most direct route to causal inference
• Often, however, it isn’t possible (or even desirable) to control some aspects of the system being investigated
• When manipulative experiments aren’t possible, observational studies and predictive models can be the next best option
What is a model?
Definition
A model is an abstraction of reality used to describe the
relationship between two or more variables
Types of models
• Conceptual
• Mathematical
• Statistical
Important point
“All models are wrong but some are useful” (George Box, 1976)
Statistical models
What are they useful for?
• Formalizing hypotheses using math and probability
• Evaluating hypotheses by confronting models with data
• Predicting future outcomes
Statistical models
Two important pieces
(1) Deterministic component
Equation for the expected value of the response variable
(2) Stochastic component
Probability distribution describing the differences between the expected values and the observed values
In parametric statistics, we assume we know the distribution, but not the parameters of the distribution
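As a sketch of these two pieces, the following R snippet (with made-up parameter values; nothing here comes from the course data) simulates a response from a deterministic linear predictor plus Normal noise, then recovers the parameters with lm():

```r
# Deterministic component: E(y) = beta0 + beta1 * x
# Stochastic component:    y ~ Normal(E(y), sigma^2)
# All parameter values below are invented for illustration.
set.seed(123)
n <- 100
x <- runif(n, 0, 10)                  # a continuous predictor
beta0 <- 20; beta1 <- 0.5; sigma <- 2 # assumed "true" parameters
Ey <- beta0 + beta1 * x               # deterministic component
y <- rnorm(n, mean = Ey, sd = sigma)  # add the stochastic component
fit <- lm(y ~ x)                      # estimate the parameters from the data
coef(fit)                             # estimates should be near 20 and 0.5
```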
Is this a linear model?
y = 20 + 0.5x
[Plot: y plotted against x for x from 0 to 10]
Is this a linear model?
y = 20 + 0.5x − 0.3x²
[Plot: y plotted against x for x from 0 to 10]
(Yes: both equations are linear models, because each is linear in the coefficients, even though this one is curved in x.)
Linear model
A linear model is an equation of the form:
yi = β0 + β1xi1 + β2xi2 + . . . + βpxip + εi
where the β’s are coefficients, and the x values are predictor
variables (or dummy variables for categorical predictors).
This equation is often expressed in matrix notation as:
y = Xβ + ε
where X is a design matrix and β is a vector of coefficients. More on matrix notation later...
Interpreting the β’s
You must be able to interpret the β coefficients for any model that you fit to your data.
A linear model might have dozens of continuous and categorical predictor variables, with dozens of associated β coefficients.
Linear models can also include polynomial terms and interactions between continuous and categorical predictors.
Interpreting the β’s
The intercept β0 is the expected value of y when all x’s are 0.
If x is a continuous explanatory variable:
• β can usually be interpreted as a slope parameter.
• In this case, β is the change in y resulting from a 1-unit change in x (while holding the other predictors constant).
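A quick way to see the slope interpretation is to compare model predictions at two x values one unit apart: the difference equals the estimated β. This sketch uses simulated data (all names and values are illustrative):

```r
set.seed(1)
x <- rnorm(50)
y <- 3 + 2 * x + rnorm(50)    # simulated data with true slope 2
fm <- lm(y ~ x)
p <- predict(fm, newdata = data.frame(x = c(5, 6)))  # predictions 1 unit apart
unname(diff(p))               # identical to the estimated slope, coef(fm)["x"]
```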
Interpreting β’s for categorical explanatory variables
Things are more complicated for categorical explanatory variables (i.e., factors) because they must be converted to dummy variables.
There are many ways of creating dummy variables.
In R, the default method for creating dummy variables from unordered factors works like this:
• One level of the factor is treated as a reference level
• The reference level is associated with the intercept
• The β coefficients for the other levels of the factor are differences from the reference level.
The default method corresponds to:
options(contrasts=c("contr.treatment","contr.poly"))
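A minimal sketch of this default (treatment) coding with a toy factor; the levels mimic the habitat variable but the data are made up:

```r
habitat <- factor(c("Bare", "Oak", "Pine", "Oak"))
model.matrix(~ habitat)
# The first level alphabetically ("Bare") is the reference level and is
# absorbed into the intercept column; the habitatOak and habitatPine dummy
# columns code differences from "Bare".
```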
Interpreting β’s for categorical explanatory variables
Another common method for creating dummy variables results in β’s that can be interpreted as the α’s from the additive models that we saw earlier in the class.
With this method:
• The β associated with each level of the factor is the difference from the intercept
• The intercept can be interpreted as the grand mean if the continuous variables have been centered
• One of the levels of the factor will not be displayed because it is redundant when the intercept is estimated
This method corresponds to:
options(contrasts=c("contr.sum","contr.poly"))
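The same toy factor under sum-to-zero contrasts shows the effect-style coding (again, the data are invented for illustration):

```r
habitat <- factor(c("Bare", "Oak", "Pine", "Oak"))
model.matrix(~ habitat, contrasts.arg = list(habitat = "contr.sum"))
# Columns habitat1 and habitat2 code the "Bare" and "Oak" effects; the last
# level ("Pine") is coded -1 in both columns, so the effects sum to zero and
# the intercept corresponds to the average of the level means.
```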
Example
The Island Scrub-Jay
[Map: Santa Cruz Island, showing elevation (m), census locations, and landmarks (Christy Beach, Main Ranch, UC Field Station, Scorpion Anchorage, Prisoners’ Harbor); inset locating the island off the coast near Santa Barbara, Ventura, and Los Angeles]
Santa Cruz Data
Habitat data for all 2787 grid cells covering the island
head(cruz2)
## x y elevation forest chaparral habitat seeds
## 1 230736.7 3774324 241 0 0 Oak Low
## 2 231036.7 3774324 323 0 0 Pine Med
## 3 231336.7 3774324 277 0 0 Pine High
## 4 230436.7 3774024 13 0 0 Oak Med
## 5 230736.7 3774024 590 0 0 Oak High
## 6 231036.7 3774024 533 0 0 Oak Low
Maps of predictor variables
[Map: elevation (m), easting vs. northing]
Maps of predictor variables
[Map: proportion forest cover, 0–1]
Questions
(1) How many jays are on the island?
(2) What environmental variables influence abundance?
(3) Can we predict consequences of environmental change?
Maps of predictor variables
[Map: proportion chaparral cover, 0–1, with survey plot locations]
The (fake) jay data
head(jayData)
## x y elevation forest chaparral habitat seeds jays
## 2345 258636.7 3764124 423 0.00 0.02 Oak Med 34
## 740 261936.7 3769224 506 0.10 0.45 Oak Med 38
## 2304 246336.7 3764124 859 0.00 0.26 Oak High 40
## 2433 239436.7 3763524 1508 0.02 0.03 Pine Med 43
## 1104 239436.7 3767724 483 0.26 0.37 Oak Med 36
## 607 236436.7 3769524 830 0.00 0.01 Oak Low 39
Simple linear regression
fm1 <- lm(jays ~ elevation, data=jayData)
summary(fm1)
##
## Call:
## lm(formula = jays ~ elevation, data = jayData)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.4874 -1.7539 0.1566 1.6159 4.6155
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 33.082808 0.453997 72.87 <2e-16 ***
## elevation 0.008337 0.000595 14.01 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.285 on 98 degrees of freedom
## Multiple R-squared: 0.667, Adjusted R-squared: 0.6636
## F-statistic: 196.3 on 1 and 98 DF, p-value: < 2.2e-16
Simple linear regression
[Scatterplot: number of jays vs. elevation (m) with fitted regression line]
Multiple linear regression
fm2 <- lm(jays ~ elevation+forest, data=jayData)
summary(fm2)
##
## Call:
## lm(formula = jays ~ elevation + forest, data = jayData)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.4717 -1.7384 0.1552 1.5993 4.6319
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 33.065994 0.467624 70.711 <2e-16 ***
## elevation 0.008337 0.000598 13.943 <2e-16 ***
## forest 0.294350 1.793079 0.164 0.87
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.296 on 97 degrees of freedom
## Multiple R-squared: 0.6671, Adjusted R-squared: 0.6603
## F-statistic: 97.21 on 2 and 97 DF, p-value: < 2.2e-16
Multiple linear regression
[Perspective plot: expected number of jays as a function of elevation (0–2000 m) and forest cover (0–1)]
One-way ANOVA
fm3 <- lm(jays ~ habitat, data=jayData)
summary(fm3)
##
## Call:
## lm(formula = jays ~ habitat, data = jayData)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.9143 -2.3684 -0.3684 3.0857 8.6316
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 35.875 1.356 26.456 <2e-16 ***
## habitatOak 3.493 1.448 2.413 0.0177 *
## habitatPine 2.039 1.503 1.357 0.1780
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.835 on 97 degrees of freedom
## Multiple R-squared: 0.07126, Adjusted R-squared: 0.05211
## F-statistic: 3.721 on 2 and 97 DF, p-value: 0.02773
One-way ANOVA
[Barplot: expected number of jays by habitat (Bare, Oak, Pine)]
ANCOVA
fm4 <- lm(jays ~ elevation+habitat, data=jayData)
summary(fm4)
##
## Call:
## lm(formula = jays ~ elevation + habitat, data = jayData)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.0327 -1.5356 0.0091 1.4686 4.2391
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.072e+01 8.084e-01 37.997 < 2e-16 ***
## elevation 8.289e-03 5.414e-04 15.308 < 2e-16 ***
## habitatOak 3.166e+00 7.850e-01 4.034 0.00011 ***
## habitatPine 1.695e+00 8.148e-01 2.081 0.04010 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.078 on 96 degrees of freedom
## Multiple R-squared: 0.7301, Adjusted R-squared: 0.7217
## F-statistic: 86.56 on 3 and 96 DF, p-value: < 2.2e-16
ANCOVA
[Scatterplot: number of jays vs. elevation (m) with parallel fitted lines for Oak, Pine, and Bare habitats]
Continuous-categorical interaction
fm5 <- lm(jays ~ elevation*habitat, data=jayData)
summary(fm5)
##
## Call:
## lm(formula = jays ~ elevation * habitat, data = jayData)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.008 -1.581 -0.103 1.420 4.184
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 31.654383 1.446322 21.886 < 2e-16 ***
## elevation 0.006781 0.001999 3.393 0.00101 **
## habitatOak 2.428682 1.565227 1.552 0.12411
## habitatPine 0.399953 1.579874 0.253 0.80070
## elevation:habitatOak 0.001204 0.002153 0.559 0.57737
## elevation:habitatPine 0.002046 0.002151 0.951 0.34414
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.087 on 94 degrees of freedom
## Multiple R-squared: 0.7334, Adjusted R-squared: 0.7192
## F-statistic: 51.72 on 5 and 94 DF, p-value: < 2.2e-16
ANCOVA
[Scatterplot: number of jays vs. elevation (m) with habitat-specific fitted lines for Oak, Pine, and Bare]
Quadratic effect of elevation
fm6 <- lm(jays ~ elevation+I(elevation^2), data=jayData)
summary(fm6)
##
## Call:
## lm(formula = jays ~ elevation + I(elevation^2), data = jayData)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.8429 -1.4608 0.1304 1.5908 4.7854
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.162e+01 7.631e-01 41.434 < 2e-16 ***
## elevation 1.368e-02 2.342e-03 5.843 6.86e-08 ***
## I(elevation^2) -3.542e-06 1.503e-06 -2.357 0.0204 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.233 on 97 degrees of freedom
## Multiple R-squared: 0.6851, Adjusted R-squared: 0.6786
## F-statistic: 105.5 on 2 and 97 DF, p-value: < 2.2e-16
Quadratic effect of elevation
[Scatterplot: number of jays vs. elevation (m) with fitted quadratic curve]
Interaction and quadratic effects
fm7 <- lm(jays ~ habitat * forest + elevation +
I(elevation^2), data=jayData)
summary(fm7)
##
## Call:
## lm(formula = jays ~ habitat * forest + elevation + I(elevation^2),
## data = jayData)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.2574 -1.4400 0.0487 1.4055 3.7924
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.920e+01 1.030e+00 28.338 < 2e-16 ***
## habitatOak 3.705e+00 8.433e-01 4.394 2.98e-05 ***
## habitatPine 2.216e+00 8.757e-01 2.531 0.0131 *
## forest 4.007e+01 2.780e+01 1.441 0.1529
## elevation 1.215e-02 2.300e-03 5.285 8.41e-07 ***
## I(elevation^2) -2.554e-06 1.484e-06 -1.721 0.0886 .
## habitatOak:forest -4.292e+01 2.785e+01 -1.541 0.1267
## habitatPine:forest -3.918e+01 2.784e+01 -1.407 0.1627
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.044 on 92 degrees of freedom
## Multiple R-squared: 0.7497, Adjusted R-squared: 0.7307
## F-statistic: 39.37 on 7 and 92 DF, p-value: < 2.2e-16
Predict jay abundance at each grid cell
E7 <- predict(fm7, type="response", newdata=cruz2,
interval="confidence")
E7 <- cbind(cruz2[,c("x","y")], E7)
head(E7)
## x y fit lwr upr
## 1 230736.7 3774324 35.68349 34.86313 36.50386
## 2 231036.7 3774324 35.07284 34.22917 35.91652
## 3 231336.7 3774324 34.58427 33.72668 35.44186
## 4 230436.7 3774024 33.06042 31.55907 34.56177
## 5 230736.7 3774024 39.18440 38.49766 39.87113
## 6 231036.7 3774024 38.65512 37.98859 39.32165
Map the predictions
[Map: expected number of jays per grid cell, scale 25–55]
Map the predictions
[Map: lower confidence limit, scale 25–55]
Map the predictions
[Map: upper confidence limit, scale 25–55]
Future scenarios
What if pine and oak disappear?
[Map: expected number of jays per grid cell, scale 25–55]
Future scenarios
What if sea level rises?
[Map: expected values, scale 25–55]
Linear model
All of the fixed effects models that we have covered can be
expressed this way:
y = Xβ + ε
where
ε ∼ Normal(0, σ²)
Examples include
• Completely randomized ANOVA
• Randomized complete block designs with fixed block effects
• Factorial designs
• ANCOVA
Then how do they differ?
• The design matrices are different
• So is the number of parameters (coefficients) to be estimated
• It is important to understand how to construct a design matrix that includes categorical variables
Design matrix
A design matrix has N rows and K columns, where N is the total
sample size and K is the number of coefficients (parameters) to be
estimated.
The first column contains just 1’s. This column corresponds to the
intercept (β0)
Continuous predictor variables appear unchanged in the design
matrix
Categorical predictor variables appear as dummy variables
In R, the design matrix is created internally based on the formula
that you provide
The design matrix can be viewed using the model.matrix
function
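To make the N rows × K columns claim concrete, here is a toy data frame (invented values) with one continuous predictor and one three-level factor; the factor expands into two dummy columns:

```r
d <- data.frame(x = 1:6,
                g = factor(rep(c("A", "B", "C"), each = 2)))
X <- model.matrix(~ x + g, data = d)
X            # 6 rows (one per observation), 4 columns (one per coefficient)
dim(X)       # 6 x 4: intercept, x, gB, gC
```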
Design matrix for linear regression
Data
dietData <- read.csv("dietData.csv")
head(dietData, n=10)
## weight diet age
## 1 23.83875 Control 11.622260
## 2 25.98799 Control 13.555397
## 3 30.29572 Control 15.357372
## 4 25.88463 Control 7.950214
## 5 18.48077 Control 5.493861
## 6 31.57542 Control 18.874970
## 7 23.79069 Control 12.811297
## 8 29.79574 Control 17.402436
## 9 21.66387 Control 7.379666
## 10 30.86618 Control 18.611817
Design matrix
X1 <- model.matrix(~age,
data=dietData)
head(X1, n=10)
## (Intercept) age
## 1 1 11.622260
## 2 1 13.555397
## 3 1 15.357372
## 4 1 7.950214
## 5 1 5.493861
## 6 1 18.874970
## 7 1 12.811297
## 8 1 17.402436
## 9 1 7.379666
## 10 1 18.611817
How do we multiply this design matrix (X) by the vector of
regression coefficients (β)?
Matrix multiplication
E(y) = Xβ

\[
\begin{bmatrix} aw + bx + cy + dz \\ ew + fx + gy + hz \\ iw + jx + ky + lz \end{bmatrix}
=
\begin{bmatrix} a & b & c & d \\ e & f & g & h \\ i & j & k & l \end{bmatrix}
\times
\begin{bmatrix} w \\ x \\ y \\ z \end{bmatrix}
\]
In this example
• The first matrix corresponds to the expected values of y
• The second matrix corresponds to the design matrix X
• The third matrix (a column vector) corresponds to β
Matrix multiplication
The vector of coefficients
beta <- coef(lm(weight ~ age, dietData))
beta
## (Intercept) age
## 21.325234 0.518067
E(y) = Xβ or yi = β0 + β1xi
Ey1 <- X1 %*% beta
head(Ey1, 5)
## [,1]
## 1 27.34634
## 2 28.34784
## 3 29.28138
## 4 25.44398
## 5 24.17142
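The same computation can be checked against lm()'s own fitted values: X %*% beta reproduces them exactly. This sketch simulates a stand-in for dietData (which is not included here), so all numbers are illustrative:

```r
set.seed(42)
age <- runif(20, 5, 20)
weight <- 21 + 0.5 * age + rnorm(20)   # simulated stand-in for dietData
fit <- lm(weight ~ age)
X <- model.matrix(fit)                  # the design matrix lm() used
Ey <- X %*% coef(fit)                   # E(y) = X beta
all.equal(as.numeric(Ey), unname(fitted(fit)))  # TRUE
```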
Summary
Linear models are the foundation of modern statistical modeling
techniques
They can be used to model a wide array of biological processes, and
they can be easily extended when their assumptions do not hold
One of the most important extensions is to cases where the
residuals are not normally distributed. Generalized linear models
address this issue.
Sue Whale
 
Exploratory
Exploratory Exploratory
Exploratory
toby2036
 
SET-02_SOCS_ESE-DEC23__B.Tech%20(CSE-H+NH)-AIML_5_CSAI3001_Neural%20Networks.pdf
SET-02_SOCS_ESE-DEC23__B.Tech%20(CSE-H+NH)-AIML_5_CSAI3001_Neural%20Networks.pdfSET-02_SOCS_ESE-DEC23__B.Tech%20(CSE-H+NH)-AIML_5_CSAI3001_Neural%20Networks.pdf
SET-02_SOCS_ESE-DEC23__B.Tech%20(CSE-H+NH)-AIML_5_CSAI3001_Neural%20Networks.pdf
dhruvkeshav123
 
regression.pptx
regression.pptxregression.pptx
regression.pptx
aneeshs28
 
Rubic_Print_FormatCourse CodeClass CodeADM-560ADM-560-O500Your Per.docx
Rubic_Print_FormatCourse CodeClass CodeADM-560ADM-560-O500Your Per.docxRubic_Print_FormatCourse CodeClass CodeADM-560ADM-560-O500Your Per.docx
Rubic_Print_FormatCourse CodeClass CodeADM-560ADM-560-O500Your Per.docx
daniely50
 
Linear Regression final-1.pptx thbejnnej
Linear Regression final-1.pptx thbejnnejLinear Regression final-1.pptx thbejnnej
Linear Regression final-1.pptx thbejnnej
mathukiyak44
 
Lecture 5 - Linear Regression Linear Regression
Lecture 5 - Linear Regression Linear RegressionLecture 5 - Linear Regression Linear Regression
Lecture 5 - Linear Regression Linear Regression
viyah59114
 
Discriminant analysis using spss
Discriminant analysis using spssDiscriminant analysis using spss
Discriminant analysis using spss
Dr Nisha Arora
 
CounterFactual Explanations.pdf
CounterFactual Explanations.pdfCounterFactual Explanations.pdf
CounterFactual Explanations.pdf
Bong-Ho Lee
 
Cannonical correlation
Cannonical correlationCannonical correlation
Cannonical correlation
domsr
 
Cannonical Correlation
Cannonical CorrelationCannonical Correlation
Cannonical Correlation
domsr
 
Applied statistics lecture_6
Applied statistics lecture_6Applied statistics lecture_6
Applied statistics lecture_6
Daria Bogdanova
 
Lec4(Multiple Regression) & Building a Model & Dummy Variable.pptx
Lec4(Multiple Regression) & Building a Model & Dummy Variable.pptxLec4(Multiple Regression) & Building a Model & Dummy Variable.pptx
Lec4(Multiple Regression) & Building a Model & Dummy Variable.pptx
au1417257
 
Regression analysis ppt
Regression analysis pptRegression analysis ppt
Regression analysis ppt
Elkana Rorio
 
10. SEM Models_JASP.pptx00000000000000000
10. SEM Models_JASP.pptx0000000000000000010. SEM Models_JASP.pptx00000000000000000
10. SEM Models_JASP.pptx00000000000000000
GeethaSaranya4
 
Module 3: Linear Regression
Module 3:  Linear RegressionModule 3:  Linear Regression
Module 3: Linear Regression
Sara Hooker
 
An Introduction to Regression Models: Linear and Logistic approaches
An Introduction to Regression Models: Linear and Logistic approachesAn Introduction to Regression Models: Linear and Logistic approaches
An Introduction to Regression Models: Linear and Logistic approaches
Bhanu Yadav
 
Structural equation modeling BY Abdul Rahim Chandio
Structural equation  modeling BY  Abdul Rahim ChandioStructural equation  modeling BY  Abdul Rahim Chandio
Structural equation modeling BY Abdul Rahim Chandio
AbdulRahimChandio1
 
Exploratory
Exploratory Exploratory
Exploratory
toby2036
 
SET-02_SOCS_ESE-DEC23__B.Tech%20(CSE-H+NH)-AIML_5_CSAI3001_Neural%20Networks.pdf
SET-02_SOCS_ESE-DEC23__B.Tech%20(CSE-H+NH)-AIML_5_CSAI3001_Neural%20Networks.pdfSET-02_SOCS_ESE-DEC23__B.Tech%20(CSE-H+NH)-AIML_5_CSAI3001_Neural%20Networks.pdf
SET-02_SOCS_ESE-DEC23__B.Tech%20(CSE-H+NH)-AIML_5_CSAI3001_Neural%20Networks.pdf
dhruvkeshav123
 
regression.pptx
regression.pptxregression.pptx
regression.pptx
aneeshs28
 
Rubic_Print_FormatCourse CodeClass CodeADM-560ADM-560-O500Your Per.docx
Rubic_Print_FormatCourse CodeClass CodeADM-560ADM-560-O500Your Per.docxRubic_Print_FormatCourse CodeClass CodeADM-560ADM-560-O500Your Per.docx
Rubic_Print_FormatCourse CodeClass CodeADM-560ADM-560-O500Your Per.docx
daniely50
 
Linear Regression final-1.pptx thbejnnej
Linear Regression final-1.pptx thbejnnejLinear Regression final-1.pptx thbejnnej
Linear Regression final-1.pptx thbejnnej
mathukiyak44
 
Lecture 5 - Linear Regression Linear Regression
Lecture 5 - Linear Regression Linear RegressionLecture 5 - Linear Regression Linear Regression
Lecture 5 - Linear Regression Linear Regression
viyah59114
 
Discriminant analysis using spss
Discriminant analysis using spssDiscriminant analysis using spss
Discriminant analysis using spss
Dr Nisha Arora
 
CounterFactual Explanations.pdf
CounterFactual Explanations.pdfCounterFactual Explanations.pdf
CounterFactual Explanations.pdf
Bong-Ho Lee
 
Cannonical correlation
Cannonical correlationCannonical correlation
Cannonical correlation
domsr
 
Cannonical Correlation
Cannonical CorrelationCannonical Correlation
Cannonical Correlation
domsr
 
Applied statistics lecture_6
Applied statistics lecture_6Applied statistics lecture_6
Applied statistics lecture_6
Daria Bogdanova
 
Lec4(Multiple Regression) & Building a Model & Dummy Variable.pptx
Lec4(Multiple Regression) & Building a Model & Dummy Variable.pptxLec4(Multiple Regression) & Building a Model & Dummy Variable.pptx
Lec4(Multiple Regression) & Building a Model & Dummy Variable.pptx
au1417257
 
Ad

Introduction to statistical modeling in R

  • 11. What is a model? Definition: A model is an abstraction of reality used to describe the relationship between two or more variables. Types of models: • Conceptual • Mathematical • Statistical. Important point: “All models are wrong but some are useful” (George Box, 1976)
  • 15. Statistical models What are they useful for? • Formalizing hypotheses using math and probability • Evaluating hypotheses by confronting models with data • Predicting future outcomes
  • 17. Statistical models Two important pieces: (1) Deterministic component — equation for the expected value of the response variable. (2) Stochastic component — probability distribution describing the differences between the expected values and the observed values. In parametric statistics, we assume we know the distribution, but not the parameters of the distribution.
  • 19. Is this a linear model? y = 20 + 0.5x [plot: a straight line in x over 0–10] — yes.
  • 20. Is this a linear model? y = 20 + 0.5x − 0.3x² [plot: a downward-curving parabola in x over 0–10] — yes: the model is quadratic in x but still linear in the parameters.
  • 23. Linear model A linear model is an equation of the form: yi = β0 + β1xi1 + β2xi2 + . . . + βpxip + εi where the β’s are coefficients, and the x values are predictor variables (or dummy variables for categorical predictors). This equation is often expressed in matrix notation as: y = Xβ + ε where X is a design matrix and β is a vector of coefficients. More on matrix notation later. . .
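The two pieces of this definition can be made concrete with a short simulation. This is a sketch with hypothetical data (the variable names `x1`, `x2`, and the chosen coefficient values are illustrative, not from the slides): build the design matrix, add Gaussian noise as the stochastic component, and check that lm() recovers the β’s.

```r
## Sketch: simulate y = X %*% beta + epsilon, then recover beta with lm()
set.seed(1)
n  <- 100
x1 <- rnorm(n)                    # continuous predictor (hypothetical)
x2 <- runif(n)                    # another continuous predictor
X  <- cbind(1, x1, x2)            # design matrix: intercept column + predictors
beta    <- c(2, 0.5, -1)          # "true" coefficients for the simulation
epsilon <- rnorm(n, mean = 0, sd = 0.3)  # stochastic component
y  <- X %*% beta + epsilon        # deterministic part + noise
coef(lm(y ~ x1 + x2))             # estimates should be close to (2, 0.5, -1)
```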
  • 26. Interpreting the β’s You must be able to interpret the β coefficients for any model that you fit to your data. A linear model might have dozens of continuous and categorical predictor variables, with dozens of associated β coefficients. Linear models can also include polynomial terms and interactions between continuous and categorical predictors.
  • 28. Interpreting the β’s The intercept β0 is the expected value of y when all x’s are 0. If x is a continuous explanatory variable: • β can usually be interpreted as a slope parameter. • In this case, β is the change in y resulting from a 1-unit change in x (while holding the other predictors constant).
  • 32. Interpreting β’s for categorical explanatory variables Things are more complicated for categorical explanatory variables (i.e., factors) because they must be converted to dummy variables, and there are many ways of creating dummy variables. In R, the default method for creating dummy variables from unordered factors works like this: • One level of the factor is treated as a reference level • The reference level is associated with the intercept • The β coefficients for the other levels of the factor are differences from the reference level. The default method corresponds to: options(contrasts=c("contr.treatment","contr.poly"))
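Treatment contrasts are easiest to see by inspecting the design matrix R builds. A small sketch (the three-level `habitat` factor here is a toy example, not the island data):

```r
## Sketch: R's default treatment contrasts for a 3-level factor
habitat <- factor(c("Oak", "Pine", "Bare"))
model.matrix(~ habitat)
## The first level in alphabetical order ("Bare") is the reference level
## and is absorbed into the intercept column; the habitatOak and
## habitatPine dummy columns get coefficients that are differences
## from the reference level.
contrasts(habitat)   # the same coding, shown as a contrast matrix
```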
  • 35. Interpreting β’s for categorical explanatory variables Another common method for creating dummy variables results in β’s that can be interpreted as the α’s from the additive models that we saw earlier in the class. With this method: • The β associated with each level of the factor is the difference from the intercept • The intercept can be interpreted as the grand mean if the continuous variables have been centered • One of the levels of the factor will not be displayed because it is redundant when the intercept is estimated. This method corresponds to: options(contrasts=c("contr.sum","contr.poly"))
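The sum-to-zero coding can be inspected the same way. A sketch with the same toy factor (illustrative, not the island data):

```r
## Sketch: sum-to-zero ("effects") contrasts for a 3-level factor
habitat <- factor(c("Oak", "Pine", "Bare"))
contrasts(habitat) <- contr.sum(3)
contrasts(habitat)
## Each estimated beta is a level's difference from the intercept
## (the grand mean, if continuous predictors are centered). The last
## level's effect is not displayed: it equals minus the sum of the
## other effects, so the effects sum to zero.
```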
  • 38. Example [map: Santa Cruz Island, California (near Los Angeles, Ventura, and Santa Barbara) — census locations and elevation in meters, with landmarks including Christy Beach, Main Ranch, UC Field Station, Scorpion Anchorage, and Prisoners’ Harbor; scale 0–5 km]
  • 39. Santa Cruz Data Habitat data for all 2787 grid cells covering the island:
    head(cruz2)
    ##          x       y elevation forest chaparral habitat seeds
    ## 1 230736.7 3774324       241      0         0     Oak   Low
    ## 2 231036.7 3774324       323      0         0    Pine   Med
    ## 3 231336.7 3774324       277      0         0    Pine  High
    ## 4 230436.7 3774024        13      0         0     Oak   Med
    ## 5 230736.7 3774024       590      0         0     Oak  High
    ## 6 231036.7 3774024       533      0         0     Oak   Low
  • 40. Maps of predictor variables [map: elevation (500–2000 m) by easting and northing]
  • 41. Maps of predictor variables [map: forest cover, 0.0–1.0]
  • 42. Questions (1) How many jays are on the island? (2) What environmental variables influence abundance? (3) Can we predict consequences of environmental change?
  • 43. Maps of predictor variables [map: chaparral cover (0.0–1.0) and survey plots]
  • 44. The (fake) jay data

head(jayData)
##             x       y elevation forest chaparral habitat seeds jays
## 2345 258636.7 3764124       423   0.00      0.02     Oak   Med   34
## 740  261936.7 3769224       506   0.10      0.45     Oak   Med   38
## 2304 246336.7 3764124       859   0.00      0.26     Oak  High   40
## 2433 239436.7 3763524      1508   0.02      0.03    Pine   Med   43
## 1104 239436.7 3767724       483   0.26      0.37     Oak   Med   36
## 607  236436.7 3769524       830   0.00      0.01     Oak   Low   39
  • 45. Simple linear regression

fm1 <- lm(jays ~ elevation, data=jayData)
summary(fm1)
##
## Call:
## lm(formula = jays ~ elevation, data = jayData)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -5.4874 -1.7539  0.1566  1.6159  4.6155
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 33.082808   0.453997   72.87   <2e-16 ***
## elevation    0.008337   0.000595   14.01   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.285 on 98 degrees of freedom
## Multiple R-squared: 0.667, Adjusted R-squared: 0.6636
## F-statistic: 196.3 on 1 and 98 DF, p-value: < 2.2e-16
  • 46. Simple linear regression [scatterplot: Jays (30–45) vs. Elevation (0–1500 m) with fitted regression line]
  • 47. Multiple linear regression

fm2 <- lm(jays ~ elevation+forest, data=jayData)
summary(fm2)
##
## Call:
## lm(formula = jays ~ elevation + forest, data = jayData)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -5.4717 -1.7384  0.1552  1.5993  4.6319
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 33.065994   0.467624  70.711   <2e-16 ***
## elevation    0.008337   0.000598  13.943   <2e-16 ***
## forest       0.294350   1.793079   0.164     0.87
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.296 on 97 degrees of freedom
## Multiple R-squared: 0.6671, Adjusted R-squared: 0.6603
## F-statistic: 97.21 on 2 and 97 DF, p-value: < 2.2e-16
  • 48. Multiple linear regression [surface plot: expected number of jays (30–55) as a function of elevation (0–2000 m) and forest cover (0.0–1.0)]
  • 49. One-way ANOVA

fm3 <- lm(jays ~ habitat, data=jayData)
summary(fm3)
##
## Call:
## lm(formula = jays ~ habitat, data = jayData)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -7.9143 -2.3684 -0.3684  3.0857  8.6316
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   35.875      1.356  26.456   <2e-16 ***
## habitatOak     3.493      1.448   2.413   0.0177 *
## habitatPine    2.039      1.503   1.357   0.1780
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.835 on 97 degrees of freedom
## Multiple R-squared: 0.07126, Adjusted R-squared: 0.05211
## F-statistic: 3.721 on 2 and 97 DF, p-value: 0.02773
  • 50. One-way ANOVA [barplot: expected number of jays (0–40) by habitat (Bare, Oak, Pine)]
  • 51. ANCOVA

fm4 <- lm(jays ~ elevation+habitat, data=jayData)
summary(fm4)
##
## Call:
## lm(formula = jays ~ elevation + habitat, data = jayData)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -5.0327 -1.5356  0.0091  1.4686  4.2391
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.072e+01  8.084e-01  37.997  < 2e-16 ***
## elevation   8.289e-03  5.414e-04  15.308  < 2e-16 ***
## habitatOak  3.166e+00  7.850e-01   4.034  0.00011 ***
## habitatPine 1.695e+00  8.148e-01   2.081  0.04010 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.078 on 96 degrees of freedom
## Multiple R-squared: 0.7301, Adjusted R-squared: 0.7217
## F-statistic: 86.56 on 3 and 96 DF, p-value: < 2.2e-16
  • 53. Continuous-categorical interaction

fm5 <- lm(jays ~ elevation*habitat, data=jayData)
summary(fm5)
##
## Call:
## lm(formula = jays ~ elevation * habitat, data = jayData)
##
## Residuals:
##    Min     1Q Median     3Q    Max
## -5.008 -1.581 -0.103  1.420  4.184
##
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)
## (Intercept)           31.654383   1.446322  21.886  < 2e-16 ***
## elevation              0.006781   0.001999   3.393  0.00101 **
## habitatOak             2.428682   1.565227   1.552  0.12411
## habitatPine            0.399953   1.579874   0.253  0.80070
## elevation:habitatOak   0.001204   0.002153   0.559  0.57737
## elevation:habitatPine  0.002046   0.002151   0.951  0.34414
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.087 on 94 degrees of freedom
## Multiple R-squared: 0.7334, Adjusted R-squared: 0.7192
## F-statistic: 51.72 on 5 and 94 DF, p-value: < 2.2e-16
  • 55. Quadratic effect of elevation

fm6 <- lm(jays ~ elevation+I(elevation^2), data=jayData)
summary(fm6)
##
## Call:
## lm(formula = jays ~ elevation + I(elevation^2), data = jayData)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -4.8429 -1.4608  0.1304  1.5908  4.7854
##
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)
## (Intercept)     3.162e+01  7.631e-01  41.434  < 2e-16 ***
## elevation       1.368e-02  2.342e-03   5.843 6.86e-08 ***
## I(elevation^2) -3.542e-06  1.503e-06  -2.357   0.0204 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.233 on 97 degrees of freedom
## Multiple R-squared: 0.6851, Adjusted R-squared: 0.6786
## F-statistic: 105.5 on 2 and 97 DF, p-value: < 2.2e-16
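Note why I() is needed: inside a model formula, ^ is formula syntax, so the squared term must be protected. A short sketch on simulated data (not the jay data) showing that I(x^2) and a raw polynomial via poly() give the same fit:

```r
# Simulated data loosely mimicking the elevation example
set.seed(1)
elev <- runif(100, 0, 1500)
y <- 32 + 0.014 * elev - 3.5e-6 * elev^2 + rnorm(100, sd = 2)

# I() protects ^ so it is interpreted arithmetically, not as formula syntax
fmA <- lm(y ~ elev + I(elev^2))

# poly(..., raw = TRUE) builds the same raw polynomial columns
fmB <- lm(y ~ poly(elev, 2, raw = TRUE))

# The two parameterizations produce identical fitted values
all.equal(fitted(fmA), fitted(fmB))
## [1] TRUE
```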
  • 56. Quadratic effect of elevation [scatterplot: Jays (30–45) vs. Elevation (0–1500 m) with fitted quadratic curve]
  • 57. Interaction and quadratic effects

fm7 <- lm(jays ~ habitat * forest + elevation + I(elevation^2),
          data=jayData)
summary(fm7)
##
## Call:
## lm(formula = jays ~ habitat * forest + elevation + I(elevation^2),
##     data = jayData)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -5.2574 -1.4400  0.0487  1.4055  3.7924
##
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)
## (Intercept)         2.920e+01  1.030e+00  28.338  < 2e-16 ***
## habitatOak          3.705e+00  8.433e-01   4.394 2.98e-05 ***
## habitatPine         2.216e+00  8.757e-01   2.531   0.0131 *
## forest              4.007e+01  2.780e+01   1.441   0.1529
## elevation           1.215e-02  2.300e-03   5.285 8.41e-07 ***
## I(elevation^2)     -2.554e-06  1.484e-06  -1.721   0.0886 .
## habitatOak:forest  -4.292e+01  2.785e+01  -1.541   0.1267
## habitatPine:forest -3.918e+01  2.784e+01  -1.407   0.1627
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.044 on 92 degrees of freedom
## Multiple R-squared: 0.7497, Adjusted R-squared: 0.7307
## F-statistic: 39.37 on 7 and 92 DF, p-value: < 2.2e-16
  • 59. Predict jay abundance at each grid cell

E7 <- predict(fm7, type="response", newdata=cruz2,
              interval="confidence")
E7 <- cbind(cruz2[,c("x","y")], E7)
head(E7)
##          x       y      fit      lwr      upr
## 1 230736.7 3774324 35.68349 34.86313 36.50386
## 2 231036.7 3774324 35.07284 34.22917 35.91652
## 3 231336.7 3774324 34.58427 33.72668 35.44186
## 4 230436.7 3774024 33.06042 31.55907 34.56177
## 5 230736.7 3774024 39.18440 38.49766 39.87113
## 6 231036.7 3774024 38.65512 37.98859 39.32165
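The same workflow can be reproduced end-to-end on simulated data (the jay data are not distributed with these slides): predict() with interval="confidence" returns fit, lwr, and upr columns for each row of newdata.

```r
# Simulated stand-in for the jay regression (names are illustrative)
set.seed(42)
dat <- data.frame(elev = runif(100, 0, 1500))
dat$jays <- 33 + 0.008 * dat$elev + rnorm(100, sd = 2)

fm <- lm(jays ~ elev, data = dat)

# Predict at new elevations with 95% confidence intervals
newdat <- data.frame(elev = c(0, 500, 1000, 1500))
pred <- predict(fm, newdata = newdat, interval = "confidence")
cbind(newdat, pred)   # columns: elev, fit, lwr, upr
```

Mapping the predictions, as on the next slides, is then just a matter of joining fit, lwr, and upr back to the grid-cell coordinates.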
  • 60. Map the predictions [map: expected number of jays per grid cell, 25–55]
  • 61. Map the predictions [map: lower confidence limit, 25–55]
  • 62. Map the predictions [map: upper confidence limit, 25–55]
  • 63. Future scenarios What if pine and oak disappear? [map: expected number of jays per grid cell, 25–55]
  • 64. Future scenarios What if pine and oak disappear? [map: expected values under the scenario, 25–55]
  • 66. Future scenarios What if sea level rises? [map: expected values under the scenario, 25–55]
  • 67. Outline 1 Motivation 2 Linear models 3 Example 4 Matrix notation
  • 69. Linear model
All of the fixed effects models that we have covered can be expressed this way:
y = Xβ + ε where ε ∼ Normal(0, σ²)
Examples include
• Completely randomized ANOVA
• Randomized complete block designs with fixed block effects
• Factorial designs
• ANCOVA
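This unification can be made concrete by simulating data directly from y = Xβ + ε and checking that lm() recovers the coefficients. A sketch with illustrative names (one continuous predictor plus a three-level factor, i.e. an ANCOVA-style model):

```r
set.seed(123)
n <- 1000
x1  <- rnorm(n)
grp <- factor(sample(c("A", "B", "C"), n, replace = TRUE))

X <- model.matrix(~ x1 + grp)        # design matrix (N x K)
beta <- c(2, 0.5, 1, -1)             # true coefficients

# y = X*beta + eps, with eps ~ Normal(0, sigma^2)
y <- as.vector(X %*% beta + rnorm(n, sd = 1))

fit <- lm(y ~ x1 + grp)
coef(fit)   # estimates close to 2, 0.5, 1, -1
```

The only thing that changes across the designs listed above is the structure of X; the estimation machinery is identical.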
  • 70. Then how do they differ? • The design matrices are different • And so are the number of parameters (coefficients) to be estimated • It is important to understand how to construct a design matrix that includes categorical variables
  • 76. Design matrix
A design matrix has N rows and K columns, where N is the total sample size and K is the number of coefficients (parameters) to be estimated.
• The first column contains just 1’s. This column corresponds to the intercept (β0)
• Continuous predictor variables appear unchanged in the design matrix
• Categorical predictor variables appear as dummy variables
• In R, the design matrix is created internally based on the formula that you provide
• The design matrix can be viewed using the model.matrix function
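All of these points can be seen at once by building a design matrix for an ANCOVA-style formula (continuous + categorical) on a small made-up data frame:

```r
# Hypothetical mini data set: one continuous and one categorical predictor
d <- data.frame(elev    = c(100, 200, 300, 400),
                habitat = factor(c("Bare", "Oak", "Oak", "Pine")))

X <- model.matrix(~ elev + habitat, data = d)
X
##   (Intercept) elev habitatOak habitatPine
## 1           1  100          0           0
## 2           1  200          1           0
## 3           1  300          1           0
## 4           1  400          0           1

dim(X)   # N = 4 rows, K = 4 coefficients
```

The intercept column is all 1’s, elev appears unchanged, and habitat is expanded into dummy variables with "Bare" absorbed into the intercept.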
  • 79. Design matrix for linear regression

Data
dietData <- read.csv("dietData.csv")
head(dietData, n=10)
##      weight    diet       age
## 1  23.83875 Control 11.622260
## 2  25.98799 Control 13.555397
## 3  30.29572 Control 15.357372
## 4  25.88463 Control  7.950214
## 5  18.48077 Control  5.493861
## 6  31.57542 Control 18.874970
## 7  23.79069 Control 12.811297
## 8  29.79574 Control 17.402436
## 9  21.66387 Control  7.379666
## 10 30.86618 Control 18.611817

Design matrix
X1 <- model.matrix(~age, data=dietData)
head(X1, n=10)
##    (Intercept)       age
## 1            1 11.622260
## 2            1 13.555397
## 3            1 15.357372
## 4            1  7.950214
## 5            1  5.493861
## 6            1 18.874970
## 7            1 12.811297
## 8            1 17.402436
## 9            1  7.379666
## 10           1 18.611817

How do we multiply this design matrix (X) by the vector of regression coefficients (β)?
  • 83. Matrix multiplication
E(y) = Xβ

[ aw + bx + cy + dz ]   [ a b c d ]   [ w ]
[ ew + fx + gy + hz ] = [ e f g h ] × [ x ]
[ iw + jx + ky + lz ]   [ i j k l ]   [ y ]
                                      [ z ]

In this example
• The first matrix corresponds to the expected values of y
• The second matrix corresponds to the design matrix X
• The third matrix (a column vector) corresponds to β
  • 86. Matrix multiplication

The vector of coefficients
beta <- coef(lm(weight ~ age, dietData))
beta
## (Intercept)         age
##   21.325234    0.518067

E(y) = Xβ or yi = β0 + β1xi

Ey1 <- X1 %*% beta
head(Ey1, 5)
##       [,1]
## 1 27.34634
## 2 28.34784
## 3 29.28138
## 4 25.44398
## 5 24.17142
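Since dietData.csv is not distributed with the slides, the same check can be run on a simulated stand-in: X %*% beta reproduces exactly the fitted values that lm() computes internally.

```r
# Simulated stand-in for the diet data (names are illustrative)
set.seed(7)
d <- data.frame(age = runif(20, 5, 20))
d$weight <- 21 + 0.5 * d$age + rnorm(20, sd = 2)

fit  <- lm(weight ~ age, data = d)
X    <- model.matrix(~ age, data = d)   # same design matrix lm() builds
beta <- coef(fit)

Ey <- X %*% beta                        # E(y) = X * beta
all.equal(as.vector(Ey), unname(fitted(fit)))
## [1] TRUE
```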
  • 89. Summary
Linear models are the foundation of modern statistical modeling techniques
They can be used to model a wide array of biological processes, and they can be easily extended when their assumptions do not hold
One of the most important extensions is to cases where the residuals are not normally distributed. Generalized linear models address this issue.