Linear Regression, Costs & Gradient Descent
Pallavi Mishra & Revanth Kumar
Introduction to Linear Regression
• Linear Regression is a predictive model that maps the relation between a dependent variable
and one or more independent variables.
• It is a supervised learning method for regression problems, i.e. it predicts
real-valued output.
• The prediction is made by forming a hypothesis based on the learning algorithm.
$\hat{Y} = \theta_0 + \theta_1 x_1$   (single independent variable)
$\hat{Y} = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_k x_k$   (multiple independent variables)
$\;\; = \sum_{i=0}^{k} \theta_i x_i$, where $x_0 = 1$ ……(1)
where $\theta_i$ is the parameter for the $i^{th}$ independent variable.
To estimate the performance of the linear model, the Squared Sum Error (SSE) is used:
Squared Sum Error (SSE) $= \sum_{i=1}^{m} \left(Y^{(i)} - \hat{Y}^{(i)}\right)^2$, summed over the m training instances.
Note: here $Y$ is the actual observed output and $\hat{Y}$ is the predicted output.
[Figure: data points with the fitted hypothesis line; the vertical gap between the actual output $Y$ and the predicted output $\hat{Y}$ at each point is the error.]
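As a small illustration of equation (1) and the SSE, the sketch below uses hypothetical parameters and data (all values are assumptions, not taken from the slides):

import numpy as np

# Minimal sketch: hypothesis of eq. (1) and the SSE it produces on assumed data.
theta = np.array([1.0, 0.5, 2.0])             # hypothetical parameters theta_0, theta_1, theta_2
X = np.array([[1.0, 2.0, 3.0],                # each row: x_0 = 1, then x_1, x_2
              [1.0, 1.0, 0.5],
              [1.0, 3.0, 1.0]])
y = np.array([8.1, 2.4, 4.6])                 # hypothetical observed outputs

y_hat = X @ theta                             # Y_hat = sum_i theta_i * x_i, with x_0 = 1
sse = np.sum((y - y_hat) ** 2)                # Squared Sum Error over the instances
print(y_hat, sse)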
Model Representation
Training Set → Learning Algorithm → Hypothesis ($\hat{Y}$): an unknown independent value fed into the hypothesis yields the estimated output value.
Fig.1 Model Representation of Linear Regression
Hint: gradient descent is used as the learning algorithm.
How to Represent Hypothesis?
• We know the hypothesis is represented by $\hat{Y}$, which can be formulated for
single-variable linear regression (univariate linear regression) or multivariate linear regression.
• $\hat{Y} = \theta_0 + \theta_1 x_1$
• Here, $\theta_0$ = intercept, $\theta_1$ = slope $= \frac{\Delta y}{\Delta x}$, and $x_1$ = independent variable.
• The question arises: how do we choose the $\theta_i$ values for the best-fitting hypothesis?
• Idea: choose $\theta_0, \theta_1$ so that $\hat{Y}$ is close to $Y$ for our training examples $(x, y)$.
• Objective: minimize $J(\theta_0, \theta_1)$.
• Note: $J(\theta_0, \theta_1)$ is the cost function.
• Formulation: $J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left(\hat{Y}^{(i)} - Y^{(i)}\right)^2$
Note: m = number of instances in the dataset.
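A minimal sketch of this cost function in code, assuming a small hypothetical univariate dataset (the numbers are illustrative only):

import numpy as np

# Minimal sketch: J(theta0, theta1) = 1/(2m) * sum((Y_hat - Y)^2) on assumed data.
x = np.array([1.0, 2.0, 3.0, 4.0])            # hypothetical feature values
y = np.array([2.1, 3.9, 6.2, 8.1])            # hypothetical observed outputs
m = len(y)

def J(theta0, theta1):
    y_hat = theta0 + theta1 * x               # univariate hypothesis
    return np.sum((y_hat - y) ** 2) / (2 * m)

print(J(0.0, 2.0))                            # cost for one candidate parameter pair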
Objective function for linear regression
• The main objective of a linear regression model is to minimize the cost function by
choosing optimal values for $\theta_0, \theta_1$.
• Gradient descent is the optimization technique most commonly used for such predictive models.
• Taking $\theta_0 = 0$ and letting $\theta_1$ vary over a range of values (in the univariate case),
the graph of $\theta_1$ versus $J(\theta_1)$ takes a bowl (convex) shape.
Advantage of gradient descent in the linear regression model
• There is no risk of getting stuck in a local optimum, since the cost is convex and has only
one global optimum, where the slope $\frac{d J(\theta_1)}{d\theta_1} = 0$.
[Figure: convex curve of $J(\theta_1)$ plotted against $\theta_1$.]
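To see the bowl shape numerically, a short sketch can trace $J(\theta_1)$ over a grid of $\theta_1$ values with $\theta_0$ fixed at 0 (the data below are hypothetical):

import numpy as np

# Minimal sketch: J(theta1) on a grid is bowl-shaped, so it has a single global minimum.
x = np.array([1.0, 2.0, 3.0, 4.0])            # hypothetical inputs
y = np.array([2.0, 4.0, 6.0, 8.0])            # hypothetical targets (roughly y = 2x)
m = len(y)

for theta1 in np.linspace(0.0, 4.0, 9):
    J = np.sum((theta1 * x - y) ** 2) / (2 * m)
    print(f"theta1={theta1:.1f}  J={J:.3f}")  # J falls toward theta1 = 2, then rises again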
Normal Distribution $N(\mu, \sigma^2)$
Estimation of mean ($\mu$) and variance ($\sigma^2$):
• Let the data set have size $n$, denoted $y_1, y_2, \dots, y_n$.
• Assume $y_1, y_2, \dots, y_n$ are independent and identically distributed (i.i.d.) normally
distributed random variables.
• Assuming there are no independent variables (x), in order to estimate the future value of $y$ we need
to find the unknown parameters ($\mu$ and $\sigma^2$).
Concept of Maximum Likelihood Estimation:
• Using the Maximum Likelihood Estimation (MLE) concept, we try to find the optimal values of
the mean ($\mu$) and standard deviation ($\sigma$) of the distribution, given a set of observed
measurements.
• The goal of MLE is to find the best way to fit a distribution to the data, so that the data are
easy to work with.
Continue…
Estimation of $\mu$ and $\sigma^2$:
• Density of a normal random variable: $f(y) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2\sigma^2}(y-\mu)^2}$
• $L(\mu, \sigma^2)$ is a joint density.
Now let
$L(\mu, \sigma^2) = f(y_1, y_2, \dots, y_n) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2\sigma^2}(y_i-\mu)^2}$
Let $\sigma^2 = \theta$. Then
$L(\mu, \theta) = \frac{1}{(\sqrt{2\pi\theta})^{n}}\, e^{-\frac{1}{2\theta}\sum_i (y_i-\mu)^2}$
Taking the log on both sides:
$LL(\mu, \theta) = \log\!\left((2\pi\theta)^{-\frac{n}{2}}\right) + \log\!\left(e^{-\frac{1}{2\theta}\sum_i (y_i-\mu)^2}\right)$   * $LL(\mu, \theta)$ denotes the log of the joint density
$= -\frac{n}{2}\log(2\pi\theta) - \frac{1}{2\theta}\sum_i (y_i-\mu)^2$ ……(2)   * using $\log e^{x} = x$
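As a quick numerical check of equation (2), the sketch below compares the simplified log-likelihood with the sum of per-point log-densities on a hypothetical sample (all numbers are assumptions):

import numpy as np

# Minimal sketch: LL(mu, theta) = -n/2 * log(2*pi*theta) - sum((y - mu)^2) / (2*theta)
# should equal the sum of log-densities of N(mu, theta).
y = np.array([4.8, 5.1, 5.4, 4.9, 5.3])       # hypothetical observations
mu, theta = 5.0, 0.05                         # candidate mean and variance (theta = sigma^2)
n = len(y)

ll_closed = -n / 2 * np.log(2 * np.pi * theta) - np.sum((y - mu) ** 2) / (2 * theta)
ll_direct = np.sum(-0.5 * np.log(2 * np.pi * theta) - (y - mu) ** 2 / (2 * theta))
print(ll_closed, ll_direct)                   # the two values agree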
Continue…
• Our objective is to estimate the next occurrence of a data point $y$ from the distribution of
the data. Using MLE we can find the optimal values of $(\mu, \sigma^2)$: for a given training set we
need to maximize $LL(\mu, \theta)$.
• Let us write $\theta = \sigma^2$ for simplicity.
• Now we take partial derivatives to find the optimal values of $(\mu, \sigma^2)$ and set them to zero,
$LL' = 0$.
$LL(\mu, \theta) = -\frac{n}{2}\log(2\pi\theta) - \frac{1}{2\theta}\sum_i (y_i-\mu)^2$
• Taking the partial derivative with respect to $\mu$ in eq. (2), we get
$LL'_{\mu} = 0 - \frac{2}{2\theta}\sum_i (y_i-\mu)\,(-1)$   * $LL'_{\mu}$ is the partial derivative of $LL$ with respect to $\mu$
$\Rightarrow \sum_i (y_i-\mu) = 0$
$\Rightarrow \sum_i y_i = n\mu$
Continue…
$\mu = \frac{1}{n}\sum_i y_i$   * $\mu$ is the estimated mean value
Again taking the partial derivative of eq. (2) with respect to $\theta$:
$LL'_{\theta} = -\frac{n}{2}\cdot\frac{1}{2\pi\theta}\cdot 2\pi - \left(-\frac{1}{2\theta^2}\right)\sum_i (y_i-\mu)^2$
Setting the above to zero, we get
$\Rightarrow \frac{1}{2\theta^2}\sum_i (y_i-\mu)^2 = \frac{n}{2}\cdot\frac{1}{\theta}$
Finally, this leads to the solution
$\sigma^2 = \theta = \frac{1}{n}\sum_i (y_i-\mu)^2$   * $\sigma^2$ is the estimated variance
After plugging in the estimates:
$\sigma^2 = \frac{1}{n}\sum_i (y_i-\hat{y})^2$,   $\mu = \frac{1}{n}\sum_i y_i$
Continue…
• The above estimate can be generalized to $\sigma^2 = \frac{1}{n}\sum \text{error}^2$, where $\text{error} = y - \hat{y}$.
• Finally, we have estimated the mean and variance in order to predict the future occurrence
of $y$ ($\hat{y}$) data points.
• Therefore the best estimate of the next $y$ ($\hat{y}$) that is likely to occur is $\mu$, and the
solution is arrived at using the SSE ($\sigma^2$):
$\sigma^2 = \frac{1}{n}\sum \text{error}^2$
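A minimal sketch of these closed-form MLE estimates on a hypothetical sample (the values are assumptions):

import numpy as np

# Minimal sketch: closed-form MLE estimates derived above.
y = np.array([4.8, 5.1, 5.4, 4.9, 5.3])       # hypothetical observations
n = len(y)

mu_hat = y.sum() / n                          # mu = (1/n) * sum(y_i)
sigma2_hat = np.sum((y - mu_hat) ** 2) / n    # sigma^2 = (1/n) * sum((y_i - mu)^2)

print(mu_hat, sigma2_hat)                     # best guess for the next y is mu_hat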
Optimization & Derivatives
$J(\theta) = \frac{1}{2n}\sum_{i=1}^{n}\left(y_i - \sum_{j=1}^{k} x_{ij}\,\theta_j\right)^{2}$

$Y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}$;   $X = \begin{bmatrix} x_{11} & x_{12} & \dots & x_{1k} \\ x_{21} & x_{22} & \dots & x_{2k} \\ \vdots & \vdots & & \vdots \\ x_{n1} & x_{n2} & \dots & x_{nk} \end{bmatrix}$;   $\theta = \begin{bmatrix} \theta_1 \\ \theta_2 \\ \vdots \\ \theta_k \end{bmatrix}$

$\sum_{j=1}^{k} x_{ij}\,\theta_j$ is simply the product of the $i^{th}$ row of the matrix $X$ and the vector $\theta$. Hence
$J(\theta) = \frac{1}{2n}\sum_{i=1}^{n}\left(Y - X\theta\right)_i^{2}$
Continue…
$\sum_{i}\left(Y - X\theta\right)_i^{2} = (Y - \hat{Y})'(Y - \hat{Y})$, since $\hat{Y} = X\theta$.
$J(\theta) = \frac{1}{2n}\,(Y - X\theta)'(Y - X\theta)$
$\quad\;\; = \frac{1}{2n}\left(Y'Y - Y'X\theta - \theta'X'Y + \theta'X'X\theta\right)$
Now, taking the derivative with respect to $\theta$:
$\frac{\partial J}{\partial \theta} = \frac{1}{2n}\left(0 - 2X'Y + 2X'X\theta\right)$
$\quad\;\; = -\frac{2}{2n}\left(X'Y - X'X\theta\right)$
$\quad\;\; = -\frac{1}{n}\left(X'Y - X'X\theta\right)$
$\quad\;\; = -\frac{1}{n}\,X'\left(Y - \hat{Y}\right)$
$\nabla_{\theta}J(\theta) = \frac{1}{n}\,X'\left(\hat{Y} - Y\right)$
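A minimal vectorized sketch of this cost and its gradient (the design matrix and targets below are hypothetical):

import numpy as np

# Minimal sketch: vectorized cost and gradient from the derivation above.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])   # hypothetical design matrix (first column = bias)
Y = np.array([2.0, 2.9, 4.1])                        # hypothetical targets
theta = np.zeros(X.shape[1])
n = len(Y)

def cost(theta):
    r = X @ theta - Y
    return r @ r / (2 * n)                           # J(theta) = 1/(2n) (X*theta - Y)'(X*theta - Y)

def gradient(theta):
    return X.T @ (X @ theta - Y) / n                 # grad J = (1/n) X'(Y_hat - Y)

print(cost(theta), gradient(theta))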
How to start with Gradient Descent
• The basic idea is to start at any random initial position and evaluate the derivative there.
• 1st case: if the derivative value is > 0, the cost is increasing at that point.
• Action: update the $\theta_1$ value using the gradient descent formula:
• $\theta_1 := \theta_1 - \alpha\,\frac{d\,J(\theta_1)}{d\theta_1}$
• Here, $\alpha$ is the learning rate (step-size parameter).
Gradient Descent algorithm
• Repeat until convergence { $\theta_1 := \theta_1 - \alpha\,\frac{d\,J(\theta_1)}{d\theta_1}$ }   (here assuming $\theta_0 = 0$ for univariate linear regression)
For multivariate linear regression:
• Repeat until convergence { $\theta_j := \theta_j - \alpha\,\frac{\partial\,J(\theta_0, \theta_1)}{\partial\theta_j}$ }
Simultaneous update of $\theta_0, \theta_1$:
temp0 := $\theta_0 - \alpha\,\frac{\partial\,J(\theta_0, \theta_1)}{\partial\theta_0}$
temp1 := $\theta_1 - \alpha\,\frac{\partial\,J(\theta_0, \theta_1)}{\partial\theta_1}$
$\theta_0$ := temp0
$\theta_1$ := temp1
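A minimal sketch of batch gradient descent with the simultaneous update on a hypothetical univariate dataset (the learning rate and iteration count are assumptions):

import numpy as np

# Minimal sketch: batch gradient descent with simultaneous parameter updates.
x = np.array([1.0, 2.0, 3.0, 4.0])        # hypothetical inputs
y = np.array([2.1, 3.9, 6.2, 8.1])        # hypothetical targets
m = len(y)
theta0, theta1 = 0.0, 0.0
alpha = 0.05                               # learning rate

for _ in range(2000):                      # "repeat until convergence" (fixed iteration budget here)
    y_hat = theta0 + theta1 * x
    grad0 = np.sum(y_hat - y) / m          # dJ/dtheta0
    grad1 = np.sum((y_hat - y) * x) / m    # dJ/dtheta1
    temp0 = theta0 - alpha * grad0         # compute both updates first ...
    temp1 = theta1 - alpha * grad1
    theta0, theta1 = temp0, temp1          # ... then assign simultaneously

print(theta0, theta1)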
Effects associated with varying values of the learning rate ($\alpha$)
[Figure: gradient descent steps toward the minimum for large versus small values of $\alpha$.]
Continue:
• In the first case (large $\alpha$), we may have difficulty reaching the global optimum, since a large $\alpha$
can overshoot the optimal position due to aggressive updates of the $\theta$ values.
• With a suitably small, fixed $\alpha$, gradient descent automatically takes smaller steps as we approach
the optimum, because the gradient itself shrinks there.
Conclusion
• The cost function for linear regression is always a bowl-shaped (convex) function.
• This function has no local optima other than the single global optimum.
• Therefore, with a cost function of the type $J(\theta_0, \theta_1)$ that arises in linear
regression, gradient descent will always converge to the global optimum.
• Most important is to make sure our gradient descent algorithm is working properly.
• As the number of iterations increases, the value of $J(\theta_0, \theta_1)$ should decrease after every
iteration.
• Designing an automatic convergence test is difficult because we do not know a suitable threshold
value in advance.
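One simple (assumed) way to operationalize this check is to record $J$ after every iteration and declare convergence once the per-iteration decrease falls below a chosen threshold; the threshold value itself is an assumption, as noted above:

# Minimal sketch: a convergence check on the recorded cost history,
# with a hypothetical threshold value.
def converged(cost_history, threshold=1e-6):
    # J should decrease every iteration; declare convergence once the drop is tiny.
    if len(cost_history) < 2:
        return False
    return cost_history[-2] - cost_history[-1] < threshold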