2. Linear Regression with One Variable – or Univariate Linear Regression
MODEL REPRESENTATION:
➢Notation:
X = space of input values
Y = space of output values
Dataset = list of m training examples → (x(i), y(i)); i = 1, 2, …, m
Hypothesis Function: the function derived by feeding the training data (inputs and outputs, since learning is supervised) to the learning algorithm; it can then be used to predict the output for new input data.
➢For a supervised problem: h : X → Y so that h(x) is a “good”
predictor for the corresponding value of y.
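As a minimal sketch (the training numbers and parameter values below are invented for illustration), the hypothesis can be written as a plain Python function:

```python
# Hypothetical toy training set (m = 4 examples); the numbers are invented.
X = [1.0, 2.0, 3.0, 4.0]   # inputs, drawn from the space of input values
Y = [1.5, 2.9, 4.1, 5.2]   # outputs, drawn from the space of output values
m = len(X)                 # m = number of training examples

def h(x, theta0, theta1):
    """Hypothesis h : X -> Y for univariate linear regression."""
    return theta0 + theta1 * x

# Predict the output for a new input, here with guessed parameters.
print(h(2.5, 0.0, 1.3))    # -> 3.25
```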
COST FUNCTION: measures the accuracy of our hypothesis function. We choose the parameters of h(x) such that h(x) is close to y for the data in the training set.
The cost function is the quantity we minimize:
J(Θ0, Θ1) = (1/2m) · Σ_{i=1..m} ( h(x(i)) − y(i) )²
i.e. (1/2m) times the sum of squared differences between the predicted values and the actual values given in the dataset.
➢ J is called the cost function, the squared error cost function, or the mean squared error.
➢ The (1/2m) factor averages the squared differences.
To minimize means we find values of the Θ parameters such that the cost function is as small as possible.
Cost ➔ the average squared difference between the hypothesis's results on the x's and the actual outputs y.
➢The mean is halved as a convenience for the computation of gradient descent: differentiating the squared term produces a factor of 2 that cancels the 1/2.
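Translated directly into code (a sketch reusing the hypothetical X, Y, m, and h defined above):

```python
def J(theta0, theta1):
    """Squared error cost: (1/2m) * sum over i of (h(x(i)) - y(i))^2."""
    squared = sum((h(x, theta0, theta1) - y) ** 2 for x, y in zip(X, Y))
    return squared / (2 * m)

print(J(0.0, 1.3))   # cost of the guessed parameters above
```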
COST FUNCTION INTUITION: The training set is scattered on the x-y plane, and we try to draw a straight line through it. Goal → find the best-fitting line.
Ideally, the line would pass through every point of the training set; in that case the value of J would be 0.
➢For simplicity, let Θ0 = 0, so h(x) = Θ1·x. Then:
For Θ1=1 ➔ h(x)=x ➔ J(1)=0
For Θ1=0.5 ➔ h(x)=0.5x ➔ J(0.5)≈0.58
For Θ1=0 ➔ h(x)=0 ➔ J(0)≈2.3
➢J(Θ) is the average of the squared differences between h(x) and y:
➢h(x) = predicted value at a given training example
➢y = actual output value in the training data
➢Vertical lines represent this difference.
➢For different values of Θ, we try to minimize J(Θ) (the error); the minimum occurs at Θ=1. Therefore we choose Θ=1, and the best-fitting curve is h(x)=x.
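These three values can be reproduced numerically, assuming the toy training set {(1,1), (2,2), (3,3)} that the J values above correspond to:

```python
data = [(1, 1), (2, 2), (3, 3)]   # assumed toy training set

def J1(theta1):
    """Cost of the simplified hypothesis h(x) = theta1 * x (theta0 = 0)."""
    return sum((theta1 * x - y) ** 2 for x, y in data) / (2 * len(data))

print(J1(1.0))   # 0.0
print(J1(0.5))   # 0.5833... ~ 0.58
print(J1(0.0))   # 2.333...  ~ 2.3
```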
≫ For a more complex hypothesis such as h(x) = Θ0 + Θ1x, we have to plot J(Θ0, Θ1) in 3D, since J can differ for every combination of Θ0 and Θ1.
This is more easily represented with contour figures: a contour plot is a graph that contains many contour lines, and a contour line of a two-variable function has a constant value at all points on the line.
Plotting Θ0 against Θ1: each ellipse is the set of (Θ0, Θ1) combinations for which J has the same value. Points not on a drawn ellipse are also valid; each corresponds to its own unique value of J.
➢The best combination of Θ0 and Θ1 (the one that minimizes J(Θ0, Θ1)) lies near the center of the innermost ellipse.
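A contour plot like the one described here can be drawn in a few lines (a sketch assuming numpy and matplotlib; the toy data and axis ranges are arbitrary choices):

```python
import numpy as np
import matplotlib.pyplot as plt

xs = np.array([1.0, 2.0, 3.0])
ys = np.array([1.0, 2.0, 3.0])          # toy data again
t0, t1 = np.meshgrid(np.linspace(-3, 3, 200), np.linspace(-1, 3, 200))

# J evaluated over the whole (theta0, theta1) grid; every contour line
# joins parameter combinations with the same cost.
cost = sum((t0 + t1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * len(xs))

plt.contour(t0, t1, cost, levels=30)
plt.xlabel("theta0")
plt.ylabel("theta1")
plt.show()
```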
GRADIENT DESCENT: an algorithm to minimize J(Θ0, Θ1)
➢Start with some Θ0, Θ1
➢Keep changing Θ0, Θ1 to reduce J until a minimum is reached
Here, we start at some value of (Θ0, Θ1) and keep moving downhill on the J(Θ0, Θ1) surface until we reach a local minimum. We know we have succeeded when the cost function is at the very bottom of one of the pits in the graph.
➢If we start at a different value of (Θ0, Θ1), we may end up at a different minimum.
We are not graphing x and y itself, but the parameter range of our
hypothesis function and the cost resulting from selecting a
particular set of parameters.
The slope of the tangent is the derivative of J at that point, and it gives us the direction to move in. We step down the cost function in the direction of steepest descent. The size of each step is determined by the parameter α, called the learning rate.
α = Learning rate
SIMULTANEOUS UPDATE: first we calculate the new values for both Θ0 and Θ1, and only then do we assign them, so the order of execution of the statements matters:
If we updated Θ0 before calculating the new Θ1, the Θ0 used in the equation for Θ1 would be the new Θ0, not the one we set out to minimize over.
At each iteration j, one should simultaneously update the parameters Θ0, Θ1, …, Θn. Updating a specific parameter prior to calculating another one on the j-th iteration would yield a wrong implementation.
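In code, the fix is just a pair of temporaries (a sketch; the derivative helpers below use the formulas given later in these notes, on the same toy data):

```python
data = [(1, 1), (2, 2), (3, 3)]
m = len(data)
theta0, theta1, alpha = 0.0, 0.0, 0.1

def d0(t0, t1):   # partial derivative of J with respect to theta0
    return sum((t0 + t1 * x) - y for x, y in data) / m

def d1(t0, t1):   # partial derivative of J with respect to theta1
    return sum(((t0 + t1 * x) - y) * x for x, y in data) / m

# Correct (simultaneous): both derivatives see the OLD theta0 and theta1.
temp0 = theta0 - alpha * d0(theta0, theta1)
temp1 = theta1 - alpha * d1(theta0, theta1)
theta0, theta1 = temp0, temp1

# Wrong: theta0 is overwritten first, so d1 would see the NEW theta0:
#   theta0 = theta0 - alpha * d0(theta0, theta1)
#   theta1 = theta1 - alpha * d1(theta0, theta1)
```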
GRADIENT DESCENT INTUITION: for simplicity, we use only one parameter:
h(x) = Θ1·x
→α is positive
For a given Θ1, if the slope of J(Θ) is positive, Θ1 decreases; if the slope is negative, Θ1 increases.
Θ1 eventually converges to the minimum.
If α is too small, gradient descent takes baby steps toward the minimum.
If α is too large, gradient descent takes huge steps; it may even overshoot the minimum when the difference between the current Θ and Θmin is smaller than the jump in Θ (α times the derivative of J), and it can then move further and further away from the minimum.
➢Therefore, we should adjust the learning rate to ensure that the gradient descent algorithm converges in a reasonable time.
➢If Θ is already at a local minimum, the slope there is 0, so Θ won't change.
Even with a fixed learning rate α, the slope gets smaller as we approach the minimum, so the steps automatically become smaller.
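The effect of the step size can be checked with the one-parameter example h(x) = Θ1·x (a sketch on the same toy data; the starting point and the two α values are arbitrary):

```python
data = [(1, 1), (2, 2), (3, 3)]

def dJ(theta1):
    """dJ/dtheta1 = (1/m) * sum of (theta1*x - y) * x."""
    return sum((theta1 * x - y) * x for x, y in data) / len(data)

for alpha in (0.1, 0.5):       # small step vs. too-large step
    theta1 = 3.0               # arbitrary starting point
    for _ in range(10):
        theta1 -= alpha * dJ(theta1)
    print(alpha, theta1)       # alpha=0.1 closes in on 1.0; alpha=0.5 diverges
```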
For the linear regression model, the derivatives of J are:
∂J/∂Θ0 = (1/m) · Σ_{i=1..m} ( h(x(i)) − y(i) )
∂J/∂Θ1 = (1/m) · Σ_{i=1..m} ( h(x(i)) − y(i) ) · x(i)
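Putting the update rule and these derivatives together gives batch gradient descent for the full two-parameter model (a sketch; the data, α, and iteration count are invented):

```python
data = [(1.0, 1.5), (2.0, 2.9), (3.0, 4.1), (4.0, 5.2)]   # hypothetical data
m = len(data)
theta0, theta1, alpha = 0.0, 0.0, 0.05

for _ in range(2000):
    # "Batch": every step sums the error over ALL m training examples.
    errors = [(theta0 + theta1 * x) - y for x, y in data]
    grad0 = sum(errors) / m
    grad1 = sum(e * x for e, (x, _) in zip(errors, data)) / m
    # Simultaneous update: both gradients came from the old parameters.
    theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1

print(theta0, theta1)   # parameters of the best-fitting line
```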
➢For a linear regression model, the cost surface is always convex (bowl shaped). It has only one optimum, the global minimum, and gradient descent reaches it (assuming the learning rate α is not too large).
➢In the contour plot, we start with any values of Θ0 and Θ1 and then minimize J.
➢We approach the minimum as we move toward the center.
We start at an arbitrary value for Θ0 and Θ1 and begin minimizing J(Θ0, Θ1) with our gradient descent algorithm.
J is a quadratic function of the parameters, and the ellipses shown above are its contours.
Batch Gradient Descent → Each step of gradient descent uses all
training examples.
The point of all this is that if we start with a guess for our
hypothesis and then repeatedly apply these gradient descent
equations, our hypothesis will become more and more accurate.