Machine Learning Basics
Lecture 1: Linear Regression
Princeton University COS 495
Instructor: Yingyu Liang
Machine learning basics
What is machine learning?
• “A computer program is said to learn from experience E with respect
to some class of tasks T and performance measure P, if its
performance at tasks in T as measured by P, improves with experience
E.”
------- Machine Learning, Tom Mitchell, 1997
Example 1: image classification
Task: determine if the image is indoor or outdoor
Performance measure: probability of misclassification
Example 1: image classification
Experience/Data: images with labels (indoor, outdoor)
Example 1: image classification
• A few terms
• Training data: the images given for learning
• Test data: the images to be classified
• Binary classification: classify into two classes
Example 1: image classification (multi-class)
ImageNet figure borrowed from vision.stanford.edu
Example 2: clustering images
Task: partition the images into 2 groups
Performance: similarities within groups
Data: a set of images
Example 2: clustering images
• A few terms
• Unlabeled data vs labeled data
• Supervised learning vs unsupervised learning
Math formulation
[Figure: an indoor image; extracting features gives its color histogram over red, green, and blue]
Feature vector: $x_i$
Label: $y_i = 0$ (indoor)
Math formulation
[Figure: an outdoor image; extracting features gives its color histogram over red, green, and blue]
Feature vector: $x_j$
Label: $y_j = 1$ (outdoor)
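The feature extraction step above can be sketched as follows. This is a minimal illustration, not the method used in the lecture: it assumes an image is a list of (r, g, b) pixel tuples with values in 0..255, and that the feature vector is a normalized per-channel histogram. The function name `color_histogram`, the `bins` parameter, and the toy pixel values are all hypothetical.

```python
def color_histogram(pixels, bins=4):
    """Return a normalized per-channel color histogram as a flat feature vector.

    pixels: list of (r, g, b) tuples with integer values in 0..255 (assumed format).
    bins:   number of intensity bins per channel (hypothetical choice).
    """
    hist = [0.0] * (3 * bins)
    width = 256 / bins
    for r, g, b in pixels:
        for channel, value in enumerate((r, g, b)):
            # Each channel owns a contiguous block of `bins` entries.
            hist[channel * bins + int(value // width)] += 1
    n = len(pixels)
    return [count / n for count in hist] if n else hist


# Two toy "images" (made-up pixels): one warm-toned, one mostly blue.
indoor_pixels = [(200, 120, 40), (220, 130, 60), (180, 100, 30)]
outdoor_pixels = [(30, 90, 230), (50, 120, 250), (40, 100, 240)]

x_i, y_i = color_histogram(indoor_pixels), 0   # label 0 = indoor
x_j, y_j = color_histogram(outdoor_pixels), 1  # label 1 = outdoor
```

Each image becomes a fixed-length vector (here 12 numbers), so images of different sizes can be fed to the same learning algorithm.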
Math formulation
• Given training data $\{(x_i, y_i) : 1 \le i \le n\}$
• Find $y = f(x)$ using training data
• s.t. $f$ correct on test data
What kind of functions?
Math formulation
• Given training data $\{(x_i, y_i) : 1 \le i \le n\}$
• Find $y = f(x) \in \mathcal{H}$ using training data
• s.t. $f$ correct on test data
Hypothesis class: $\mathcal{H}$
Math formulation
• Given training data $\{(x_i, y_i) : 1 \le i \le n\}$
• Find $y = f(x) \in \mathcal{H}$ using training data
• s.t. $f$ correct on test data
Connection between training data and test data?
Math formulation
• Given training data $\{(x_i, y_i) : 1 \le i \le n\}$ i.i.d. from distribution $D$
• Find $y = f(x) \in \mathcal{H}$ using training data
• s.t. $f$ correct on test data i.i.d. from distribution $D$
Training data and test data have the same distribution
i.i.d.: independent and identically distributed
Math formulation
• Given training data $\{(x_i, y_i) : 1 \le i \le n\}$ i.i.d. from distribution $D$
• Find $y = f(x) \in \mathcal{H}$ using training data
• s.t. $f$ correct on test data i.i.d. from distribution $D$
What kind of performance measure?
Math formulation
• Given training data $\{(x_i, y_i) : 1 \le i \le n\}$ i.i.d. from distribution $D$
• Find $y = f(x) \in \mathcal{H}$ using training data
• s.t. the expected loss is small
$L(f) = \mathbb{E}_{(x,y) \sim D}[l(f, x, y)]$
Various loss functions
Math formulation
• Given training data $\{(x_i, y_i) : 1 \le i \le n\}$ i.i.d. from distribution $D$
• Find $y = f(x) \in \mathcal{H}$ using training data
• s.t. the expected loss is small
$L(f) = \mathbb{E}_{(x,y) \sim D}[l(f, x, y)]$
• Examples of loss functions:
• 0-1 loss: $l(f, x, y) = \mathbb{I}[f(x) \ne y]$ and $L(f) = \Pr[f(x) \ne y]$
• $l_2$ loss: $l(f, x, y) = [f(x) - y]^2$ and $L(f) = \mathbb{E}[f(x) - y]^2$
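The two loss functions above can be written directly as code. This is a small sketch with hypothetical predictors (`classify`, `predict`) and made-up inputs, chosen only so that the values are easy to check by hand.

```python
def zero_one_loss(f, x, y):
    """0-1 loss: 1 if the prediction differs from the label, else 0."""
    return 1 if f(x) != y else 0


def l2_loss(f, x, y):
    """Squared (l2) loss between the prediction and the target."""
    return (f(x) - y) ** 2


def classify(x):
    """Hypothetical binary classifier on a scalar feature."""
    return 1 if x > 0.5 else 0


def predict(x):
    """Hypothetical regressor on a scalar feature."""
    return 2.0 * x


# 0-1 loss on three labeled points; only the second one is misclassified.
zero_one_values = [zero_one_loss(classify, x, y)
                   for x, y in [(0.2, 0), (0.9, 0), (0.7, 1)]]

# l2 loss on one point: prediction 3.0 versus target 2.0.
l2_value = l2_loss(predict, 1.5, 2.0)
```

The 0-1 loss suits classification, where a prediction is either right or wrong; the $l_2$ loss suits regression, where it matters how far off the prediction is.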
Math formulation
• Given training data $\{(x_i, y_i) : 1 \le i \le n\}$ i.i.d. from distribution $D$
• Find $y = f(x) \in \mathcal{H}$ using training data
• s.t. the expected loss is small
$L(f) = \mathbb{E}_{(x,y) \sim D}[l(f, x, y)]$
How to use?
Math formulation
• Given training data $\{(x_i, y_i) : 1 \le i \le n\}$ i.i.d. from distribution $D$
• Find $y = f(x) \in \mathcal{H}$ that minimizes the empirical loss $\hat{L}(f) = \frac{1}{n} \sum_{i=1}^{n} l(f, x_i, y_i)$
• s.t. the expected loss is small
$L(f) = \mathbb{E}_{(x,y) \sim D}[l(f, x, y)]$
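Minimizing the empirical loss over a hypothesis class can be sketched as below. The setup is an assumption made for illustration: a tiny hypothesis class of threshold classifiers on a scalar feature, the 0-1 loss, and six made-up training pairs. The helper names (`empirical_loss`, `make_threshold`) are hypothetical.

```python
def empirical_loss(f, data, loss):
    """Average loss of f over the training pairs (x_i, y_i)."""
    return sum(loss(f, x, y) for x, y in data) / len(data)


def zero_one_loss(f, x, y):
    """0-1 loss: 1 if the prediction differs from the label, else 0."""
    return 1 if f(x) != y else 0


def make_threshold(t):
    """Hypothetical hypothesis f_t: predict 1 when x >= t, else 0."""
    return lambda x: 1 if x >= t else 0


# Made-up training data: (feature, label) pairs.
train = [(0.1, 0), (0.3, 0), (0.4, 0), (0.6, 1), (0.8, 1), (0.9, 1)]

# A small, finite hypothesis class: one classifier per candidate threshold.
thresholds = [0.0, 0.25, 0.5, 0.75, 1.0]

# Evaluate the empirical loss of every hypothesis and keep the minimizer.
losses = {t: empirical_loss(make_threshold(t), train, zero_one_loss)
          for t in thresholds}
best_t = min(losses, key=losses.get)
```

The expected loss cannot be computed because $D$ is unknown; the empirical loss is its average over the sample, which is what the search above actually minimizes.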
Machine learning 1-2-3
• Collect data and extract features
• Build model: choose hypothesis class 𝓗 and loss function 𝑙
• Optimization: minimize the empirical loss
Wait…
• Why handcraft the feature vectors 𝑥, 𝑦?
• Can use prior knowledge to design suitable features
• Can a computer learn the features from the raw images?
• Learn features directly from the raw images: Representation Learning
• Deep Learning ⊆ Representation Learning ⊆ Machine Learning ⊆ Artificial Intelligence
Wait…
• Does MachineLearning-1-2-3 include all approaches?
• It includes many, but not all
• Our current focus will be MachineLearning-1-2-3
Example: Stock Market Prediction
[Figure: stock prices of Orange, MacroHard, and Ackermann, 2013 to 2016. Disclaimer: synthetic data/in another parallel universe]
Sliding window over time: serves as input $x$; non-i.i.d.
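A sliding window over a time series can be sketched as follows. The slide only says that the window serves as the input $x$; taking the next value as the target $y$ is an assumption made here, and the price values and the function name `sliding_windows` are made up.

```python
def sliding_windows(series, window):
    """Build (x, y) pairs: x is `window` consecutive values, y is the next value."""
    pairs = []
    for start in range(len(series) - window):
        x = series[start:start + window]
        y = series[start + window]  # assumed target: the value right after the window
        pairs.append((x, y))
    return pairs


prices = [10.0, 10.5, 11.0, 10.8, 11.2, 11.5]  # synthetic values
pairs = sliding_windows(prices, window=3)
```

Consecutive windows overlap, so the resulting examples share values and are not independent, which is why this setting is non-i.i.d.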
Linear regression
Real data: Prostate Cancer by Stamey et al. (1989)
Figure borrowed from The Elements of Statistical Learning
$y$: prostate specific antigen
$(x_1, \ldots, x_8)$: clinical measures
Linear regression
• Given training data $\{(x_i, y_i) : 1 \le i \le n\}$ i.i.d. from distribution $D$
• Find $f_w(x) = w^T x$ that minimizes $\hat{L}(f_w) = \frac{1}{n} \sum_{i=1}^{n} (w^T x_i - y_i)^2$
Hypothesis class $\mathcal{H}$: the functions $f_w(x) = w^T x$
$l_2$ loss; also called mean square error
Linear regression: optimization
• Given training data $\{(x_i, y_i) : 1 \le i \le n\}$ i.i.d. from distribution $D$
• Find $f_w(x) = w^T x$ that minimizes $\hat{L}(f_w) = \frac{1}{n} \sum_{i=1}^{n} (w^T x_i - y_i)^2$
• Let $X$ be a matrix whose $i$-th row is $x_i^T$, $y$ be the vector $(y_1, \ldots, y_n)^T$
$\hat{L}(f_w) = \frac{1}{n} \sum_{i=1}^{n} (w^T x_i - y_i)^2 = \frac{1}{n} \|Xw - y\|_2^2$
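The equality between the sum form and the matrix form of the loss can be checked numerically. This is a small sketch in plain Python with made-up values for $X$, $y$, and $w$; the function names are hypothetical.

```python
def mse_sum_form(w, xs, ys):
    """(1/n) * sum_i (w^T x_i - y_i)^2, one training example at a time."""
    n = len(xs)
    total = 0.0
    for x, y in zip(xs, ys):
        prediction = sum(wj * xj for wj, xj in zip(w, x))
        total += (prediction - y) ** 2
    return total / n


def mse_matrix_form(w, X, y):
    """(1/n) * ||Xw - y||_2^2, via the residual vector Xw - y."""
    n = len(X)
    residual = [sum(X[i][j] * w[j] for j in range(len(w))) - y[i]
                for i in range(n)]
    return sum(r * r for r in residual) / n


X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # row i is x_i^T
y = [1.0, 2.0, 3.0]
w = [0.5, -0.25]

sum_form = mse_sum_form(w, X, y)
matrix_form = mse_matrix_form(w, X, y)
```

Both functions compute the same number because the $i$-th entry of $Xw - y$ is exactly $w^T x_i - y_i$.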
Linear regression: optimization
• Set the gradient to 0 to get the minimizer
$\nabla_w \hat{L}(f_w) = \nabla_w \frac{1}{n} \|Xw - y\|_2^2 = 0$
$\nabla_w [(Xw - y)^T (Xw - y)] = 0$
$\nabla_w [w^T X^T X w - 2 w^T X^T y + y^T y] = 0$
$2 X^T X w - 2 X^T y = 0$
$w = (X^T X)^{-1} X^T y$
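The closed-form minimizer can be computed without any libraries by solving $X^T X w = X^T y$. The sketch below assumes $X^T X$ is invertible and uses Gaussian elimination with partial pivoting rather than forming the inverse explicitly; that solver choice, the helper names, and the data are all assumptions made for illustration.

```python
def transpose(M):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]


def matmul(A, B):
    """Matrix product of A and B (lists of rows)."""
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular; no unique minimizer")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        w[i] = (M[i][n] - sum(M[i][j] * w[j] for j in range(i + 1, n))) / M[i][i]
    return w


def least_squares(X, y):
    """Return w solving the normal equation X^T X w = X^T y."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(xi * yi for xi, yi in zip(row, y)) for row in Xt]
    return solve(XtX, Xty)


# Made-up data generated exactly by w = (2, -1), so the fit recovers it.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [2.0, -1.0, 1.0, 3.0]
w = least_squares(X, y)

# The gradient 2 X^T (Xw - y) should vanish at the minimizer.
residual = [sum(xj * wj for xj, wj in zip(row, w)) - yi for row, yi in zip(X, y)]
gradient = [2 * sum(c * r for c, r in zip(col, residual)) for col in transpose(X)]
```

Solving the linear system instead of inverting $X^T X$ gives the same $w$ and is usually the more numerically stable choice.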
Linear regression: optimization
• Algebraic view of the minimizer
• If $X$ is invertible, just solve $Xw = y$ and get $w = X^{-1} y$
• But typically $X$ is a tall matrix
$Xw = y$
$X^T X w = X^T y$
Normal equation: $w = (X^T X)^{-1} X^T y$
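A small worked instance may make the algebraic view concrete. The numbers below are made up for illustration: a tall $3 \times 2$ matrix $X$ for which $Xw = y$ has no exact solution, while the normal equation still has a unique one.

```latex
X = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 1 \end{pmatrix},
\qquad
y = \begin{pmatrix} 1 \\ 2 \\ 4 \end{pmatrix}

% Xw = y would need w_1 = 1, w_2 = 2, and w_1 + w_2 = 4, which is inconsistent.
% Multiplying both sides by X^T gives a square, solvable system:

X^T X = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix},
\qquad
X^T y = \begin{pmatrix} 5 \\ 6 \end{pmatrix}

w = (X^T X)^{-1} X^T y
  = \frac{1}{3} \begin{pmatrix} 2 & -1 \\ -1 & 2 \end{pmatrix}
    \begin{pmatrix} 5 \\ 6 \end{pmatrix}
  = \begin{pmatrix} 4/3 \\ 7/3 \end{pmatrix}

% Check: the residual is orthogonal to the columns of X.
Xw - y = \begin{pmatrix} 1/3 \\ 1/3 \\ -1/3 \end{pmatrix},
\qquad
X^T (Xw - y) = \begin{pmatrix} 0 \\ 0 \end{pmatrix}
```

The last line is the same condition as setting the gradient $2 X^T X w - 2 X^T y$ to zero.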
Linear regression with bias
• Given training data $\{(x_i, y_i) : 1 \le i \le n\}$ i.i.d. from distribution $D$
• Find $f_{w,b}(x) = w^T x + b$ to minimize the loss
• Reduce to the case without bias:
• Let $w' = [w; b]$, $x' = [x; 1]$
• Then $f_{w,b}(x) = w^T x + b = (w')^T (x')$
Bias term: $b$
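The reduction to the case without bias can be checked in a few lines. This is a sketch with made-up values for $w$, $b$, and $x$; the function names are hypothetical.

```python
def predict_with_bias(w, b, x):
    """f_{w,b}(x) = w^T x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def predict_augmented(w_prime, x_prime):
    """f(x') = (w')^T x', with the bias folded into w'."""
    return sum(wj * xj for wj, xj in zip(w_prime, x_prime))


w, b = [2.0, -1.0], 0.5
x = [3.0, 4.0]

w_prime = w + [b]    # w' = [w; b]
x_prime = x + [1.0]  # x' = [x; 1]

with_bias = predict_with_bias(w, b, x)
augmented = predict_augmented(w_prime, x_prime)
```

In matrix terms this amounts to appending a column of ones to $X$, after which the normal equation for the bias-free case applies unchanged.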