WHAT IS MACHINE LEARNING
Dr. Majid Ali Khan
Dr. Ghazanfar Latif
OUTLINES
What is Machine Learning and Why is it useful?
Applications of Machine Learning
Types of Machine Learning
Challenges of Machine Learning
Testing and Validating
OUTSIDER'S VIEW OF MACHINE LEARNING
Intelligent robots (good or bad) roaming the world!!!
REAL WORLD MACHINE LEARNING
Spam Filters
Optical Character Recognition
Image Processing
Manufacturing
Civil
Mechanical
Finance
WHAT IS MACHINE LEARNING
The science (and art) of programming computers so that they can learn from data
[Machine Learning is the] field of study that gives computers the ability to learn without being
explicitly programmed.
—Arthur Samuel, 1959
A computer program is said to learn from experience E with respect to some task T and some
performance measure P, if its performance on T, as measured by P, improves with experience E.
-Tom Mitchell, 1997
WHAT IS MACHINE LEARNING – AN EXAMPLE
Spam Filter:
 Differentiate spam emails from regular (non-spam) emails
What is the Experience (E):
 Training set (examples used to learn)
 Training instance (one particular training example)
What is the Task (T):
 Identify spam emails
What is the performance measure (P):
 How accurate the identification is (measured on a test set)
 Accuracy = Number of correct classifications / Total size of test set
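A minimal sketch of the accuracy measure above, in plain Python. The function name and the five example labels are invented for illustration; they are not from the slides.

```python
def accuracy(predicted, actual):
    """Fraction of test instances whose predicted label matches the true label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("test set must not be empty")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical test set of five emails ("ham" = regular, non-spam email).
actual = ["spam", "ham", "spam", "ham", "ham"]
predicted = ["spam", "ham", "ham", "ham", "ham"]

print(accuracy(predicted, actual))  # 4 correct out of 5 -> 0.8
```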
AI VS. MACHINE LEARNING VS. DEEP LEARNING
WHY USE MACHINE LEARNING
Spam filter using traditional programming
WHY USE MACHINE LEARNING
Spam Filter using Machine Learning approach
WHY USE MACHINE LEARNING
Machine Learning can adapt to a changing environment
WHY USE MACHINE LEARNING
Machine Learning can help humans better understand large amounts of data
IN SUMMARY: WHY USE ML
Use for problems for which existing solutions require a lot of fine-tuning or
long lists of rules: one Machine Learning algorithm can often simplify code and
perform better than the traditional approach.
Fluctuating environments: a Machine Learning system can adapt to new data.
Complex problems for which using a traditional approach yields no good
solution: the best Machine Learning techniques can perhaps find a solution.
Getting insights about complex problems and large amounts of data.
EXAMPLES OF APPLICATIONS
Analyzing images of products on a production line to automatically classify them (Image Classification using CNN)
Detecting tumors in brain scans (Semantic Segmentation)
Automatically classifying news articles (Text Classification)
Automatically flagging offensive comments on discussion forums (Text Classification)
Summarizing long documents automatically (Text Summarization)
Creating a chatbot or a personal assistant (Natural Language Processing)
Forecasting your company’s revenue next year, based on many performance metrics (Regression)
Making your app react to voice commands (Speech Recognition)
Detecting credit card fraud (Anomaly Detection)
Segmenting clients based on their purchases so that you can design a different marketing strategy for each segment (Clustering)
Representing a complex, high-dimensional dataset in a clear and insightful diagram (Data Visualization)
Recommending a product that a client may be interested in, based on past purchases (Recommender Systems)
Building an intelligent bot for a game (Reinforcement Learning)
TYPES OF MACHINE LEARNING SYSTEMS
Whether or not they are trained with human supervision:
 Supervised
 Unsupervised
 Semi-supervised
 Reinforcement Learning
Whether or not they can learn incrementally on the fly:
 Online Learning
 Batch Learning
SUPERVISED LEARNING
EXAMPLES OF SUPERVISED LEARNING ALGORITHMS
k-Nearest Neighbors
Linear Regression
Logistic Regression
Support Vector Machines (SVMs)
Decision Trees and Random Forests
Neural Networks
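To make the first algorithm in this list concrete, here is a rough sketch of k-Nearest Neighbors classification in plain Python. The toy training set, the two unnamed features, and the choice k=3 are assumptions made for illustration only.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Sort the labeled training instances by Euclidean distance to the query.
    by_distance = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Invented labeled training set: each email is described by two numeric features.
train_points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_labels = ["ham", "ham", "ham", "spam", "spam", "spam"]

print(knn_predict(train_points, train_labels, (1.1, 1.0)))  # ham
print(knn_predict(train_points, train_labels, (5.1, 5.0)))  # spam
```

The labels in the training set are what make this supervised: the prediction is only a vote over known answers.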
UNSUPERVISED LEARNING
EXAMPLES OF UNSUPERVISED LEARNING ALGORITHMS
Clustering
 K-Means
 DBSCAN
 Hierarchical Cluster Analysis (HCA)
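As an illustration of clustering, the sketch below implements a bare-bones K-Means loop in plain Python. The points and the fixed starting centroids are invented; practical implementations usually pick the starting centroids at random and run several restarts.

```python
import math


def k_means(points, centroids, iterations=10):
    """Minimal K-Means: alternate between assigning points and moving centroids."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for i, cluster in enumerate(clusters):
            if cluster:  # keep the old centroid if a cluster ends up empty
                centroids[i] = tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
    return centroids, clusters


# Unlabeled toy data: note that no labels are passed to the algorithm.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = k_means(points, centroids=[(0.0, 0.0), (10.0, 10.0)])

print(clusters[0])  # the three points near (1, 1)
print(clusters[1])  # the three points near (8, 8)
```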
EXAMPLES OF UNSUPERVISED LEARNING ALGORITHMS
Visualization and dimensionality reduction
 Principal Component Analysis (PCA)
 Kernel PCA
 Locally Linear Embedding (LLE)
 t-Distributed Stochastic Neighbor Embedding (t-SNE)
EXAMPLES OF UNSUPERVISED LEARNING ALGORITHMS
Anomaly detection and novelty detection
 One-class SVM
 Isolation Forest
EXAMPLES OF UNSUPERVISED LEARNING ALGORITHMS
Association rule learning
 Apriori
 Eclat
SEMI-SUPERVISED LEARNING
REINFORCEMENT LEARNING
BATCH AND ONLINE LEARNING
Batch Learning:
 Learns in one go using all the available training data
 Cannot learn incrementally
 Requires retraining the model from scratch on the updated dataset
 Requires a lot of computational resources
 But the process can be automated, so for small datasets it is not a huge concern
Online Learning:
 Learns on the fly from incoming data
 Can learn incrementally
 Does not need to keep all the data available all the time
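The online-learning idea can be sketched as a model that is updated one instance at a time and never stores past data. The class below is a hypothetical example (a 1-D linear model trained with stochastic gradient descent); the learning rate and the simulated stream y = 2x + 1 are assumptions.

```python
class OnlineLinearModel:
    """1-D linear model y = w*x + b, trained one instance at a time."""

    def __init__(self, learning_rate=0.1):
        self.w = 0.0
        self.b = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        return self.w * x + self.b

    def learn_one(self, x, y):
        # One gradient step on the squared error of this single instance.
        # After the update the instance can be discarded.
        error = self.predict(x) - y
        self.w -= self.learning_rate * error * x
        self.b -= self.learning_rate * error


model = OnlineLinearModel()

# Simulated data stream following y = 2x + 1 (invented for illustration).
for step in range(5000):
    x = (step % 10) / 10
    model.learn_one(x, 2 * x + 1)

print(model.w, model.b)  # should end up close to 2 and 1
```

A batch learner would instead need the whole dataset in memory and a full retraining run whenever new data arrives.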
CHALLENGES OF MACHINE LEARNING
Insufficient Quantity of Training Data
Non-representative Training Data
Poor Quality Data
Irrelevant Features
Overfitting Training Data
Underfitting Training Data
UNREASONABLE EFFECTIVENESS OF DATA
Different algorithms performed similarly once given enough data
NON-REPRESENTATIVE DATA
POOR QUALITY DATA
 Error in data gathering
 Outliers
 Noise (Inaccurate measurements)
If some instances are clearly outliers, it may help to simply discard them or try to
fix the errors manually.
If some instances are missing a few features (e.g., 5% of your customers did not
specify their age), you must decide whether you want to ignore this attribute
altogether, ignore these instances, fill in the missing values (e.g., with the median
age), or train one model with the feature and one model without it.
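One of the options above, filling in missing values with the median, might look like this in plain Python. The function name and the age values are hypothetical.

```python
from statistics import median


def fill_missing_with_median(values):
    """Replace None entries with the median of the known values."""
    known = [v for v in values if v is not None]
    if not known:
        raise ValueError("cannot compute a median without any known values")
    fill = median(known)
    return [fill if v is None else v for v in values]


ages = [25, None, 40, 31, None, 58]  # hypothetical customer ages, None = not specified
print(fill_missing_with_median(ages))  # [25, 35.5, 40, 31, 35.5, 58]
```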
IRRELEVANT FEATURES
Some features are less useful than others for building the prediction model
 Feature Selection: Select the features that matter
 Feature Extraction: Derive new features from existing features
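A small sketch of both ideas, under stated assumptions: selection keeps features whose correlation with the target is strong (the 0.5 threshold is arbitrary), and extraction combines two existing features into a new one. The housing-style feature names and numbers are invented.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length columns."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / (var_x * var_y) ** 0.5


# Hypothetical dataset: three candidate features and a target (price).
features = {
    "rooms": [2, 3, 4, 5, 6],
    "area_m2": [50, 70, 95, 120, 150],
    "customer_id_last_digit": [7, 1, 4, 9, 2],
}
price = [100, 140, 190, 240, 300]

# Feature selection: keep features that correlate with the target.
selected = [name for name, col in features.items() if abs(pearson(col, price)) >= 0.5]
print(selected)  # ['rooms', 'area_m2']

# Feature extraction: build a new feature from two existing ones.
features["area_per_room"] = [a / r for a, r in zip(features["area_m2"], features["rooms"])]
print(features["area_per_room"][0])  # 25.0
```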
OVERFITTING
Constraining a model to make it simpler and reduce the risk of overfitting is called
regularization.
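To show what "constraining a model" can mean in practice, here is a sketch of ridge-style (L2) regularization for a one-parameter model y = w*x. This particular penalty is an illustrative choice, not something the slides specify. Minimizing sum((w*x - y)^2) + alpha * w^2 gives w = sum(x*y) / (sum(x^2) + alpha), so a larger alpha pulls the slope toward zero.

```python
def fit_ridge_1d(xs, ys, alpha):
    """Slope w of y = w*x minimizing sum((w*x - y)^2) + alpha * w^2."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + alpha)


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]  # invented data lying exactly on y = 2x

for alpha in (0.0, 1.0, 10.0):
    # alpha is the regularization hyperparameter: 0 means no constraint.
    print(alpha, fit_ridge_1d(xs, ys, alpha))
```

With alpha = 0 the fitted slope is exactly 2.0; the slope shrinks as alpha grows, which trades some fit on the training data for a simpler model.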
UNDERFITTING
The opposite of overfitting: the model is too simple to learn the underlying structure of the data
Solutions:
 Select a more powerful model, with more parameters.
 Feed better features to the learning algorithm (feature engineering).
 Reduce the constraints on the model (e.g., reduce the regularization
hyperparameter).
TESTING
Split the data into a training set and a test set (an 80%/20% split is common)
Build the model on the training data
Test the model on the test data
If the training error is high, the model is not even fitting the training data (underfitting)
If the training error is low but the test error is high, the model is not generalizing to the test data (overfitting)
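The split itself can be sketched in a few lines of plain Python. The function name, the default seed, and the use of integers as stand-in instances are assumptions for illustration.

```python
import random


def train_test_split(instances, test_ratio=0.2, seed=42):
    """Shuffle a copy of the data and hold out a `test_ratio` fraction as the test set."""
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is reproducible
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]  # (training set, test set)


data = list(range(100))  # stand-in for 100 labeled instances
train_set, test_set = train_test_split(data)

print(len(train_set), len(test_set))  # 80 20
```

Shuffling before splitting matters when the data is ordered, for example by date or by class.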
VALIDATION
What if you have to compare different models or tune a model's hyperparameters?
Should you just keep using the test data to estimate the generalization error?
Doing so would adapt the model to the test data, so the measured error would be too optimistic and the model might not generalize well
Solution:
Divide the data into training, validation and test sets (possibly 60%, 20%, 20%)
Train the models on the training data and check their error on the validation data
Select the model that minimizes the validation error
Then retrain that model on the training + validation data and evaluate it once on the test data
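The procedure above can be sketched end to end. Everything concrete here is an assumption: the synthetic data (y = 2x plus noise), the one-parameter ridge model, and the list of candidate regularization values standing in for "different models".

```python
import random


def fit_ridge_1d(xs, ys, alpha):
    """Slope w of y = w*x minimizing sum((w*x - y)^2) + alpha * w^2."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + alpha)


def mse(w, xs, ys):
    """Mean squared error of the model y = w*x on the given data."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def columns(rows):
    """Turn a list of (x, y) pairs into two columns."""
    xs, ys = zip(*rows)
    return xs, ys


# Synthetic data: y = 2x + Gaussian noise, generated in random order.
rng = random.Random(0)
data = []
for _ in range(100):
    x = rng.uniform(0, 10)
    data.append((x, 2 * x + rng.gauss(0, 1)))

# 60% / 20% / 20% split into training, validation and test sets.
train, val, test = data[:60], data[60:80], data[80:]

# Train one model per candidate hyperparameter and compare validation errors.
candidates = [0.0, 1.0, 10.0, 100.0]
val_errors = {a: mse(fit_ridge_1d(*columns(train), a), *columns(val)) for a in candidates}
best_alpha = min(val_errors, key=val_errors.get)

# Retrain the selected model on training + validation data, then test once.
final_w = fit_ridge_1d(*columns(train + val), best_alpha)
test_error = mse(final_w, *columns(test))

print(best_alpha, final_w, test_error)
```

The test set is touched only in the last step, so the reported test error is not biased by the model selection.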
Ad

More Related Content

Similar to Chapter8_What_Is_Machine_Learning Testing Cases (20)

Towards Increasing Predictability of Machine Learning Research
Towards Increasing Predictability of Machine Learning ResearchTowards Increasing Predictability of Machine Learning Research
Towards Increasing Predictability of Machine Learning Research
ArtemSunfun
 
Machine learning with ADA Boost
Machine learning with ADA BoostMachine learning with ADA Boost
Machine learning with ADA Boost
Aman Patel
 
introduction to machine learning
introduction to machine learningintroduction to machine learning
introduction to machine learning
Johnson Ubah
 
Machine Learning - Lecture2.pptx
Machine Learning - Lecture2.pptxMachine Learning - Lecture2.pptx
Machine Learning - Lecture2.pptx
NsitTech
 
Machine learning
Machine learningMachine learning
Machine learning
Sandeep Singh
 
MACHINE LEARNING MODELS. pptx
MACHINE LEARNING MODELS.             pptxMACHINE LEARNING MODELS.             pptx
MACHINE LEARNING MODELS. pptx
iamayesha2526
 
Machine Learning Landscape
Machine Learning LandscapeMachine Learning Landscape
Machine Learning Landscape
Eng Teong Cheah
 
machine learning types methods classification regression decision tree
machine learning types methods classification regression decision treemachine learning types methods classification regression decision tree
machine learning types methods classification regression decision tree
drmohamadaboutaam
 
Overview of machine learning
Overview of machine learningOverview of machine learning
Overview of machine learning
AhmedHany131
 
An overview of machine learning
An overview of machine learningAn overview of machine learning
An overview of machine learning
drcfetr
 
An overview of machine learning (1)
An overview of machine learning (1)An overview of machine learning (1)
An overview of machine learning (1)
Pranjal Tiwari
 
A Few Useful Things to Know about Machine Learning
A Few Useful Things to Know about Machine LearningA Few Useful Things to Know about Machine Learning
A Few Useful Things to Know about Machine Learning
nep_test_account
 
Regression with Microsoft Azure & Ms Excel
Regression with Microsoft Azure & Ms ExcelRegression with Microsoft Azure & Ms Excel
Regression with Microsoft Azure & Ms Excel
Dr. Abdul Ahad Abro
 
Machine Learning.pptx
Machine Learning.pptxMachine Learning.pptx
Machine Learning.pptx
NitinSharma134320
 
Top 50 ML Ques & Ans.pdf
Top 50 ML Ques & Ans.pdfTop 50 ML Ques & Ans.pdf
Top 50 ML Ques & Ans.pdf
Jetender Sharma
 
Machine learning interview questions and answers
Machine learning interview questions and answersMachine learning interview questions and answers
Machine learning interview questions and answers
kavinilavuG
 
AI & ML in Defence Systems - Sunil Chomal
AI & ML in Defence Systems   - Sunil ChomalAI & ML in Defence Systems   - Sunil Chomal
AI & ML in Defence Systems - Sunil Chomal
Sunil Chomal
 
construire modele machine_Learning.pptx
construire modele  machine_Learning.pptxconstruire modele  machine_Learning.pptx
construire modele machine_Learning.pptx
koooragoal20000
 
#ATAGTR2021 Presentation : "Use of AI and ML in Performance Testing" by Adolf...
#ATAGTR2021 Presentation : "Use of AI and ML in Performance Testing" by Adolf...#ATAGTR2021 Presentation : "Use of AI and ML in Performance Testing" by Adolf...
#ATAGTR2021 Presentation : "Use of AI and ML in Performance Testing" by Adolf...
Agile Testing Alliance
 
AI and Deep Learning
AI and Deep Learning AI and Deep Learning
AI and Deep Learning
Subrat Panda, PhD
 
Towards Increasing Predictability of Machine Learning Research
Towards Increasing Predictability of Machine Learning ResearchTowards Increasing Predictability of Machine Learning Research
Towards Increasing Predictability of Machine Learning Research
ArtemSunfun
 
Machine learning with ADA Boost
Machine learning with ADA BoostMachine learning with ADA Boost
Machine learning with ADA Boost
Aman Patel
 
introduction to machine learning
introduction to machine learningintroduction to machine learning
introduction to machine learning
Johnson Ubah
 
Machine Learning - Lecture2.pptx
Machine Learning - Lecture2.pptxMachine Learning - Lecture2.pptx
Machine Learning - Lecture2.pptx
NsitTech
 
MACHINE LEARNING MODELS. pptx
MACHINE LEARNING MODELS.             pptxMACHINE LEARNING MODELS.             pptx
MACHINE LEARNING MODELS. pptx
iamayesha2526
 
Machine Learning Landscape
Machine Learning LandscapeMachine Learning Landscape
Machine Learning Landscape
Eng Teong Cheah
 
machine learning types methods classification regression decision tree
machine learning types methods classification regression decision treemachine learning types methods classification regression decision tree
machine learning types methods classification regression decision tree
drmohamadaboutaam
 
Overview of machine learning
Overview of machine learningOverview of machine learning
Overview of machine learning
AhmedHany131
 
An overview of machine learning
An overview of machine learningAn overview of machine learning
An overview of machine learning
drcfetr
 
An overview of machine learning (1)
An overview of machine learning (1)An overview of machine learning (1)
An overview of machine learning (1)
Pranjal Tiwari
 
A Few Useful Things to Know about Machine Learning
A Few Useful Things to Know about Machine LearningA Few Useful Things to Know about Machine Learning
A Few Useful Things to Know about Machine Learning
nep_test_account
 
Regression with Microsoft Azure & Ms Excel
Regression with Microsoft Azure & Ms ExcelRegression with Microsoft Azure & Ms Excel
Regression with Microsoft Azure & Ms Excel
Dr. Abdul Ahad Abro
 
Top 50 ML Ques & Ans.pdf
Top 50 ML Ques & Ans.pdfTop 50 ML Ques & Ans.pdf
Top 50 ML Ques & Ans.pdf
Jetender Sharma
 
Machine learning interview questions and answers
Machine learning interview questions and answersMachine learning interview questions and answers
Machine learning interview questions and answers
kavinilavuG
 
AI & ML in Defence Systems - Sunil Chomal
AI & ML in Defence Systems   - Sunil ChomalAI & ML in Defence Systems   - Sunil Chomal
AI & ML in Defence Systems - Sunil Chomal
Sunil Chomal
 
construire modele machine_Learning.pptx
construire modele  machine_Learning.pptxconstruire modele  machine_Learning.pptx
construire modele machine_Learning.pptx
koooragoal20000
 
#ATAGTR2021 Presentation : "Use of AI and ML in Performance Testing" by Adolf...
#ATAGTR2021 Presentation : "Use of AI and ML in Performance Testing" by Adolf...#ATAGTR2021 Presentation : "Use of AI and ML in Performance Testing" by Adolf...
#ATAGTR2021 Presentation : "Use of AI and ML in Performance Testing" by Adolf...
Agile Testing Alliance
 

More from Ghazanfar Latif (Gabe) (17)

File_System_Fundamentals savitchAbsJavaPPT Java Programming Part 2
File_System_Fundamentals savitchAbsJavaPPT Java Programming Part 2File_System_Fundamentals savitchAbsJavaPPT Java Programming Part 2
File_System_Fundamentals savitchAbsJavaPPT Java Programming Part 2
Ghazanfar Latif (Gabe)
 
Chap09_Virtual Memory File_System_Fundamentals savitchAbsJavaPPT Java Program...
Chap09_Virtual Memory File_System_Fundamentals savitchAbsJavaPPT Java Program...Chap09_Virtual Memory File_System_Fundamentals savitchAbsJavaPPT Java Program...
Chap09_Virtual Memory File_System_Fundamentals savitchAbsJavaPPT Java Program...
Ghazanfar Latif (Gabe)
 
savitchAbsJavaPPT Java Programming Part 1
savitchAbsJavaPPT Java Programming Part 1savitchAbsJavaPPT Java Programming Part 1
savitchAbsJavaPPT Java Programming Part 1
Ghazanfar Latif (Gabe)
 
Chapter09 Unsupervised Learning Testing Cases
Chapter09 Unsupervised Learning Testing CasesChapter09 Unsupervised Learning Testing Cases
Chapter09 Unsupervised Learning Testing Cases
Ghazanfar Latif (Gabe)
 
K-means Clustering Algorithm Testing Cases
K-means Clustering Algorithm Testing CasesK-means Clustering Algorithm Testing Cases
K-means Clustering Algorithm Testing Cases
Ghazanfar Latif (Gabe)
 
What is Interaction Design?
What is Interaction Design?What is Interaction Design?
What is Interaction Design?
Ghazanfar Latif (Gabe)
 
White rabbit game cloud deployment architecture
White rabbit game cloud deployment architectureWhite rabbit game cloud deployment architecture
White rabbit game cloud deployment architecture
Ghazanfar Latif (Gabe)
 
Svm on cloud (presntation)
Svm on cloud  (presntation)Svm on cloud  (presntation)
Svm on cloud (presntation)
Ghazanfar Latif (Gabe)
 
Security enabling at amazon cloud (presntation)
Security enabling at amazon cloud  (presntation)Security enabling at amazon cloud  (presntation)
Security enabling at amazon cloud (presntation)
Ghazanfar Latif (Gabe)
 
Mtbc cloud ehr
Mtbc cloud ehrMtbc cloud ehr
Mtbc cloud ehr
Ghazanfar Latif (Gabe)
 
Effective use of amazon web services for web deployment
Effective use of amazon web services for web deploymentEffective use of amazon web services for web deployment
Effective use of amazon web services for web deployment
Ghazanfar Latif (Gabe)
 
A L A Q S A
A L A Q S AA L A Q S A
A L A Q S A
Ghazanfar Latif (Gabe)
 
Areyouap
AreyouapAreyouap
Areyouap
Ghazanfar Latif (Gabe)
 
Attitude Fyh 02 P R E E T R A N J A N
Attitude Fyh 02 P R E E T R A N J A NAttitude Fyh 02 P R E E T R A N J A N
Attitude Fyh 02 P R E E T R A N J A N
Ghazanfar Latif (Gabe)
 
Technical Report Writing Presentation
Technical Report Writing PresentationTechnical Report Writing Presentation
Technical Report Writing Presentation
Ghazanfar Latif (Gabe)
 
Outreach Scholarship Program for Hiegher Education in Pakistan
Outreach Scholarship Program for Hiegher Education in PakistanOutreach Scholarship Program for Hiegher Education in Pakistan
Outreach Scholarship Program for Hiegher Education in Pakistan
Ghazanfar Latif (Gabe)
 
Semantic Web Technologies Presenattion (Topic: TripIt)
Semantic Web Technologies Presenattion (Topic: TripIt)Semantic Web Technologies Presenattion (Topic: TripIt)
Semantic Web Technologies Presenattion (Topic: TripIt)
Ghazanfar Latif (Gabe)
 
File_System_Fundamentals savitchAbsJavaPPT Java Programming Part 2
File_System_Fundamentals savitchAbsJavaPPT Java Programming Part 2File_System_Fundamentals savitchAbsJavaPPT Java Programming Part 2
File_System_Fundamentals savitchAbsJavaPPT Java Programming Part 2
Ghazanfar Latif (Gabe)
 
Chap09_Virtual Memory File_System_Fundamentals savitchAbsJavaPPT Java Program...
Chap09_Virtual Memory File_System_Fundamentals savitchAbsJavaPPT Java Program...Chap09_Virtual Memory File_System_Fundamentals savitchAbsJavaPPT Java Program...
Chap09_Virtual Memory File_System_Fundamentals savitchAbsJavaPPT Java Program...
Ghazanfar Latif (Gabe)
 
savitchAbsJavaPPT Java Programming Part 1
savitchAbsJavaPPT Java Programming Part 1savitchAbsJavaPPT Java Programming Part 1
savitchAbsJavaPPT Java Programming Part 1
Ghazanfar Latif (Gabe)
 
Chapter09 Unsupervised Learning Testing Cases
Chapter09 Unsupervised Learning Testing CasesChapter09 Unsupervised Learning Testing Cases
Chapter09 Unsupervised Learning Testing Cases
Ghazanfar Latif (Gabe)
 
K-means Clustering Algorithm Testing Cases
K-means Clustering Algorithm Testing CasesK-means Clustering Algorithm Testing Cases
K-means Clustering Algorithm Testing Cases
Ghazanfar Latif (Gabe)
 
White rabbit game cloud deployment architecture
White rabbit game cloud deployment architectureWhite rabbit game cloud deployment architecture
White rabbit game cloud deployment architecture
Ghazanfar Latif (Gabe)
 
Security enabling at amazon cloud (presntation)
Security enabling at amazon cloud  (presntation)Security enabling at amazon cloud  (presntation)
Security enabling at amazon cloud (presntation)
Ghazanfar Latif (Gabe)
 
Effective use of amazon web services for web deployment
Effective use of amazon web services for web deploymentEffective use of amazon web services for web deployment
Effective use of amazon web services for web deployment
Ghazanfar Latif (Gabe)
 
Outreach Scholarship Program for Hiegher Education in Pakistan
Outreach Scholarship Program for Hiegher Education in PakistanOutreach Scholarship Program for Hiegher Education in Pakistan
Outreach Scholarship Program for Hiegher Education in Pakistan
Ghazanfar Latif (Gabe)
 
Semantic Web Technologies Presenattion (Topic: TripIt)
Semantic Web Technologies Presenattion (Topic: TripIt)Semantic Web Technologies Presenattion (Topic: TripIt)
Semantic Web Technologies Presenattion (Topic: TripIt)
Ghazanfar Latif (Gabe)
 
Ad

Recently uploaded (20)

How to Add Button in Chatter in Odoo 18 - Odoo Slides
How to Add Button in Chatter in Odoo 18 - Odoo SlidesHow to Add Button in Chatter in Odoo 18 - Odoo Slides
How to Add Button in Chatter in Odoo 18 - Odoo Slides
Celine George
 
MCQ PHYSIOLOGY II (DR. NASIR MUSTAFA) MCQS)
MCQ PHYSIOLOGY II (DR. NASIR MUSTAFA) MCQS)MCQ PHYSIOLOGY II (DR. NASIR MUSTAFA) MCQS)
MCQ PHYSIOLOGY II (DR. NASIR MUSTAFA) MCQS)
Dr. Nasir Mustafa
 
Cyber security COPA ITI MCQ Top Questions
Cyber security COPA ITI MCQ Top QuestionsCyber security COPA ITI MCQ Top Questions
Cyber security COPA ITI MCQ Top Questions
SONU HEETSON
 
History Of The Monastery Of Mor Gabriel Philoxenos Yuhanon Dolabani
History Of The Monastery Of Mor Gabriel Philoxenos Yuhanon DolabaniHistory Of The Monastery Of Mor Gabriel Philoxenos Yuhanon Dolabani
History Of The Monastery Of Mor Gabriel Philoxenos Yuhanon Dolabani
fruinkamel7m
 
The role of wall art in interior designing
The role of wall art in interior designingThe role of wall art in interior designing
The role of wall art in interior designing
meghaark2110
 
IPL QUIZ | THE QUIZ CLUB OF PSGCAS | 2025.pdf
IPL QUIZ | THE QUIZ CLUB OF PSGCAS | 2025.pdfIPL QUIZ | THE QUIZ CLUB OF PSGCAS | 2025.pdf
IPL QUIZ | THE QUIZ CLUB OF PSGCAS | 2025.pdf
Quiz Club of PSG College of Arts & Science
 
CNS infections (encephalitis, meningitis & Brain abscess
CNS infections (encephalitis, meningitis & Brain abscessCNS infections (encephalitis, meningitis & Brain abscess
CNS infections (encephalitis, meningitis & Brain abscess
Mohamed Rizk Khodair
 
How To Maximize Sales Performance using Odoo 18 Diverse views in sales module
How To Maximize Sales Performance using Odoo 18 Diverse views in sales moduleHow To Maximize Sales Performance using Odoo 18 Diverse views in sales module
How To Maximize Sales Performance using Odoo 18 Diverse views in sales module
Celine George
 
libbys peer assesment.docx..............
libbys peer assesment.docx..............libbys peer assesment.docx..............
libbys peer assesment.docx..............
19lburrell
 
Final Evaluation.docx...........................
Final Evaluation.docx...........................Final Evaluation.docx...........................
Final Evaluation.docx...........................
l1bbyburrell
 
MEDICAL BIOLOGY MCQS BY. DR NASIR MUSTAFA
MEDICAL BIOLOGY MCQS  BY. DR NASIR MUSTAFAMEDICAL BIOLOGY MCQS  BY. DR NASIR MUSTAFA
MEDICAL BIOLOGY MCQS BY. DR NASIR MUSTAFA
Dr. Nasir Mustafa
 
How to Manage Amounts in Local Currency in Odoo 18 Purchase
How to Manage Amounts in Local Currency in Odoo 18 PurchaseHow to Manage Amounts in Local Currency in Odoo 18 Purchase
How to Manage Amounts in Local Currency in Odoo 18 Purchase
Celine George
 
puzzle Irregular Verbs- Simple Past Tense
puzzle Irregular Verbs- Simple Past Tensepuzzle Irregular Verbs- Simple Past Tense
puzzle Irregular Verbs- Simple Past Tense
OlgaLeonorTorresSnch
 
spinal cord disorders (Myelopathies and radiculoapthies)
spinal cord disorders (Myelopathies and radiculoapthies)spinal cord disorders (Myelopathies and radiculoapthies)
spinal cord disorders (Myelopathies and radiculoapthies)
Mohamed Rizk Khodair
 
Mental Health Assessment in 5th semester bsc. nursing and also used in 2nd ye...
Mental Health Assessment in 5th semester bsc. nursing and also used in 2nd ye...Mental Health Assessment in 5th semester bsc. nursing and also used in 2nd ye...
Mental Health Assessment in 5th semester bsc. nursing and also used in 2nd ye...
parmarjuli1412
 
YSPH VMOC Special Report - Measles Outbreak Southwest US 5-14-2025 .pptx
YSPH VMOC Special Report - Measles Outbreak  Southwest US 5-14-2025  .pptxYSPH VMOC Special Report - Measles Outbreak  Southwest US 5-14-2025  .pptx
YSPH VMOC Special Report - Measles Outbreak Southwest US 5-14-2025 .pptx
Yale School of Public Health - The Virtual Medical Operations Center (VMOC)
 
How to Manage Manual Reordering Rule in Odoo 18 Inventory
How to Manage Manual Reordering Rule in Odoo 18 InventoryHow to Manage Manual Reordering Rule in Odoo 18 Inventory
How to Manage Manual Reordering Rule in Odoo 18 Inventory
Celine George
 
U3 ANTITUBERCULAR DRUGS Pharmacology 3.pptx
U3 ANTITUBERCULAR DRUGS Pharmacology 3.pptxU3 ANTITUBERCULAR DRUGS Pharmacology 3.pptx
U3 ANTITUBERCULAR DRUGS Pharmacology 3.pptx
Mayuri Chavan
 
Chemotherapy of Malignancy -Anticancer.pptx
Chemotherapy of Malignancy -Anticancer.pptxChemotherapy of Malignancy -Anticancer.pptx
Chemotherapy of Malignancy -Anticancer.pptx
Mayuri Chavan
 
How to Add Button in Chatter in Odoo 18 - Odoo Slides
How to Add Button in Chatter in Odoo 18 - Odoo SlidesHow to Add Button in Chatter in Odoo 18 - Odoo Slides
How to Add Button in Chatter in Odoo 18 - Odoo Slides
Celine George
 
MCQ PHYSIOLOGY II (DR. NASIR MUSTAFA) MCQS)
MCQ PHYSIOLOGY II (DR. NASIR MUSTAFA) MCQS)MCQ PHYSIOLOGY II (DR. NASIR MUSTAFA) MCQS)
MCQ PHYSIOLOGY II (DR. NASIR MUSTAFA) MCQS)
Dr. Nasir Mustafa
 
Cyber security COPA ITI MCQ Top Questions
Cyber security COPA ITI MCQ Top QuestionsCyber security COPA ITI MCQ Top Questions
Cyber security COPA ITI MCQ Top Questions
SONU HEETSON
 
History Of The Monastery Of Mor Gabriel Philoxenos Yuhanon Dolabani
History Of The Monastery Of Mor Gabriel Philoxenos Yuhanon DolabaniHistory Of The Monastery Of Mor Gabriel Philoxenos Yuhanon Dolabani
History Of The Monastery Of Mor Gabriel Philoxenos Yuhanon Dolabani
fruinkamel7m
 
The role of wall art in interior designing
The role of wall art in interior designingThe role of wall art in interior designing
The role of wall art in interior designing
meghaark2110
 
CNS infections (encephalitis, meningitis & Brain abscess
CNS infections (encephalitis, meningitis & Brain abscessCNS infections (encephalitis, meningitis & Brain abscess
CNS infections (encephalitis, meningitis & Brain abscess
Mohamed Rizk Khodair
 
How To Maximize Sales Performance using Odoo 18 Diverse views in sales module
How To Maximize Sales Performance using Odoo 18 Diverse views in sales moduleHow To Maximize Sales Performance using Odoo 18 Diverse views in sales module
How To Maximize Sales Performance using Odoo 18 Diverse views in sales module
Celine George
 
libbys peer assesment.docx..............
libbys peer assesment.docx..............libbys peer assesment.docx..............
libbys peer assesment.docx..............
19lburrell
 
Final Evaluation.docx...........................
Final Evaluation.docx...........................Final Evaluation.docx...........................
Final Evaluation.docx...........................
l1bbyburrell
 
MEDICAL BIOLOGY MCQS BY. DR NASIR MUSTAFA
MEDICAL BIOLOGY MCQS  BY. DR NASIR MUSTAFAMEDICAL BIOLOGY MCQS  BY. DR NASIR MUSTAFA
MEDICAL BIOLOGY MCQS BY. DR NASIR MUSTAFA
Dr. Nasir Mustafa
 
How to Manage Amounts in Local Currency in Odoo 18 Purchase
How to Manage Amounts in Local Currency in Odoo 18 PurchaseHow to Manage Amounts in Local Currency in Odoo 18 Purchase
How to Manage Amounts in Local Currency in Odoo 18 Purchase
Celine George
 
puzzle Irregular Verbs- Simple Past Tense
puzzle Irregular Verbs- Simple Past Tensepuzzle Irregular Verbs- Simple Past Tense
puzzle Irregular Verbs- Simple Past Tense
OlgaLeonorTorresSnch
 
spinal cord disorders (Myelopathies and radiculoapthies)
spinal cord disorders (Myelopathies and radiculoapthies)spinal cord disorders (Myelopathies and radiculoapthies)
spinal cord disorders (Myelopathies and radiculoapthies)
Mohamed Rizk Khodair
 
Mental Health Assessment in 5th semester bsc. nursing and also used in 2nd ye...
Mental Health Assessment in 5th semester bsc. nursing and also used in 2nd ye...Mental Health Assessment in 5th semester bsc. nursing and also used in 2nd ye...
Mental Health Assessment in 5th semester bsc. nursing and also used in 2nd ye...
parmarjuli1412
 
How to Manage Manual Reordering Rule in Odoo 18 Inventory
How to Manage Manual Reordering Rule in Odoo 18 InventoryHow to Manage Manual Reordering Rule in Odoo 18 Inventory
How to Manage Manual Reordering Rule in Odoo 18 Inventory
Celine George
 
U3 ANTITUBERCULAR DRUGS Pharmacology 3.pptx
U3 ANTITUBERCULAR DRUGS Pharmacology 3.pptxU3 ANTITUBERCULAR DRUGS Pharmacology 3.pptx
U3 ANTITUBERCULAR DRUGS Pharmacology 3.pptx
Mayuri Chavan
 
Chemotherapy of Malignancy -Anticancer.pptx
Chemotherapy of Malignancy -Anticancer.pptxChemotherapy of Malignancy -Anticancer.pptx
Chemotherapy of Malignancy -Anticancer.pptx
Mayuri Chavan
 
Ad

Chapter8_What_Is_Machine_Learning Testing Cases

  • 1. WHAT IS MACHINE LEARNING Dr. Majid Ali Khan Dr. Ghazanfar Latif
  • 2. OUTLINES What is Machine Learning and Why is it useful? Applications of Machine Learning Types of Machine Learning Challenges of Machine Learning Testing and Validating
  • 3. OUTSIDERS VIEW OF MACHINE LEARNING Intelligent robots (good or bad) roaming the world!!!
  • 4. REAL WORLD MACHINE LEARNING Spam Filters Optical Character Recognition Image Processing Manufacturing Civil Mechanical Finance
  • 5. WHAT IS MACHINE LEARNING The science (and art) of programming computers so that they can learn from data [Machine Learning is the] field of study that gives computers the ability to learn without being explicitly programmed. —Arthur Samuel, 1959 A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E. -Tom Mitchell, 1997
  • 6. WHAT IS MACHINE LEARNING – AN EXAMPLE Spam Filter:  Differentiate between spam emails from the regular (non-spam) emails What is the Experience (E):  Training set (examples used to learn)  Training instance (one particular training example) What is the Task (T):  Identify spam emails What is the performance measure (P):  How accurate is the identification (carried out using test set)  Accuracy= Number of Correct Classification/Total size of test set
  • 7. AI VS MACHINE LEARNING VS. DEEP LEARNING
  • 8. WHY USE MACHINE LEARNING Spam filter using traditional programing
  • 9. WHY USE MACHINE LEARNING Spam Filter using Machine Learning approach
  • 10. WHY USE MACHINE LEARNING Machine Learning can adapt to changing environment
  • 11. WHY USE MACHINE LEARNING Machine Learning can help humans better understand large data
  • 12. IN SUMMARY: WHY USE ML Use for problems for which existing solutions require a lot of fine-tuning or long lists of rules: one Machine Learning algorithm can often simplify code and perform better than the traditional approach. Fluctuating environments: a Machine Learning system can adapt to new data. Complex problems for which using a traditional approach yields no good solution: the best Machine Learning techniques can perhaps find a solution. Getting insights about complex problems and large amounts of data.
  • 13. EXAMPLES OF APPLICATIONS Analyzing images of products on a production line to automatically classify them (Image Classification using CNN) Detecting tumors in brain scans (Semantic Segmentation) Automatically classifying news articles (Text Classification) Automatically flagging offensive comments on discussion forums (Text Classification) Summarizing long documents automatically (Text Summarization) Creating a chatbot or a personal assistant (Natural Language Processing) Forecasting your company’s revenue next year, based on many performance metrics (Regression) Making your app react to voice commands (Speech Recognition) Detecting credit card fraud (Anomaly Detection) Segmenting clients based on their purchases so that you can design a different marketing strategy for each segment (Clustering) Representing a complex, high-dimensional dataset in a clear and insightful diagram (Data Visualization) Recommending a product that a client may be interested in, based on past purchases (Recommender Systems) Building an intelligent bot for a game (Reinforcement Learning)
  • 14. TYPES OF MACHINE LEARNING SYSTEMS Whether or not they are trained with human supervision: Supervised, Unsupervised, Semi-supervised, Reinforcement Learning. Whether or not they can learn incrementally on the fly: Online Learning, Batch Learning.
  • 16. EXAMPLES OF SUPERVISED LEARNING ALGORITHMS k-Nearest Neighbors; Linear Regression; Logistic Regression; Support Vector Machines (SVMs); Decision Trees and Random Forests; Neural Networks
  • 18. EXAMPLES OF UNSUPERVISED LEARNING ALGORITHMS Clustering: K-Means; DBSCAN; Hierarchical Cluster Analysis (HCA)
  • 19. EXAMPLES OF UNSUPERVISED LEARNING ALGORITHMS Visualization and dimensionality reduction: Principal Component Analysis (PCA); Kernel PCA; Locally Linear Embedding (LLE); t-Distributed Stochastic Neighbor Embedding (t-SNE)
  • 20. EXAMPLES OF UNSUPERVISED LEARNING ALGORITHMS Anomaly detection and novelty detection: One-class SVM; Isolation Forest
  • 21. EXAMPLES OF UNSUPERVISED LEARNING ALGORITHMS Association rule learning: Apriori; Eclat
  • 22. EXAMPLES OF UNSUPERVISED LEARNING ALGORITHMS Clustering: K-Means; DBSCAN; Hierarchical Cluster Analysis (HCA). Anomaly detection and novelty detection: One-class SVM; Isolation Forest. Visualization and dimensionality reduction: Principal Component Analysis (PCA); Kernel PCA; Locally Linear Embedding (LLE); t-Distributed Stochastic Neighbor Embedding (t-SNE). Association rule learning: Apriori; Eclat
  • 25. BATCH AND ONLINE LEARNING Batch Learning: learns in one go using the entire available training dataset; cannot learn incrementally; must be retrained from scratch on the updated dataset to incorporate new data; requires a lot of computational resources; but the process can be automated, so for small datasets this is not a huge concern. Online Learning: learns on the fly from incoming data; can learn incrementally; does not need to keep all the data available all the time.
  • 26. CHALLENGES OF MACHINE LEARNING Insufficient Quantity of Training Data; Non-representative Training Data; Poor Quality Data; Irrelevant Features; Overfitting Training Data; Underfitting Training Data
  • 27. UNREASONABLE EFFECTIVENESS OF DATA Algorithms performed similarly with enough data
  • 29. POOR QUALITY DATA Errors in data gathering; outliers; noise (inaccurate measurements). If some instances are clearly outliers, it may help to simply discard them or try to fix the errors manually. If some instances are missing a few features (e.g., 5% of your customers did not specify their age), you must decide whether you want to ignore this attribute altogether, ignore these instances, fill in the missing values (e.g., with the median age), or train one model with the feature and one model without it.
  • 30. IRRELEVANT FEATURES Some features are not useful for building the prediction model. Feature Selection: select the features that matter. Feature Extraction: derive new features from existing features.
  • 31. OVERFITTING Constraining a model to make it simpler and reduce the risk of overfitting is called regularization.
  • 32. UNDERFITTING The opposite of overfitting: the Machine Learning model is not able to learn properly from the data. Solutions: select a more powerful model, with more parameters; feed better features to the learning algorithm (feature engineering); reduce the constraints on the model (e.g., reduce the regularization hyperparameter).
  • 33. TESTING Split the data into a training set and a test set (an 80%-20% ratio is common). Build the model on the training data. Test the model on the test data. If the training error is high, the model is not fitting even the training data well (underfitting). If the training error is low but the test error is high, the model is not generalizing to the test data (overfitting).
  • 34. VALIDATION What if you have to compare different models or optimize your model over different hyperparameters? Should you just keep using the test data to estimate the generalization error? Doing so would cause the model to adapt to the test data and not generalize well. Solution: divide the data into training, validation, and test sets (possibly 60%, 20%, 20%). Train models on the training data and check the error on the validation data. Select the model that minimizes the validation error. Then do one final training on the training + validation data and test on the test data.

Editor's Notes

  • #6: Your spam filter is a Machine Learning program that, given examples of spam emails (e.g., flagged by users) and examples of regular (nonspam, also called “ham”) emails, can learn to flag spam. The examples that the system uses to learn are called the training set. Each training example is called a training instance (or sample). In this case, the task T is to flag spam for new emails, the experience E is the training data, and the performance measure P needs to be defined; for example, you can use the ratio of correctly classified emails. This particular performance measure is called accuracy, and it is often used in classification tasks.
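The accuracy measure described in this note can be sketched in a few lines. This is a minimal illustration, not part of the original slides; the function name `accuracy` and the sample labels are assumptions chosen for the example.

```python
def accuracy(predicted, actual):
    """Fraction of test instances whose predicted label matches the true label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("test set must not be empty")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical test-set labels: True = spam, False = ham.
actual = [True, False, True, False, False]
predicted = [True, False, False, False, True]
print(accuracy(predicted, actual))  # 3 of 5 correct -> 0.6
```

Accuracy alone can be misleading when one class is rare, which is one reason the note says the performance measure "needs to be defined" per task.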
  • #8: Step 1. First you would consider what spam typically looks like. You might notice that some words or phrases (such as “4U,” “credit card,” “free,” and “amazing”) tend to come up a lot in the subject line. Perhaps you would also notice a few other patterns in the sender’s name, the email’s body, and other parts of the email. Step 2. You would write a detection algorithm for each of the patterns that you noticed, and your program would flag emails as spam if a number of these patterns were detected. You would test your program and repeat steps 1 and 2 until it was good enough to launch. Since the problem is difficult, your program will likely become a long list of complex rules—pretty hard to maintain.
  • #9: In contrast, a spam filter based on Machine Learning techniques automatically learns which words and phrases are good predictors of spam by detecting unusually frequent patterns of words in the spam examples compared to the ham examples (Figure 1-2). The program is much shorter, easier to maintain, and most likely more accurate. What if spammers notice that all their emails containing “4U” are blocked? They might start writing “For U” instead. A spam filter using traditional programming techniques would need to be updated to flag “For U” emails. If spammers keep working around your spam filter, you will need to keep writing new rules forever.
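As a rough sketch of "detecting unusually frequent patterns of words in the spam examples compared to the ham examples", the snippet below compares how often each word appears in spam versus ham. It is a deliberately simplified stand-in for a real learning algorithm; the function names, the add-one smoothing, the `min_ratio` threshold, and the sample emails are all assumptions made for illustration.

```python
from collections import Counter


def word_counts(emails):
    """Count, for each lowercase word, the number of emails containing it."""
    counts = Counter()
    for text in emails:
        counts.update(set(text.lower().split()))
    return counts


def spam_indicators(spam, ham, min_ratio=2.5):
    """Return words whose smoothed spam rate is at least min_ratio times their ham rate."""
    spam_counts, ham_counts = word_counts(spam), word_counts(ham)
    indicators = {}
    for word, n_spam in spam_counts.items():
        # Add-one smoothing avoids division by zero for words never seen in ham.
        spam_rate = (n_spam + 1) / (len(spam) + 2)
        ham_rate = (ham_counts[word] + 1) / (len(ham) + 2)
        ratio = spam_rate / ham_rate
        if ratio >= min_ratio:
            indicators[word] = ratio
    return indicators


# Hypothetical training examples flagged by users.
spam = ["free credit card 4u", "amazing free offer", "free prize 4u"]
ham = ["meeting notes attached", "lunch is free today", "project status update"]

indicators = spam_indicators(spam, ham)
print(sorted(indicators))
```

With these toy examples, "4u" stands out because it recurs in spam and never in ham, while "free" is held back because it also appears in a ham email. If spammers switched to "For U", rerunning the same code on newly flagged examples would pick up the new pattern with no new hand-written rule.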
  • #10: In contrast, a spam filter based on Machine Learning techniques automatically notices that “For U” has become unusually frequent in spam flagged by users, and it starts flagging them without your intervention (Figure 1-3). Another area where Machine Learning shines is for problems that either are too complex for traditional approaches or have no known algorithm. For example, consider speech recognition.
  • #11: Finally, Machine Learning can help humans learn (Figure 1-4). ML algorithms can be inspected to see what they have learned (although for some algorithms this can be tricky). For instance, once a spam filter has been trained on enough spam, it can easily be inspected to reveal the list of words and combinations of words that it believes are the best predictors of spam. Sometimes this will reveal unsuspected correlations or new trends, and thereby lead to a better understanding of the problem. Applying ML techniques to dig into large amounts of data can help discover patterns that were not immediately apparent. This is called data mining.
  • #15: In supervised learning, the training set you feed to the algorithm includes the desired solutions, called labels (Figure 1-5). A typical supervised learning task is classification. The spam filter is a good example of this: it is trained with many example emails along with their class (spam or ham), and it must learn how to classify new emails. Another typical task is to predict a target numeric value, such as the price of a car, given a set of features (mileage, age, brand, etc.) called predictors. This sort of task is called regression (Figure 1-6). To train the system, you need to give it many examples of cars, including both their predictors and their labels (i.e., their prices).
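To make the regression task concrete, here is a minimal sketch that fits a straight line to labeled examples using ordinary least squares. It uses a single predictor (mileage) for brevity; the function name `fit_line` and the car data are invented for illustration and are not from the slides.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical training set: predictor = mileage (thousands of km),
# label = price (thousands of currency units).
mileage = [10, 40, 80, 120]
price = [19, 16, 12, 8]

intercept, slope = fit_line(mileage, price)


def predict_price(km):
    """Predict the price of a new car from its mileage."""
    return intercept + slope * km


print(round(predict_price(60), 2))
```

The labels (prices) are what make this supervised: the fit minimizes the squared difference between predicted and known prices.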
  • #17: In unsupervised learning, as you might guess, the training data is unlabeled (Figure 1-7). The system tries to learn without a teacher.
  • #18: For example, say you have a lot of data about your blog’s visitors. You may want to run a clustering algorithm to try to detect groups of similar visitors (Figure 1-8). At no point do you tell the algorithm which group a visitor belongs to: it finds those connections without your help. For example, it might notice that 40% of your visitors are males who love comic books and generally read your blog in the evening, while 20% are young sci-fi lovers who visit during the weekends. If you use a hierarchical clustering algorithm, it may also subdivide each group into smaller groups. This may help you target your posts for each group.
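A clustering algorithm of the kind described here can be sketched with a one-dimensional k-means. This is a simplified illustration under stated assumptions: the only feature is the hour of the visit, the starting centers are picked by hand, and the function names are hypothetical.

```python
def assign(values, centers):
    """Group each value with its nearest center."""
    clusters = [[] for _ in centers]
    for v in values:
        nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
        clusters[nearest].append(v)
    return clusters


def k_means_1d(values, centers, iterations=10):
    """Alternate between assigning points and moving each center to its cluster mean."""
    centers = list(centers)
    for _ in range(iterations):
        clusters = assign(values, centers)
        # Keep the old center if a cluster happens to be empty.
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, assign(values, centers)


# Hypothetical visit hours (24h clock) for blog visitors; no group labels are given.
visit_hours = [19, 20, 21, 22, 9, 10, 11]
centers, clusters = k_means_1d(visit_hours, centers=[0, 23])
print(centers)   # [10.0, 20.5]
print(clusters)  # [[9, 10, 11], [19, 20, 21, 22]]
```

The algorithm is never told which visitors are "morning" or "evening" readers; the two groups emerge from the data alone.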
  • #19: Visualization algorithms are also good examples of unsupervised learning algorithms: you feed them a lot of complex and unlabeled data, and they output a 2D or 3D representation of your data that can easily be plotted (Figure 1-9). A related task is dimensionality reduction, in which the goal is to simplify the data without losing too much information. One way to do this is to merge several correlated features into one. For example, a car’s mileage may be strongly correlated with its age, so the dimensionality reduction algorithm will merge them into one feature that represents the car’s wear and tear. This is called feature extraction.
  • #20: Yet another important unsupervised task is anomaly detection: for example, detecting unusual credit card transactions to prevent fraud, catching manufacturing defects, or automatically removing outliers from a dataset before feeding it to another learning algorithm. The system is shown mostly normal instances during training, so it learns to recognize them; then, when it sees a new instance, it can tell whether it looks like a normal one or whether it is likely an anomaly (see Figure 1-10).
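The train-on-normal, flag-the-unusual idea can be sketched with a simple z-score rule. Note that this is a basic statistical stand-in, not One-class SVM or Isolation Forest from the slides; the function names, the threshold of 3 standard deviations, and the transaction amounts are assumptions for illustration.

```python
from statistics import mean, stdev


def fit_normal_profile(amounts):
    """Learn what 'normal' looks like: the mean and standard deviation of training data."""
    return mean(amounts), stdev(amounts)


def is_anomaly(amount, profile, threshold=3.0):
    """Flag a new instance that lies more than `threshold` standard deviations from the mean."""
    mu, sigma = profile
    return abs(amount - mu) / sigma > threshold


# Hypothetical card transactions (mostly normal instances) used for training.
normal_transactions = [42.0, 38.5, 45.0, 40.0, 39.5, 44.0, 41.0]
profile = fit_normal_profile(normal_transactions)

print(is_anomaly(43.0, profile))   # False: close to the usual amounts
print(is_anomaly(950.0, profile))  # True: far outside the learned range
```

A single-feature threshold like this breaks down when anomalies are only unusual in combination with other features, which is where the dedicated algorithms come in.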
  • #21: Finally, another common unsupervised task is association rule learning, in which the goal is to dig into large amounts of data and discover interesting relations between attributes. For example, suppose you own a supermarket. Running an association rule on your sales logs may reveal that people who purchase barbecue sauce and potato chips also tend to buy steak. Thus, you may want to place these items close to one another.
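Association rules are usually scored by support (how often the items occur together) and confidence (how often the rule holds when its left-hand side occurs). The sketch below computes both for the supermarket example; it does not implement Apriori or Eclat, and the baskets are invented for illustration.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated probability of `consequent` given that `antecedent` is in the basket."""
    antecedent_support = support(transactions, antecedent)
    if antecedent_support == 0:
        raise ValueError("antecedent never occurs in the transactions")
    both = support(transactions, set(antecedent) | set(consequent))
    return both / antecedent_support


# Hypothetical sales log: each set is one customer's basket.
baskets = [
    {"barbecue sauce", "potato chips", "steak"},
    {"barbecue sauce", "potato chips", "steak", "soda"},
    {"barbecue sauce", "potato chips"},
    {"bread", "milk"},
    {"steak", "bread"},
]

rule_support = support(baskets, {"barbecue sauce", "potato chips", "steak"})
rule_confidence = confidence(baskets, {"barbecue sauce", "potato chips"}, {"steak"})
print(rule_support, round(rule_confidence, 2))
```

Here two of the three baskets containing barbecue sauce and potato chips also contain steak, so the rule holds about two thirds of the time in this toy log.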
  • #22: For example, say you have a lot of data about your blog’s visitors. You may want to run a clustering algorithm to try to detect groups of similar visitors (Figure 1-8). At no point do you tell the algorithm which group a visitor belongs to: it finds those connections without your help. For example, it might notice that 40% of your visitors are males who love comic books and generally read your blog in the evening, while 20% are young sci-fi lovers who visit during the weekends. If you use a hierarchical clustering algorithm, it may also subdivide each group into smaller groups. This may help you target your posts for each group. A related task is dimensionality reduction, in which the goal is to simplify the data without losing too much information. One way to do this is to merge several correlated features into one. For example, a car’s mileage may be strongly correlated with its age, so the dimensionality reduction algorithm will merge them into one feature that represents the car’s wear and tear. This is called feature extraction.
  • #23: Since labeling data is usually time-consuming and costly, you will often have plenty of unlabeled instances, and few labeled instances. Some algorithms can deal with data that’s partially labeled. This is called semisupervised learning (Figure 1-11). Some photo-hosting services, such as Google Photos, are good examples of this. Once you upload all your family photos to the service, it automatically recognizes that the same person A shows up in photos 1, 5, and 11, while another person B shows up in photos 2, 5, and 7. This is the unsupervised part of the algorithm (clustering). Now all the system needs is for you to tell it who these people are. Just add one label per person and it is able to name everyone in every photo, which is useful for searching photos.
  • #24: The learning system, called an agent in this context, can observe the environment, select and perform actions, and get rewards in return (or penalties in the form of negative rewards, as shown in Figure 1-12). It must then learn by itself what is the best strategy, called a policy, to get the most reward over time. A policy defines what action the agent should choose when it is in a given situation. For example, many robots implement Reinforcement Learning algorithms to learn how to walk. DeepMind’s AlphaGo program is also a good example of Reinforcement Learning: it made the headlines in May 2017 when it beat the world champion Ke Jie at the game of Go. It learned its winning policy by analyzing millions of games, and then playing many games against itself. Note that learning was turned off during the games against the champion; AlphaGo was just applying the policy it had learned.
  • #25: In batch learning, the system is incapable of learning incrementally: it must be trained using all the available data. This will generally take a lot of time and computing resources, so it is typically done offline. First the system is trained, and then it is launched into production and runs without learning anymore; it just applies what it has learned. This is called offline learning.
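The contrast between batch and online (incremental) learning from slide 25 can be sketched with a one-parameter model y ≈ w·x. The batch version needs the whole dataset at once; the online version updates after each instance and never stores the data. The class name, learning rate, and data are assumptions for illustration.

```python
def fit_batch(xs, ys):
    """Batch learning: compute the least-squares weight from the full dataset in one go."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


class OnlineLinearModel:
    """Online learning: adjust the weight a little after every incoming instance."""

    def __init__(self, learning_rate=0.05):
        self.w = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        return self.w * x

    def learn_one(self, x, y):
        # One stochastic gradient descent step on the squared error for this instance.
        error = self.predict(x) - y
        self.w -= self.learning_rate * error * x


# Hypothetical data following y = 2x.
xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]

w_batch = fit_batch(xs, ys)

model = OnlineLinearModel()
for _ in range(30):              # simulate a stream by replaying the instances
    for x, y in zip(xs, ys):
        model.learn_one(x, y)    # each instance can be discarded after this call

print(w_batch, round(model.w, 4))
```

Both approaches arrive at roughly the same weight here; the difference is that the online model could keep adapting if the incoming data started to drift, whereas the batch model would have to be retrained from scratch.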
  • #27: In a famous paper published in 2001, Microsoft researchers Michele Banko and Eric Brill showed that very different Machine Learning algorithms, including fairly simple ones, performed almost identically well on a complex problem of natural language disambiguation once they were given enough data (as you can see in Figure 1-20). As the authors put it, “these results suggest that we may want to reconsider the tradeoff between spending time and money on algorithm development versus spending it on corpus development.” The idea that data matters more than algorithms for complex problems was further popularized by Peter Norvig et al. in a paper titled “The Unreasonable Effectiveness of Data”, published in 2009. It should be noted, however, that small- and medium-sized datasets are still very common, and it is not always easy or cheap to get extra training data, so don’t abandon algorithms just yet.
  • #28: For example, the set of countries we used earlier for training the linear model was not perfectly representative; a few countries were missing. Figure 1-21 shows what the data looks like when you add the missing countries. If you train a linear model on this data, you get the solid line, while the old model is represented by the dotted line. As you can see, not only does adding a few missing countries significantly alter the model, but it makes it clear that such a simple linear model is probably never going to work well. It seems that very rich countries are not happier than moderately rich countries (in fact, they seem unhappier), and conversely some poor countries seem happier than many rich countries. By using a nonrepresentative training set, we trained a model that is unlikely to make accurate predictions, especially for very poor and very rich countries.
  • #29: Obviously, if your training data is full of errors, outliers, and noise (e.g., due to poor quality measurements), it will make it harder for the system to detect the underlying patterns, so your system is less likely to perform well. It is often well worth the effort to spend time cleaning up your training data. The truth is, most data scientists spend a significant part of their time doing just that. The following are a couple of examples of when you’d want to clean up training data: • If some instances are clearly outliers, it may help to simply discard them or try to fix the errors manually. • If some instances are missing a few features (e.g., 5% of your customers did not specify their age), you must decide whether you want to ignore this attribute altogether, ignore these instances, fill in the missing values (e.g., with the median age), or train one model with the feature and one model without it.
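One of the options listed above, filling in missing values with the median, can be sketched as follows. The function name and the sample ages are assumptions for illustration; `None` stands in for "age not specified".

```python
from statistics import median


def fill_missing_with_median(values):
    """Replace None entries with the median of the known values."""
    known = [v for v in values if v is not None]
    if not known:
        raise ValueError("cannot impute: every value is missing")
    fill = median(known)
    return [fill if v is None else v for v in values]


# Hypothetical customer ages; two customers did not specify theirs.
ages = [34, None, 29, 51, None, 42]
print(fill_missing_with_median(ages))  # [34, 38.0, 29, 51, 38.0, 42]
```

In practice the median should be computed on the training set only and then reused to fill gaps in the test set and in new data, so that no information leaks from the test set into training.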
  • #31: Say you are visiting a foreign country and the taxi driver rips you off. You might be tempted to say that all taxi drivers in that country are thieves. Overgeneralizing is something that we humans do all too often, and unfortunately machines can fall into the same trap if we are not careful. In Machine Learning this is called overfitting: it means that the model performs well on the training data, but it does not generalize well.
  • #33: It is common to use 80% of the data for training and hold out 20% for testing. However, this depends on the size of the dataset: if it contains 10 million instances, then holding out 1% means your test set will contain 100,000 instances, probably more than enough to get a good estimate of the generalization error.
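A train/test split along these lines might look like the sketch below. The function name, the fixed seed, and the default ratio are assumptions; shuffling first guards against any ordering in the original data.

```python
import random


def train_test_split(instances, test_ratio=0.2, seed=42):
    """Shuffle a copy of the data and hold out `test_ratio` of it for testing."""
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical dataset of 100 instances, represented here by their indices.
train_set, test_set = train_test_split(range(100))
print(len(train_set), len(test_set))  # 80 20
```

For a dataset of 10 million instances, calling the same function with `test_ratio=0.01` would hold out 100,000 instances, in line with the note above.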
  • #34: A common solution to this problem is called holdout validation: you simply hold out part of the training set to evaluate several candidate models and select the best one. The new held-out set is called the validation set (or sometimes the development set, or dev set). More specifically, you train multiple models with various hyperparameters on the reduced training set (i.e., the full training set minus the validation set), and you select the model that performs best on the validation set. After this holdout validation process, you train the best model on the full training set (including the validation set), and this gives you the final model. Lastly, you evaluate this final model on the test set to get an estimate of the generalization error.
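The holdout validation procedure can be sketched end to end. To keep the example dependency-free, the candidate models are k-nearest-neighbor regressors that differ only in the hyperparameter k; that choice, the function names, and the synthetic data are assumptions made for illustration.

```python
import random


def knn_predict(train, x, k):
    """Predict y for x as the mean label of the k nearest training points."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def mean_squared_error(train, evaluation, k):
    """Error of a k-NN model 'trained' on `train`, measured on `evaluation`."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in evaluation) / len(evaluation)


def holdout_validation(train_full, test, candidate_ks, validation_ratio=0.25):
    """Pick k on a held-out validation set, retrain on the full training set, then test once."""
    n_val = round(len(train_full) * validation_ratio)
    validation, reduced = train_full[:n_val], train_full[n_val:]
    # Model selection uses only the reduced training set and the validation set.
    best_k = min(candidate_ks, key=lambda k: mean_squared_error(reduced, validation, k))
    # Final model: best hyperparameter, full training set (including validation data).
    test_error = mean_squared_error(train_full, test, best_k)
    return best_k, test_error


# Hypothetical noisy data around y = 2x, generated in random order.
rng = random.Random(0)
data = []
for _ in range(100):
    x = rng.uniform(0, 10)
    data.append((x, 2 * x + rng.gauss(0, 1)))

train_full, test = data[:80], data[80:]  # 80% training, 20% test
best_k, test_error = holdout_validation(train_full, test, candidate_ks=[1, 3, 5, 9, 15])
print(best_k, round(test_error, 3))
```

With `validation_ratio=0.25` applied to the 80 training instances, the data ends up split 60/20/20, matching the proportions suggested on slide 34. The test set is touched exactly once, after the hyperparameter has been chosen.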