Data mining
Assignment week 3




BARRY KOLLEE

10349863
Regression | CPU performance

Exercise 1: Decision Trees and Logical Forms
Imagine a scenario where you want to decide whether to provide a loan. Given
the following logical formula in disjunctive normal form, draw the corresponding
decision tree.


       ( age > 25 = no ^ lives_by_himself = no )
     v ( age > 25 = yes ^ employed = yes )
     v ( age > 25 = yes ^ employed = no ^ in_education = yes )
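
The formula can also be written out as nested tests, which is one way to read off a decision tree. The sketch below is only an illustration: it reads the three bracketed conjunctions as alternatives (a disjunction), assumes that any case not covered by the formula means "no loan", and uses function and parameter names that are invented here.

```python
from itertools import product


def loan_dnf(over_25, lives_by_himself, employed, in_education):
    """The formula from the exercise: a disjunction of three conjunctions."""
    return (
        (not over_25 and not lives_by_himself)
        or (over_25 and employed)
        or (over_25 and not employed and in_education)
    )


def loan_tree(over_25, lives_by_himself, employed, in_education):
    """One possible decision tree; each nested test is an internal node."""
    if not over_25:                   # root node: age > 25?
        return not lives_by_himself   # age > 25 = no: loan only if lives_by_himself = no
    if employed:                      # age > 25 = yes: employed?
        return True
    return in_education               # employed = no: loan only if in_education = yes


def tree_matches_formula():
    """Compare the tree with the formula on all 16 combinations of attribute values."""
    return all(
        loan_dnf(*values) == loan_tree(*values)
        for values in product([False, True], repeat=4)
    )
```

Calling `tree_matches_formula()` checks that the tree and the formula give the same answer for every combination of the four attributes. Other attribute orderings would give different but equally valid trees.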




Exercise 2: Information Gain and Attribute Selection

Given the following training data:

Which attribute (i.e., a1 or a2) has the higher information
gain when chosen as the first branching in a decision tree?
Explain this first intuitively (you don't need a calculator for
this) and then explain it by giving the respective
information gains (you can use a calculator for this).



Observation:

I expect a1 to have the higher information gain. Splitting on a2 gives subsets whose class distribution is just
as even as that of the full set, so a2 tells us nothing about the class. I conclude this from the following:

        •   50 % of all instances have class ‘+’ and the other 50 % have class ‘-’.
        •   Where a2 is true, 50 % of the instances have class ‘+’ and 50 % have class ‘-’.
        •   Where a2 is false, 50 % of the instances have class ‘+’ and 50 % have class ‘-’.

The information gain for attribute a1 becomes:


       H(a1, true)  = -(1/3)log2(1/3) - (2/3)log2(2/3) = 0.9183
       H(a1, false) = -(1/3)log2(1/3) - (2/3)log2(2/3) = 0.9183
       H(a1)        = 0.5 * 0.9183 + 0.5 * 0.9183      = 0.9183


The entropy of the full training set is 1, because both classes are equally frequent, so the gain of ‘a1’ is:


       Gain(a1) = 1 - 0.9183 = 0.0817



The information gain for attribute a2 becomes:


       H(a2, true)  = -(1/2)log2(1/2) - (1/2)log2(1/2) = 1
       H(a2, false) = -(1/2)log2(1/2) - (1/2)log2(1/2) = 1
       H(a2)        = 0.5 * 1 + 0.5 * 1                = 1


So the gain of ‘a2’ is:


       Gain(a2) = 1 - 1 = 0

Since 0.0817 > 0, a1 has the higher information gain, which matches the intuition above.
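
The arithmetic above can be checked with a few lines of Python. This is a minimal sketch, not part of the original answer. The training table is not reproduced here, so the per-branch class counts in `A1_BRANCHES` and `A2_BRANCHES` are hypothetical, chosen only to match the proportions used in the calculation (1/3 versus 2/3 for a1, 1/2 versus 1/2 for a2, with equally sized branches).

```python
from math import log2


def entropy(counts):
    """Entropy in bits of a class distribution given as a list of counts."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)


def information_gain(branches):
    """Gain of a split; `branches` holds one (plus, minus) count pair per attribute value."""
    # Class counts of the unsplit set are the column sums over all branches.
    parent = [sum(column) for column in zip(*branches)]
    n = sum(parent)
    # Weighted average of the branch entropies.
    remainder = sum(sum(b) / n * entropy(b) for b in branches)
    return entropy(parent) - remainder


# HYPOTHETICAL counts (plus, minus) per branch, matching only the proportions above.
A1_BRANCHES = [(4, 2), (2, 4)]
A2_BRANCHES = [(3, 3), (3, 3)]

gain_a1 = information_gain(A1_BRANCHES)
gain_a2 = information_gain(A2_BRANCHES)
```

With these counts `gain_a1` comes out at roughly 0.0817 and `gain_a2` at 0, in line with the hand calculation.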




Exercise 3: Overfitting

Give a simple example of a decision tree and a data set where the decision tree
overfits the data. Show explicitly (see the definition of overfitting) that the
decision tree in your example overfits.

My example of overfitting is shown in the table and decision tree below. Every animal has a unique number
(animalnumber). If the training dataset contains many animals, we can simply look up the value ‘is_in_cage’
for each known animal and so tell whether that animal is in a cage. However, when the zoo receives new
animals, we cannot predict from this whether a new animal should be in a cage, and we should not assume
that we can.



Example of a (training) dataset of the zoo:

                         animalnumber       number_of_visits        is_in_cage
                           animal_1               200                   No
                           animal_2               400                  Yes
                           animal_3               100                   No
                           animal_n                ..                   ..

Example of its decision tree:


       ( number_of_visits > 140 = yes -> is_in_cage = yes )
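
The lookup behaviour described above can be made explicit by comparing error on the training data with error on unseen data. The sketch below assumes the usual reading of overfitting: a model overfits when some other model does worse on the training data but better on new data. The three training rows are taken from the table. The rows in `NEW_ANIMALS` are hypothetical and invented purely for illustration, as are the function names.

```python
# Training rows from the table: (animalnumber, number_of_visits, is_in_cage)
TRAINING = [
    ("animal_1", 200, "No"),
    ("animal_2", 400, "Yes"),
    ("animal_3", 100, "No"),
]

# HYPOTHETICAL new animals, made up for this illustration only.
NEW_ANIMALS = [
    ("animal_4", 150, "No"),
    ("animal_5", 300, "Yes"),
    ("animal_6", 120, "No"),
]

LOOKUP = {number: label for number, _visits, label in TRAINING}


def memorising_tree(animalnumber, number_of_visits):
    """A tree that branches on the unique animalnumber; it has no answer for unseen animals."""
    return LOOKUP.get(animalnumber)


def always_no(animalnumber, number_of_visits):
    """A much simpler alternative: always predict the majority class of the training data."""
    return "No"


def accuracy(model, rows):
    """Fraction of rows for which the model predicts the recorded label."""
    return sum(model(number, visits) == label for number, visits, label in rows) / len(rows)
```

On the training rows `memorising_tree` is always right while `always_no` gets two out of three. On the hypothetical new animals the order reverses: the memorising tree gets none right and `always_no` still gets two out of three. Under those assumed data, that reversal is what it means for the memorising tree to overfit.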



