DATA MINING
TECHNIQUES
UNIT-III
Association Rule Mining
• AllElectronics: a customer buys a PC and a digital camera.
What should you recommend to him next?
Frequent patterns and association rules are the knowledge that you want to
mine.
• Frequent patterns: patterns that appear frequently in a data set
• Frequent itemset: a set of items, such as milk and bread, that appear
together frequently in a transaction data set
• Frequent subsequence: a sequence of items that frequently occurs in the
same order in a transaction data set
• Frequent substructure: a structural form such as a subgraph, subtree, or
sublattice, which may be combined with itemsets or subsequences. If it occurs
frequently, it is called a frequent structured pattern
Basic Concepts
• Mining frequent patterns plays an essential role in mining associations,
correlations, classification, clustering, and other data mining tasks.
• Market Basket Analysis:
customer1: milk, bread, cereal
customer2: milk, bread, sugar, eggs
customer3: milk, bread, butter
customer4: sugar, eggs
• Which groups or sets of items are customers likely to purchase on a
given trip to a store?
Association Rules
• Support and confidence are two measures of rule interestingness.
Support: (usefulness of discovered rules)
Confidence: (certainty of discovered rules)
computer => antivirus software [support = 2%, confidence = 60%]
2% of all the transactions under analysis show that a computer and
antivirus software are purchased together: support
60% of the customers who purchased a computer also bought the
software: confidence
Association Rules
• Association rules are interesting if they satisfy both a minimum
support threshold and a minimum confidence threshold
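• Frequent itemsets, closed itemsets, and association rules:
I = {I1, I2, ..., In}: the set of all items
D: the task-relevant data (a database of transactions)
T: a transaction, a nonempty set of items with T ⊆ I
Rule: A => B
Support(A => B) = P(A ∪ B) (relative support), the probability that a
transaction contains every item of A and of B
Confidence(A => B) = P(B | A)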
Association Rules
• Itemset: a set of items
• k-itemset: an itemset that contains k items
• Occurrence frequency of an itemset: the number of transactions that
contain it (its support count)
• Minimum support threshold: if the relative support of an itemset I satisfies a
prespecified minimum support threshold, then I is a frequent itemset.
• Confidence(A => B) = P(B | A)
= support(A ∪ B) / support(A)
= support_count(A ∪ B) / support_count(A)
• Thus the problem of mining association rules can be reduced to that of mining
frequent itemsets.
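The formulas above can be checked with a small sketch. It uses the four market-basket transactions listed under Basic Concepts; the function names `support_count`, `support`, and `confidence` are illustrative choices, not part of the slides.

```python
# Market-basket transactions from the Basic Concepts slide.
transactions = [
    {"milk", "bread", "cereal"},
    {"milk", "bread", "sugar", "eggs"},
    {"milk", "bread", "butter"},
    {"sugar", "eggs"},
]

def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)

def support(itemset, transactions):
    """Relative support: fraction of transactions containing `itemset`."""
    return support_count(itemset, transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """support_count(A ∪ B) / support_count(A); assumes A occurs at least once."""
    both = set(antecedent) | set(consequent)
    return support_count(both, transactions) / support_count(antecedent, transactions)

print(support({"milk", "bread"}, transactions))       # 0.75
print(confidence({"milk"}, {"bread"}, transactions))  # 1.0
# Only one of the three milk-and-bread baskets also contains butter.
print(confidence({"milk", "bread"}, {"butter"}, transactions))
```

Note that `confidence` divides by the support count of the antecedent, so it is only defined for antecedents that occur in at least one transaction.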
Frequent Item set in Data set (Association Rule
Mining)
• Association mining searches for frequent items in the data set. Frequent
itemset mining usually finds the interesting associations and correlations
between itemsets in transactional and relational databases. In short, frequent
itemset mining shows which items appear together in a transaction or relation.
• Need for association mining:
Frequent itemset mining is the basis for generating association rules from a
transactional data set. If two items X and Y are frequently purchased together,
then it is good to place them together in stores or to offer a discount on one
item on purchase of the other. This can increase sales. For example, it is
likely that a customer who buys milk and bread also buys butter.
So the association rule is ['milk'] ^ ['bread'] => ['butter']. The seller can
suggest that a customer buy butter if he/she buys milk and bread.
Important Definitions :
• Support: one of the measures of interestingness. It indicates the
usefulness of a rule. A support of 5% means that 5% of all the
transactions in the database follow the rule.
• Support(A -> B) = Support_count(A ∪ B) / (total number of transactions)
• Confidence: the other measure of interestingness. It indicates the
certainty of a rule. A confidence of 60% means that 60% of the customers
who purchased milk and bread also bought butter.
• Confidence(A -> B) = Support_count(A ∪ B) / Support_count(A)
• If a rule satisfies both minimum support and minimum confidence, it
is a strong rule.
Important Definitions :
• Support_count(X): number of transactions in which X appears. If X
is A ∪ B, then it is the number of transactions in which A and B
are both present.
1. Maximal itemset: a frequent itemset is maximal frequent if none of its
supersets is frequent.
2. Closed itemset: an itemset is closed if none of its immediate
supersets has the same support count as the itemset.
3. k-itemset: an itemset that contains k items is a k-itemset. An itemset
is frequent if its support count is greater than or equal to the
minimum support count.
Example On finding Frequent Itemsets
• Consider the given dataset with given transactions.
• Let the minimum support count be 3
• The relation maximal frequent => closed frequent => frequent holds
• Frequent 1-itemsets:
• {A} = 3 // not closed due to {A, C}; not maximal
• {B} = 4 // not closed due to {B, D}; not maximal
• {C} = 4 // not closed due to {C, D}; not maximal
• {D} = 5 // closed, since no immediate superset has the same count; not maximal
• 2-itemsets:
• {A, B} = 2 // not frequent because support count < minimum support count, so ignore
• {A, C} = 3 // not closed due to {A, C, D}
• {A, D} = 3 // not closed due to {A, C, D}
• {B, C} = 3 // not closed due to {B, C, D}
• {B, D} = 4 // closed but not maximal due to {B, C, D}
• {C, D} = 4 // closed but not maximal due to {B, C, D}
• 3-itemsets:
• {A, B, C} = 2 // not frequent because support count < minimum support count, so ignore
• {A, B, D} = 2 // not frequent because support count < minimum support count, so ignore
• {A, C, D} = 3 // maximal frequent
• {B, C, D} = 3 // maximal frequent
• 4-itemsets:
• {A, B, C, D} = 2 // not frequent, so ignore
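The classification above can be reproduced with a short brute-force sketch. The transactions themselves are not listed in the text, so the five transactions below are an assumption, chosen only because they give exactly the support counts shown above. The helper names `is_closed` and `is_maximal` are illustrative.

```python
from itertools import combinations

# Hypothetical transactions (an assumption): they reproduce the counts
# {A}=3, {B}=4, {C}=4, {D}=5, ..., {A,B,C,D}=2 used in the example.
transactions = [
    {"A", "B", "C", "D"},
    {"A", "B", "C", "D"},
    {"A", "C", "D"},
    {"B", "C", "D"},
    {"B", "D"},
]
MIN_COUNT = 3  # minimum support count from the example

items = sorted(set().union(*transactions))

# Support count of every nonempty itemset (exponential; fine for 4 items).
counts = {}
for k in range(1, len(items) + 1):
    for combo in combinations(items, k):
        s = frozenset(combo)
        counts[s] = sum(1 for t in transactions if s <= t)

frequent = {s: c for s, c in counts.items() if c >= MIN_COUNT}

def is_closed(s):
    # Closed: every immediate superset has a strictly smaller count.
    return all(counts[s | {i}] < counts[s] for i in items if i not in s)

def is_maximal(s):
    # Maximal: no immediate superset is frequent.
    return all(counts[s | {i}] < MIN_COUNT for i in items if i not in s)

closed = {s for s in frequent if is_closed(s)}
maximal = {s for s in frequent if is_maximal(s)}

for s in sorted(frequent, key=lambda x: (len(x), sorted(x))):
    tags = [t for t, ok in (("closed", s in closed), ("maximal", s in maximal)) if ok]
    print(sorted(s), frequent[s], ", ".join(tags))
```

Checking only immediate supersets is enough here, because adding items to an itemset can never increase its support count.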
AR as Two step Process
• Find all frequent itemsets
• Generate strong association rules from the frequent itemsets
• Challenge in mining frequent itemsets: mining often generates a huge
number of itemsets, because every nonempty subset of a frequent itemset is
also frequent
• Closed frequent itemset: an itemset X is closed in a data set D if there
exists no proper super-itemset Y such that Y has the same support
count as X in D. X is a closed frequent itemset if it is both closed and
frequent
• Maximal frequent itemset: an itemset X is a maximal frequent
itemset in a data set D if X is frequent and there exists no super-itemset
Y such that X ⊂ Y and Y is frequent in D
Example: closed and maximal frequent
item sets
• A transaction database has only two transactions:
{<a1, a2, ..., a100>; <a1, a2, ..., a50>}, min_sup = 1
• We find two closed frequent itemsets and their support counts:
C = {{a1, a2, ..., a100}: 1; {a1, a2, ..., a50}: 2}
• There is only one maximal frequent itemset:
M = {{a1, a2, ..., a100}: 1}
• We cannot include {a1, a2, ..., a50} as a maximal frequent itemset
because it has a frequent superset, {a1, a2, ..., a100}
• C: set of closed frequent itemsets; M: set of maximal frequent itemsets
Example: closed and maximal frequent
item sets
• The set of closed frequent itemsets contains complete information
regarding the frequent itemsets and their support counts
• From C, we can derive
(i) {a2, a45}: 2, since {a2, a45} is a sub-itemset of the itemset
{a1, a2, ..., a50}: 2
(ii) {a8, a55}: 1, since {a8, a55} is not a sub-itemset of the previous
itemset but of the itemset {a1, a2, ..., a100}: 1
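The derivation above can be written as a small sketch: the support count of a frequent itemset is the largest support count among the closed frequent itemsets that contain it. The function name `support_from_closed` is an illustrative choice.

```python
# Closed frequent itemsets C from the example, with their support counts.
closed = {
    frozenset(f"a{i}" for i in range(1, 101)): 1,  # {a1, ..., a100}: 1
    frozenset(f"a{i}" for i in range(1, 51)): 2,   # {a1, ..., a50}: 2
}

def support_from_closed(itemset, closed):
    """Support count of `itemset`, derived only from the closed itemsets.

    Returns None when no closed frequent itemset contains `itemset`,
    which means the itemset is not frequent.
    """
    itemset = frozenset(itemset)
    counts = [count for s, count in closed.items() if itemset <= s]
    return max(counts) if counts else None

print(support_from_closed({"a2", "a45"}, closed))  # 2
print(support_from_closed({"a8", "a55"}, closed))  # 1
```

The same lookup against the maximal set M would only tell us that both itemsets are frequent, not their exact counts.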
Frequent Itemset Mining Methods: Apriori
and FP Growth
• Apriori algorithm:
Finding frequent itemsets by confined candidate generation
A seminal algorithm proposed by R. Agrawal and R. Srikant in 1994 for
mining frequent itemsets.
The name of the algorithm comes from the fact that it uses prior
knowledge of frequent itemset properties.
Apriori property: all nonempty subsets of a frequent itemset must
also be frequent.
Join step and prune step: the join step builds the candidate k-itemsets by
joining the frequent (k-1)-itemsets with themselves; the prune step removes
every candidate that has an infrequent (k-1)-subset.
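A minimal sketch of the level-wise search, assuming the four market-basket transactions from the Basic Concepts slide and a minimum support count of 2 (the threshold is an assumption for illustration). The join here is simplified: it unions any two frequent (k-1)-itemsets whose union has k items, rather than using the sorted-prefix join of the original algorithm; after pruning, the surviving candidates are the same.

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Return {frequent itemset: support count} via level-wise search."""
    transactions = [frozenset(t) for t in transactions]

    # Frequent 1-itemsets.
    item_counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            item_counts[key] = item_counts.get(key, 0) + 1
    current = {s: c for s, c in item_counts.items() if c >= min_count}
    frequent = dict(current)

    k = 2
    while current:
        prev = list(current)
        # Join step (simplified): union pairs of frequent (k-1)-itemsets.
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                if len(union) == k:
                    candidates.add(union)
        # Prune step: every (k-1)-subset must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Scan the transactions to count the surviving candidates.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: n for s, n in counts.items() if n >= min_count}
        frequent.update(current)
        k += 1
    return frequent

baskets = [
    {"milk", "bread", "cereal"},
    {"milk", "bread", "sugar", "eggs"},
    {"milk", "bread", "butter"},
    {"sugar", "eggs"},
]
result = apriori(baskets, min_count=2)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```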
Example: problem
Problem contd.,
Data mining techniques unit III
Generating Association Rules from
frequent item sets
• Once the frequent itemsets from transactions have been found, it is
straightforward to generate strong association rules from them
• Strong association rules satisfy both minimum support and minimum
confidence
• Confidence(A => B) = P(B | A)
= support_count(A ∪ B) / support_count(A)
Generating Association Rules from
frequent item sets
• Association rules are generated as follows:
For each frequent itemset l, generate all nonempty proper subsets of l
For every such subset s of l, output the rule
"s => (l - s)" if sup_count(l) / sup_count(s) >= min_conf
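The procedure above can be sketched as follows. The function name `generate_rules` and the threshold min_conf = 0.8 are assumptions for illustration; the data are the four market-basket transactions from the Basic Concepts slide, and {milk, bread, butter} is treated as frequent (that is, a minimum support count of 1 is assumed).

```python
from itertools import combinations

def generate_rules(l, transactions, min_conf):
    """Rules s => (l - s) for every nonempty proper subset s of l
    whose confidence sup_count(l) / sup_count(s) reaches min_conf."""
    l = frozenset(l)

    def count(s):
        return sum(1 for t in transactions if s <= t)

    count_l = count(l)
    if count_l == 0:
        return []  # l never occurs, so no rule can be derived from it
    rules = []
    for size in range(1, len(l)):  # proper subsets only
        for combo in combinations(sorted(l), size):
            s = frozenset(combo)
            conf = count_l / count(s)
            if conf >= min_conf:
                rules.append((s, l - s, conf))
    return rules

baskets = [
    {"milk", "bread", "cereal"},
    {"milk", "bread", "sugar", "eggs"},
    {"milk", "bread", "butter"},
    {"sugar", "eggs"},
]
rules = generate_rules({"milk", "bread", "butter"}, baskets, min_conf=0.8)
for s, rest, conf in rules:
    print(sorted(s), "=>", sorted(rest), conf)
```

Because every rule is generated from one frequent itemset l, each rule automatically satisfies minimum support; only the confidence has to be checked.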
Example: problem
Improving the efficiency of apriori
• Hash-based technique: a hash-based technique can be used to
reduce the size of the candidate k-itemsets, C_k, for k > 1
• Example :
Improving the efficiency of apriori
• Transaction reduction: reducing the number of transactions scanned in
future iterations.
• A transaction that does not contain any frequent k-item sets cannot
contain any frequent (k+1) item sets.
• Such a transaction can be marked or removed from further
consideration.
Improving the efficiency of apriori
• Partitioning: 2 DB scans
Partitioning the data to find candidate itemsets requires just two database
scans to mine the frequent itemsets
• Phase I:
Divide the transactions of D into n nonoverlapping partitions
Find the local frequent itemsets for each partition
Any itemset that is frequent in D must occur as a frequent itemset in
at least one of the partitions
Therefore all local frequent itemsets are candidate itemsets in D
Improving the efficiency of apriori
• Phase II:
A second scan of D is conducted to determine the actual support of each
candidate and thus the global frequent itemsets. D is scanned only once in
each phase
• Sampling
• Dynamic itemset counting
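The two-phase partitioning scheme described above can be sketched as follows. This is only an illustration under stated assumptions: local mining is done by brute-force enumeration (a real implementation would use a proper miner on each partition), the function names `local_frequent` and `partition_mine` are hypothetical, and the data are the market-basket transactions from the Basic Concepts slide with min_sup = 50%.

```python
from itertools import combinations
from math import ceil

def local_frequent(partition, min_sup):
    """Itemsets whose relative support within one partition reaches min_sup."""
    min_count = ceil(min_sup * len(partition))
    counts = {}
    for t in partition:
        for k in range(1, len(t) + 1):
            for combo in combinations(sorted(t), k):
                s = frozenset(combo)
                counts[s] = counts.get(s, 0) + 1
    return {s for s, c in counts.items() if c >= min_count}

def partition_mine(transactions, min_sup, n_parts):
    size = ceil(len(transactions) / n_parts)
    parts = [transactions[i:i + size] for i in range(0, len(transactions), size)]

    # Phase I: local frequent itemsets of every partition become candidates.
    candidates = set()
    for part in parts:
        candidates |= local_frequent(part, min_sup)

    # Phase II: one scan of D to find each candidate's global support count.
    min_count = ceil(min_sup * len(transactions))
    result = {}
    for c in candidates:
        n = sum(1 for t in transactions if c <= t)
        if n >= min_count:
            result[c] = n
    return result

baskets = [
    {"milk", "bread", "cereal"},
    {"milk", "bread", "sugar", "eggs"},
    {"milk", "bread", "butter"},
    {"sugar", "eggs"},
]
global_frequent = partition_mine(baskets, min_sup=0.5, n_parts=2)
for itemset, count in sorted(global_frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

The local threshold uses the same relative min_sup as the global one, which is what guarantees that no globally frequent itemset is missed in Phase I.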
A database has five transactions. Let min_sup = 60% and min_conf = 80%.
A pattern-growth approach for mining
frequent item sets
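• Apriori algorithm: disadvantages
• The generate-and-test method reduces the size of candidate sets, which
leads to a good performance gain
• However, it suffers from two nontrivial costs: it may still need to
generate a huge number of candidate sets, and it may need to scan the whole
database repeatedly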
Frequent pattern growth or FP-growth
(Divide and Conquer)
• Mines the complete set of frequent itemsets without such a costly
candidate generation process
• First, it compresses the database representing frequent items into an
FP-tree, which retains the itemset association information
• Scan D once to find the frequent items and their support counts; sort the
frequent items in descending support-count order into a list L
• Create the root of the tree, labelled "null"
• Scan D a second time
• The items in each transaction are processed in L order, and a branch is
created for each transaction
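The FP-tree construction described above can be sketched as follows. This is a minimal illustration, not a full FP-growth implementation: it omits the header table and node-links that the mining phase needs, the class name `FPNode` and function name `build_fp_tree` are hypothetical, ties in support count are broken alphabetically (an assumption), and the data are the market-basket transactions from the Basic Concepts slide with a minimum support count of 2.

```python
from collections import Counter

class FPNode:
    def __init__(self, item, parent):
        self.item = item        # None for the "null" root
        self.count = 0
        self.parent = parent
        self.children = {}      # item -> FPNode

def build_fp_tree(transactions, min_count):
    # First scan: support count of each item, keep only frequent items.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in counts.items() if c >= min_count}
    # List L: descending support count (ties broken alphabetically).
    order = sorted(frequent, key=lambda i: (-frequent[i], i))
    rank = {item: r for r, item in enumerate(order)}

    root = FPNode(None, None)   # root labelled "null"
    # Second scan: insert each transaction's frequent items in L order.
    for t in transactions:
        node = root
        for item in sorted((i for i in set(t) if i in rank), key=rank.get):
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
            child.count += 1    # shared prefixes just increment counts
            node = child
    return root, order

def show(node, depth=0):
    for child in node.children.values():
        print("  " * depth + f"{child.item}:{child.count}")
        show(child, depth + 1)

baskets = [
    {"milk", "bread", "cereal"},
    {"milk", "bread", "sugar", "eggs"},
    {"milk", "bread", "butter"},
    {"sugar", "eggs"},
]
root, order = build_fp_tree(baskets, min_count=2)
print(order)
show(root)
```

Transactions that share a prefix in L order share a path in the tree, which is where the compression comes from.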
Mining the FP-tree
• Start from each frequent length-1 pattern (as an initial suffix pattern)
and construct its conditional pattern base
• Then construct its conditional FP-tree and perform mining recursively
on the tree
• Pattern growth is achieved by the concatenation of the suffix pattern with
the frequent patterns generated from a conditional FP-tree
• This method reduces the search cost.
• Algorithm: FP-growth
Mining frequent item sets using the
vertical data format
Mining closed and max patterns
• How can we mine closed frequent itemsets?
• Strategies include:
Item merging
Sub-itemset pruning
Item skipping
• When a new frequent itemset is derived it is necessary to perform two
kinds of closure checking:
Superset checking
Subset checking
Pattern Evaluation Methods
• Strong rules are not necessarily interesting:
Pattern Evaluation Methods
• From association analysis to correlation analysis:
• Correlation rule:
• Correlation measure:
Pattern Evaluation Methods: chi-square
measure
Comparison of pattern evaluation
measures
• All-confidence
• Max-confidence
• Kulczynski (Kulc)
• Cosine
• Null Transactions
• Null Invariant
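The measures listed above can be compared with a short sketch. The formulas are the standard textbook definitions in terms of the conditional probabilities P(A|B) and P(B|A); the slides do not spell them out, so treat them as an assumption here. The counts in the example are made up purely for illustration, and lift is included only as a contrast because it is not null-invariant.

```python
from math import sqrt, isclose

def measures(n_ab, n_a, n_b, n_total):
    """Pattern evaluation measures for itemsets A and B.

    n_ab: transactions containing both A and B
    n_a, n_b: transactions containing A, containing B
    n_total: all transactions (includes null transactions with neither)
    """
    p_a_given_b = n_ab / n_b
    p_b_given_a = n_ab / n_a
    return {
        "all_conf": min(p_a_given_b, p_b_given_a),
        "max_conf": max(p_a_given_b, p_b_given_a),
        "kulc": 0.5 * (p_a_given_b + p_b_given_a),
        "cosine": sqrt(p_a_given_b * p_b_given_a),
        "lift": (n_ab / n_total) / ((n_a / n_total) * (n_b / n_total)),
    }

# Hypothetical counts; the second call only adds null transactions.
few_nulls = measures(n_ab=100, n_a=200, n_b=400, n_total=1_000)
many_nulls = measures(n_ab=100, n_a=200, n_b=400, n_total=100_000)

for name in few_nulls:
    print(name, few_nulls[name], many_nulls[name])
```

The four null-invariant measures give the same value in both runs, while lift changes by two orders of magnitude, which is why null-invariance matters when most transactions contain neither itemset.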
Advanced pattern mining
• What is pattern mining?
• Pattern mining: A Road map
Basic patterns: frequent pattern, closed pattern, max-pattern,
infrequent pattern or rare patterns, negative patterns
Based on the abstraction levels involved in a pattern: single-level
association rule, multilevel association rules
Pattern mining: A Road map
• Based on the number of dimensions involved in the rule or pattern:
single-dimensional association rule/pattern, multidimensional
association rule/pattern
Pattern mining: A Road map
• Based on the types of values handled in the rule or pattern: Boolean
association rule, quantitative association rule
Pattern mining: A Road map
• Based on the constraints or criteria used to mine selective
patterns: constraint-based, approximate, compressed, near-match,
top-k, redundancy-aware top-k
• Based on kinds of data and features to be mined: sequential patterns,
structural patterns
• Based on application domain-specific semantics
• Based on data analysis usages: pattern based classification, pattern
based clustering
Pattern mining in multilevel,
multidimensional space
• Mining multilevel associations
Pattern mining in multilevel,
multidimensional space
• Using uniform minimum support for all levels
• Using reduced minimum support at lower levels
Pattern mining in multilevel,
multidimensional space
• Using item or group-based minimum support
Pattern mining in multilevel,
multidimensional space
• Mining Multidimensional Associations
Single dimensional or intradimensional association rules
Multi dimensional or interdimensional association rules
Pattern mining in multilevel,
multidimensional space
• Mining quantitative association rules
A data cube method
A clustering-based method
A statistical analysis method to uncover exceptional behaviours
Pattern mining in multilevel,
multidimensional space
• Mining rare patterns and negative patterns
Constraint-based frequent pattern mining
• It includes the following: Knowledge type constraints, data
constraints, dimension/level constraints, Interestingness constraints,
Rule constraints
• Meta-rule guided mining of association rule
• Constraint based pattern generation
• An efficient frequent pattern mining processor can prune its search
space during mining in two ways:
Pruning pattern search space
Pruning data search space
Constraint-based frequent pattern mining
• There are five categories of pattern mining constraints:
Antimonotonic
Monotonic
Succinct
Convertible
Inconvertible
Constraint-based frequent pattern mining
• Pruning data space with data pruning constraints
Data succinctness
Data antimonotonicity
 
Artificial intelligence and machine learning.pptx
Artificial intelligence and machine learning.pptxArtificial intelligence and machine learning.pptx
Artificial intelligence and machine learning.pptx
rakshanatarajan005
 
Jacob Murphy Australia - Excels In Optimizing Software Applications
Jacob Murphy Australia - Excels In Optimizing Software ApplicationsJacob Murphy Australia - Excels In Optimizing Software Applications
Jacob Murphy Australia - Excels In Optimizing Software Applications
Jacob Murphy Australia
 
Agents chapter of Artificial intelligence
Agents chapter of Artificial intelligenceAgents chapter of Artificial intelligence
Agents chapter of Artificial intelligence
DebdeepMukherjee9
 
Machine foundation notes for civil engineering students
Machine foundation notes for civil engineering studentsMachine foundation notes for civil engineering students
Machine foundation notes for civil engineering students
DYPCET
 

Data mining techniques unit III

  • 2. Association Rule Mining • An All Electronics customer buys a PC and a digital camera. What should you recommend to them next? Frequent patterns and association rules are the knowledge that you want to mine. • Frequent patterns: patterns that appear frequently in a data set • Frequent itemset: a set of items, such as milk and bread, that appear together frequently in a transaction data set • Frequent subsequence: a sequence of items that occurs frequently, in order, in a transaction data set • Frequent substructure: a structural form such as a subgraph, subtree, or sublattice, which may be combined with itemsets or subsequences; if it occurs frequently, it is called a frequent structured pattern
  • 3. Basic Concepts • Mining frequent patterns plays an essential role in mining associations, correlations, classification, clustering, and other tasks • Market Basket Analysis: customer1: milk, bread, cereal; customer2: milk, bread, sugar, eggs; customer3: milk, bread, butter; customer4: sugar, eggs • Which groups or sets of items are customers likely to purchase on a given trip to a store?
  • 4. Association Rules • Support and confidence are two measures of rule interestingness. Support: the usefulness of discovered rules. Confidence: the certainty of discovered rules. • Example: computer => antivirus software [support = 2%, confidence = 60%] • Support: 2% of all the transactions under analysis show that a computer and antivirus software are purchased together • Confidence: 60% of the customers who purchased a computer also bought the software
  • 5. Association Rules • Association rules are interesting if they satisfy both a minimum support threshold and a minimum confidence threshold • Frequent itemsets, closed itemsets, and association rules: I = {I1, I2, ..., In}: the set of items; D: the task-relevant data (database); T: a transaction • Rule: A => B • Support(A => B) = P(A ∪ B), the relative support, where P(A ∪ B) is the probability that a transaction contains both A and B • Confidence(A => B) = P(B|A)
  • 6. Association Rules • Itemset: a set of items • k-itemset: an itemset that contains k items • Occurrence frequency of an itemset: the number of transactions that contain the itemset, also called its support count • Minimum support threshold: if the relative support of an itemset I satisfies a prespecified minimum support threshold, then I is a frequent itemset. • Confidence(A => B) = P(B|A) = support(A ∪ B) / support(A) = support_count(A ∪ B) / support_count(A) • Thus the problem of mining association rules can be reduced to that of mining frequent itemsets.
  • 7. Frequent Itemsets in a Data Set (Association Rule Mining) • Association mining searches for frequent items in the data set. Frequent mining usually finds the interesting associations and correlations between itemsets in transactional and relational databases. In short, frequent mining shows which items appear together in a transaction or relation. • Need for association mining: frequent mining generates association rules from a transactional data set. If two items X and Y are frequently purchased together, it is good to place them together in stores or to offer a discount on one item with the purchase of the other, which can increase sales. For example, a customer who buys milk and bread is likely to also buy butter, so the association rule is ['milk'] ^ ['bread'] => ['butter']. The seller can therefore suggest butter to a customer who buys milk and bread.
  • 8. Important Definitions: • Support: one of the measures of interestingness; it reflects the usefulness of a rule. A support of 5% means that 5% of all transactions in the database follow the rule. • Support(A -> B) = Support_count(A ∪ B) / (total number of transactions) • Confidence: reflects the certainty of a rule. A confidence of 60% means that 60% of the customers who purchased milk and bread also bought butter. • Confidence(A -> B) = Support_count(A ∪ B) / Support_count(A) • If a rule satisfies both minimum support and minimum confidence, it is a strong rule.
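The two definitions can be checked by hand on the four market-basket transactions from the Basic Concepts slide. The following is a minimal Python sketch; the function names are illustrative assumptions, not part of the slides.

```python
# Transactions copied from the market basket slide (customer1..customer4).
transactions = [
    {"milk", "bread", "cereal"},
    {"milk", "bread", "sugar", "eggs"},
    {"milk", "bread", "butter"},
    {"sugar", "eggs"},
]

def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def support(antecedent, consequent, transactions):
    """Relative support of the rule antecedent -> consequent."""
    return support_count(antecedent | consequent, transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Share of transactions containing the antecedent that also contain the consequent."""
    return (support_count(antecedent | consequent, transactions)
            / support_count(antecedent, transactions))

a, b = {"milk", "bread"}, {"butter"}
rule_support = support(a, b, transactions)        # 1 of the 4 transactions
rule_confidence = confidence(a, b, transactions)  # 1 of the 3 milk-and-bread baskets
print(f"support={rule_support:.2f}, confidence={rule_confidence:.2f}")
```

On this toy data the rule {milk, bread} -> {butter} has 25% support and about 33% confidence, so it would be strong only under fairly low thresholds.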
  • 9. Important Definitions: • Support_count(X): the number of transactions in which X appears. If X is A ∪ B, it is the number of transactions in which both A and B are present. 1. Maximal itemset: an itemset is maximal frequent if it is frequent and none of its supersets is frequent. 2. Closed itemset: an itemset is closed if none of its immediate supersets has the same support count as the itemset. 3. k-itemset: an itemset that contains k items is a k-itemset. An itemset is frequent if its support count is greater than or equal to the minimum support count.
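  • 10. Example of Finding Frequent Itemsets • Consider the given data set with the given transactions. • Let's say the minimum support count is 3 • The following relation holds: maximal frequent => closed => frequent, that is, every maximal frequent itemset is closed, and every closed frequent itemset is frequent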
  • 11. • 1-frequent: • {A} = 3; // not closed due to {A, C}, and not maximal • {B} = 4; // not closed due to {B, D}, and not maximal • {C} = 4; // not closed due to {C, D}, and not maximal • {D} = 5; // closed, since no immediate superset has the same count; not maximal • 2-frequent: • {A, B} = 2 // not frequent because support count < minimum support count, so ignore • {A, C} = 3 // not closed due to {A, C, D} • {A, D} = 3 // not closed due to {A, C, D} • {B, C} = 3 // not closed due to {B, C, D} • {B, D} = 4 // closed but not maximal due to {B, C, D} • {C, D} = 4 // closed but not maximal due to {B, C, D} • 3-frequent: • {A, B, C} = 2 // ignore, not frequent because support count < minimum support count • {A, B, D} = 2 // ignore, not frequent because support count < minimum support count • {A, C, D} = 3 // maximal frequent • {B, C, D} = 3 // maximal frequent • 4-frequent: • {A, B, C, D} = 2 // ignore, not frequent
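The classification above can be reproduced mechanically. The slide's transaction table did not survive in this transcript, so the five transactions below are a hypothetical data set chosen only because it yields exactly the support counts listed above; they are an assumption, not the original table. The sketch enumerates every itemset by brute force, which is only reasonable for a toy example of this size.

```python
from itertools import combinations

# ASSUMPTION: hypothetical transactions that reproduce the listed counts
# ({A}=3, {B}=4, {C}=4, {D}=5, ..., {A,B,C,D}=2); not the slide's own table.
transactions = [
    {"A", "B", "C", "D"},
    {"A", "B", "C", "D"},
    {"A", "C", "D"},
    {"B", "C", "D"},
    {"B", "D"},
]
MIN_SUPPORT_COUNT = 3

items = sorted(set().union(*transactions))

# Support count of every non-empty itemset over the item universe.
counts = {}
for k in range(1, len(items) + 1):
    for combo in combinations(items, k):
        itemset = frozenset(combo)
        counts[itemset] = sum(1 for t in transactions if itemset <= t)

frequent = {s: c for s, c in counts.items() if c >= MIN_SUPPORT_COUNT}

def immediate_supersets(itemset):
    """All itemsets obtained by adding exactly one more item."""
    return [itemset | {i} for i in items if i not in itemset]

# Closed: no immediate superset has the same support count.
closed = {s for s in frequent
          if all(counts[sup] < frequent[s] for sup in immediate_supersets(s))}
# Maximal: no immediate superset is frequent.
maximal = {s for s in frequent
           if all(sup not in frequent for sup in immediate_supersets(s))}

for name, group in (("closed", closed), ("maximal", maximal)):
    print(name, sorted("".join(sorted(s)) for s in group))
```

Checking only immediate supersets is sufficient here because support never increases when an item is added, so a larger superset cannot be frequent, or match the count, unless some immediate superset does too.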
  • 12. Association Rule Mining as a Two-Step Process • Find all frequent itemsets • Generate strong association rules from the frequent itemsets • Challenge in mining frequent itemsets: • Closed frequent itemset: an itemset X is closed in a data set D if there exists no proper super-itemset Y such that Y has the same support count as X in D; X is a closed frequent itemset if it is both closed and frequent • Maximal frequent itemset: an itemset X is a maximal frequent itemset in a data set D if X is frequent and there exists no super-itemset Y such that X ⊂ Y and Y is frequent in D
  • 13. Example: closed and maximal frequent itemsets • A transaction database has only two transactions: {<a1, a2, ..., a100>; <a1, a2, ..., a50>}, with min_sup = 1 • We find two closed frequent itemsets and their support counts: C = {{a1, a2, ..., a100}: 1; {a1, a2, ..., a50}: 2} • There is only one maximal frequent itemset: M = {{a1, a2, ..., a100}: 1} • We cannot include {a1, a2, ..., a50} as a maximal frequent itemset because it has a frequent superset, {a1, a2, ..., a100} • C: the set of closed frequent itemsets; M: the set of maximal frequent itemsets
  • 14. Example: closed and maximal frequent itemsets • The set of closed frequent itemsets contains complete information regarding the frequent itemsets • From C, we can derive (i) {a2, a45}: 2, since {a2, a45} is a sub-itemset of the itemset {a1, a2, ..., a50}: 2; (ii) {a8, a55}: 1, since {a8, a55} is not a sub-itemset of the previous itemset but of the itemset {a1, a2, ..., a100}: 1
  • 15. Frequent Itemset Mining Methods: Apriori and FP-Growth • Apriori algorithm: finding frequent itemsets by confined candidate generation • A seminal algorithm proposed by R. Agrawal and R. Srikant in 1994 for mining frequent itemsets • The name comes from the fact that the algorithm uses prior knowledge of frequent itemset properties • Apriori property: all non-empty subsets of a frequent itemset must also be frequent • Each iteration has two steps: the join step, which generates candidate k-itemsets from the frequent (k-1)-itemsets, and the prune step, which removes candidates that have an infrequent (k-1)-subset
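The join and prune steps can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the exact procedure from the original paper: the join here takes unions of any two frequent (k-1)-itemsets rather than using the prefix-based join, and the function name `apriori` and the variable `baskets` are illustrative. The transactions are the four baskets from the Basic Concepts slide.

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Return {frozenset: support_count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    # First pass: count single items to get the frequent 1-itemsets.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(current)
    k = 2
    while current:
        prev = set(current)
        # Join step (simplified): unions of two frequent (k-1)-itemsets of size k.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: by the Apriori property, every (k-1)-subset must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in prev for sub in combinations(c, k - 1))}
        # One scan of the data to count the surviving candidates.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_count}
        frequent.update(current)
        k += 1
    return frequent

baskets = [
    {"milk", "bread", "cereal"},
    {"milk", "bread", "sugar", "eggs"},
    {"milk", "bread", "butter"},
    {"sugar", "eggs"},
]
result = apriori(baskets, min_count=2)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

With a minimum support count of 2, cereal and butter are dropped in the first pass, so no candidate containing them is ever generated or counted.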
  • 19. Generating Association Rules from Frequent Itemsets • Once the frequent itemsets from the transactions have been found, it is straightforward to generate strong association rules from them • Strong association rules satisfy both minimum support and minimum confidence • Confidence(A => B) = P(B|A) = support_count(A ∪ B) / support_count(A)
  • 20. Generating Association Rules from Frequent Itemsets • Association rules are generated as follows: • For each frequent itemset l, generate all non-empty proper subsets of l • For every such subset s of l, output the rule "s => (l - s)" if support_count(l) / support_count(s) >= min_conf
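A minimal Python sketch of this rule-generation step follows. The support counts are taken from the {B, C, D} part of the frequent-itemset example earlier in the deck; the function name `generate_rules` and the 0.9 confidence threshold are assumptions made for illustration.

```python
from itertools import combinations

def generate_rules(frequent, min_conf):
    """frequent maps each frequent itemset (frozenset) to its support count.

    Returns a list of (antecedent, consequent, confidence) tuples.
    """
    rules = []
    for l, l_count in frequent.items():
        if len(l) < 2:
            continue  # a rule needs a non-empty antecedent and consequent
        for r in range(1, len(l)):
            for combo in combinations(sorted(l), r):
                s = frozenset(combo)
                # Every subset of a frequent itemset is frequent, so its count is known.
                conf = l_count / frequent[s]
                if conf >= min_conf:
                    rules.append((s, l - s, conf))
    return rules

# Support counts from the worked example ({B}=4, {C}=4, {D}=5, ...).
frequent = {
    frozenset("B"): 4, frozenset("C"): 4, frozenset("D"): 5,
    frozenset("BC"): 3, frozenset("BD"): 4, frozenset("CD"): 4,
    frozenset("BCD"): 3,
}
rules = generate_rules(frequent, min_conf=0.9)
for s, c, conf in rules:
    print(sorted(s), "=>", sorted(c), f"confidence={conf:.2f}")
```

No extra database scan is needed at this stage, because every count the confidence formula uses was already collected while mining the frequent itemsets.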
  • 22. Improving the Efficiency of Apriori • Hash-based technique: a hash-based technique can be used to reduce the size of the candidate k-itemsets, C_k, for k > 1 • Example:
  • 23. Improving the Efficiency of Apriori • Transaction reduction: reducing the number of transactions scanned in future iterations • A transaction that does not contain any frequent k-itemsets cannot contain any frequent (k+1)-itemsets • Such a transaction can be marked or removed from further consideration
  • 24. Improving the Efficiency of Apriori • Partitioning: partitioning the data to find candidate itemsets requires only two database scans to mine the frequent itemsets • Phase I: divide the transactions of D into n non-overlapping partitions • Find the local frequent itemsets for each partition • Any itemset that is frequent in D must occur as a frequent itemset in at least one of the partitions • Therefore all local frequent itemsets are candidate itemsets with respect to D
  • 25. Improving the Efficiency of Apriori • Phase II: a second scan of D is conducted to determine the global frequent itemsets; D is scanned only once in each phase • Sampling • Dynamic itemset counting
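The two phases of the partitioning technique can be sketched as follows. This is an illustrative Python sketch under stated assumptions: the local miner is a brute-force enumeration that is only suitable for toy data (a real implementation would run Apriori or a similar method inside each partition), and the names `local_frequent` and `partition_mine` are hypothetical. The data are the four baskets from the Basic Concepts slide.

```python
from itertools import combinations
from math import ceil

def local_frequent(partition, min_sup):
    """Brute-force local frequent itemsets of one partition (toy sizes only)."""
    min_count = ceil(min_sup * len(partition))
    counts = {}
    for t in partition:
        for k in range(1, len(t) + 1):
            for combo in combinations(sorted(t), k):
                key = frozenset(combo)
                counts[key] = counts.get(key, 0) + 1
    return {s for s, c in counts.items() if c >= min_count}

def partition_mine(transactions, min_sup, n_partitions):
    """Two-scan mining: local candidates first, then one global counting pass."""
    size = ceil(len(transactions) / n_partitions)
    partitions = [transactions[i:i + size] for i in range(0, len(transactions), size)]
    # Phase I: every local frequent itemset becomes a global candidate.
    candidates = set()
    for p in partitions:
        candidates |= local_frequent(p, min_sup)
    # Phase II: a single scan of the full data counts each candidate.
    min_count = ceil(min_sup * len(transactions))
    counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
    return {s: c for s, c in counts.items() if c >= min_count}

baskets = [
    {"milk", "bread", "cereal"},
    {"milk", "bread", "sugar", "eggs"},
    {"milk", "bread", "butter"},
    {"sugar", "eggs"},
]
global_frequent = partition_mine(baskets, min_sup=0.5, n_partitions=2)
for itemset, count in sorted(global_frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

Using the same relative threshold locally and globally is what makes the candidate set complete: an itemset that misses the threshold in every partition cannot reach it over the whole database.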
  • 26. A database has five transactions. Let min_sup = 60% and min_conf = 80%.
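  • 28. A Pattern-Growth Approach for Mining Frequent Itemsets • Apriori algorithm: disadvantages • The candidate generate-and-test method significantly reduces the size of the candidate sets, which leads to a good performance gain • However, it can suffer from two nontrivial costs: it may still need to generate a huge number of candidate sets, and it may need to scan the whole database repeatedly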
  • 29. Frequent Pattern Growth, or FP-Growth (Divide and Conquer) • Mines the complete set of frequent itemsets without costly candidate generation • First, it compresses the database of frequent items into an FP-tree, which retains the itemset association information • The first scan of D derives the frequent items and their support counts, sorted in descending order of support count into the list L • Create the root of the tree, labelled "null" • Scan D a second time • The items in each transaction are processed in L order, and a branch is created for each transaction
  • 30. Mining the FP-tree • Start from each frequent length-1 pattern (as an initial suffix pattern) and construct its conditional pattern base • Then construct its conditional FP-tree and perform mining recursively on that tree • Pattern growth is achieved by concatenating the suffix pattern with the frequent patterns generated from a conditional FP-tree • This method reduces the search cost • Algorithm: FP-growth
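The sketch below illustrates only the first two ingredients of FP-growth: building the FP-tree in two scans and reading off the conditional pattern base of one item. It does not perform the recursive mining. The class and function names, the tie-breaking rule for items with equal support, and the use of the four baskets from the Basic Concepts slide are all assumptions made for illustration.

```python
from collections import Counter

class FPNode:
    """One node of the FP-tree: an item, its count, and links up and down."""
    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}

def build_fp_tree(transactions, min_count):
    # First scan: count items and keep only the frequent ones.
    item_counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in item_counts.items() if c >= min_count}
    # L order: descending support count; ties broken by item name (an assumption).
    order = sorted(frequent, key=lambda i: (-frequent[i], i))
    rank = {item: r for r, item in enumerate(order)}
    root = FPNode(None, None)              # the root labelled "null"
    header = {item: [] for item in order}  # item -> all tree nodes holding it
    # Second scan: insert each transaction's frequent items in L order.
    for t in transactions:
        node = root
        for item in sorted((i for i in set(t) if i in rank), key=rank.get):
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += 1
            node = child
    return root, header

def conditional_pattern_base(item, header):
    """Prefix paths ending just above `item`, each with that item node's count."""
    base = []
    for node in header[item]:
        path = []
        parent = node.parent
        while parent is not None and parent.item is not None:
            path.append(parent.item)
            parent = parent.parent
        if path:
            base.append((list(reversed(path)), node.count))
    return base

baskets = [
    {"milk", "bread", "cereal"},
    {"milk", "bread", "sugar", "eggs"},
    {"milk", "bread", "butter"},
    {"sugar", "eggs"},
]
root, header = build_fp_tree(baskets, min_count=2)
sugar_base = conditional_pattern_base("sugar", header)
print(sugar_base)
```

The three baskets that share bread and milk collapse into a single branch with count 3, which is the compression the FP-tree is meant to provide.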
  • 34. Mining frequent item sets using the vertical data format
  • 35. Mining Closed and Max Patterns • How can we mine closed frequent itemsets? • Strategies include: item merging, sub-itemset pruning, item skipping • When a new frequent itemset is derived, it is necessary to perform two kinds of closure checking: superset checking and subset checking
  • 36. Pattern Evaluation Methods • Strong rules are not necessarily interesting:
  • 37. Pattern Evaluation Methods • From association analysis to correlation analysis: • Correlation rule: • Correlation measure:
  • 38. Pattern Evaluation Methods: chi-square measure
  • 39. Comparison of pattern evaluation measures • All-confidence • Max_confidence • Kulczynski(kulc) • Cosine • Null Transactions • Null Invariant
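The measures named above can be written in terms of the two conditional probabilities P(A|B) and P(B|A). The Python sketch below uses their standard textbook definitions. Lift is not named on this slide and is included only as a contrast, and the counts are hypothetical values chosen for illustration. Growing the number of null transactions (those containing neither A nor B) changes lift but leaves the four null-invariant measures untouched.

```python
from math import sqrt

def measures(n_a, n_b, n_ab, n_total):
    """Pattern evaluation measures from support counts.

    n_a, n_b: transactions containing A, B; n_ab: containing both;
    n_total: all transactions, including null transactions.
    """
    p_b_given_a = n_ab / n_a
    p_a_given_b = n_ab / n_b
    return {
        # Lift depends on n_total, so it is not null-invariant.
        "lift": (n_ab / n_total) / ((n_a / n_total) * (n_b / n_total)),
        "all_confidence": min(p_a_given_b, p_b_given_a),
        "max_confidence": max(p_a_given_b, p_b_given_a),
        "kulczynski": 0.5 * (p_a_given_b + p_b_given_a),
        "cosine": sqrt(p_a_given_b * p_b_given_a),
    }

# Hypothetical counts: same A/B overlap, different numbers of null transactions.
few_nulls = measures(n_a=1000, n_b=1000, n_ab=100, n_total=10_000)
many_nulls = measures(n_a=1000, n_b=1000, n_ab=100, n_total=100_000)

for name in few_nulls:
    print(f"{name}: {few_nulls[name]:.3f} vs {many_nulls[name]:.3f}")
```

In this example lift moves from about 1 to about 10 purely because more unrelated transactions were added, while the other four measures stay at 0.1.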
  • 41. Advanced Pattern Mining • What is pattern mining? • Pattern mining: a road map • Basic patterns: frequent patterns, closed patterns, max-patterns, infrequent or rare patterns, negative patterns • Based on the abstraction levels involved in a pattern: single-level association rules, multilevel association rules
  • 42. Pattern mining: A Road map Based on the number of dimensions involved in the rule or pattern : Single-dimensional association rule/pattern , Multidimensional association rule/pattern
  • 43. Pattern mining: A Road map • Based on the types of values handled in the rule or pattern: Boolean association rule, quantitative association rule
  • 44. Pattern mining: A Road map • Based on the constraints or criteria used to mine selective patterns: constraint-based, approximate, compressed, near-match, top-k, redundancy-aware top-k • Based on kinds of data and features to be mined: sequential patterns, structural patterns • Based on application domain-specific semantics • Based on data analysis usages: pattern-based classification, pattern-based clustering
  • 46. Pattern mining in multilevel, multidimensional space • Mining multilevel associations
  • 47. Pattern mining in multilevel, multidimensional space • Using uniform minimum support for all levels • Using reduced minimum support at lower levels
  • 48. Pattern mining in multilevel, multidimensional space • Using item or group-based minimum support
  • 49. Pattern mining in multilevel, multidimensional space • Mining multidimensional associations: single-dimensional (intradimensional) association rules; multidimensional (interdimensional) association rules
  • 50. Pattern mining in multilevel, multidimensional space • Mining quantitative association rules: a data cube method; a clustering-based method; a statistical analysis method to uncover exceptional behaviours
  • 52. Pattern mining in multilevel, multidimensional space • Mining rare patterns and negative patterns
  • 53. Constraint-based frequent pattern mining • It includes the following: knowledge type constraints, data constraints, dimension/level constraints, interestingness constraints, rule constraints • Metarule-guided mining of association rules • Constraint-based pattern generation • An efficient frequent pattern mining processor can prune its search space during mining in two ways: pruning the pattern search space and pruning the data search space
  • 54. Constraint-based frequent pattern mining • There are five categories of pattern mining constraints: antimonotonic, monotonic, succinct, convertible, inconvertible
  • 55. Constraint-based frequent pattern mining • Pruning data space with data pruning constraints: data succinctness, data antimonotonicity