ML - Candidate Elimination Algorithm
The candidate elimination algorithm incrementally builds the version space given a hypothesis space H and a set E of examples. The examples are added one by one; each example possibly shrinks the version space by removing the hypotheses that are inconsistent with the example. The candidate elimination algorithm does this by updating the general and specific boundary for each new example.
- You can consider this an extended form of the Find-S algorithm.
- It considers both positive and negative examples.
- Positive examples are handled as in the Find-S algorithm: they generalize the specific hypothesis.
- Negative examples are used to specialize the general hypothesis.
Terms Used:
- Concept Learning: The learning task of inferring a concept (a boolean-valued function) from labeled training data.
- General Hypothesis: A hypothesis that places no constraint on any attribute.
- G = {'?', '?', '?', '?', ...}: one '?' entry per attribute; '?' matches any value.
- Specific Hypothesis: A hypothesis that constrains attributes to specific values.
- S = {'ϕ', 'ϕ', 'ϕ', ...}: one 'ϕ' (null) entry per attribute; 'ϕ' matches no value.
- Version Space: The set of all hypotheses consistent with the training data. It lies between the specific boundary S and the general boundary G, and it contains not just one hypothesis but every hypothesis that fits the training data set (see the sketch after this list).
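As a concrete illustration, the two boundary hypotheses for a four-attribute problem might be represented in Python as plain lists. The names G and S and the 'phi' string (standing in for ϕ) are illustrative choices, not part of any library:

```python
# '?' means any value is acceptable; 'phi' means no value is acceptable.
G = ['?', '?', '?', '?']          # most general hypothesis: covers every instance
S = ['phi', 'phi', 'phi', 'phi']  # most specific hypothesis: covers no instance
```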
Algorithm:
Step 1: Load the data set.
Step 2: Initialize G to the most general hypothesis and S to the most specific hypothesis.
Step 3: For each training example:
Step 4: If the example is positive:
            remove from G any hypothesis inconsistent with the example
            for each attribute:
                if attribute_value == hypothesis_value:
                    do nothing
                else:
                    replace the attribute value in S with '?' (generalizing it)
Step 5: If the example is negative:
            make each hypothesis in G that covers the example more specific (specialize it), keeping it consistent with S.
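Below is a minimal, runnable Python sketch of these steps. It follows the simplified version described in this article: a single specific hypothesis S (updated exactly as in Find-S) and a boundary set G that is specialized on negative examples using the values in S. The function and variable names, the 'phi' placeholder, and the 'yes'/'no' labels are our illustrative choices:

```python
def is_consistent(hypothesis, instance):
    """True if the hypothesis covers the instance ('?' matches anything)."""
    return all(h == '?' or h == v for h, v in zip(hypothesis, instance))

def candidate_elimination(examples):
    """examples: list of (instance, label) pairs, with label 'yes' or 'no'."""
    n = len(examples[0][0])
    S = ['phi'] * n        # most specific hypothesis: matches nothing yet
    G = [['?'] * n]        # boundary set holding the most general hypothesis

    for instance, label in examples:
        if label == 'yes':                       # positive example
            # Step 4a: drop general hypotheses that fail to cover it
            G = [g for g in G if is_consistent(g, instance)]
            # Step 4b: generalize S just enough to cover it
            for i, value in enumerate(instance):
                if S[i] == 'phi':
                    S[i] = value                 # first positive example: copy it
                elif S[i] != value:
                    S[i] = '?'                   # mismatch: generalize to '?'
        else:                                    # negative example
            # Step 5: minimally specialize every g in G that wrongly covers it
            new_G = []
            for g in G:
                if not is_consistent(g, instance):
                    new_G.append(g)              # already excludes the instance
                    continue
                for i in range(n):
                    # use S to pick an attribute value that rules the instance out
                    if g[i] == '?' and S[i] not in ('?', 'phi', instance[i]):
                        specialized = list(g)
                        specialized[i] = S[i]
                        new_G.append(specialized)
            G = new_G
    return S, G
```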
Example:
Consider the EnjoySport dataset given below:

Sky   | AirTemp | Humidity | Wind   | Water | Forecast | EnjoySport
----- | ------- | -------- | ------ | ----- | -------- | ----------
sunny | warm    | normal   | strong | warm  | same     | yes
sunny | warm    | high     | strong | warm  | same     | yes
rainy | cold    | high     | strong | warm  | change   | no
sunny | warm    | high     | strong | cool  | change   | yes
Algorithmic steps:
Initially: G = [[?, ?, ?, ?, ?, ?]]   (the single most general hypothesis)
           S = [ϕ, ϕ, ϕ, ϕ, ϕ, ϕ]    (the most specific hypothesis)
For instance 1: <'sunny','warm','normal','strong','warm','same'> and positive output.
G1 = G
S1 = ['sunny','warm','normal','strong','warm','same']
For instance 2: <'sunny','warm','high','strong','warm','same'> and positive output.
G2 = G
S2 = ['sunny','warm', ?, 'strong','warm','same']
For instance 3: <'rainy','cold','high','strong','warm','change'> and negative output.
G3 = [['sunny', ?, ?, ?, ?, ?], [?, 'warm', ?, ?, ?, ?], [?, ?, ?, ?, ?, 'same']]
S3 = S2
For instance 4: <'sunny','warm','high','strong','cool','change'> and positive output.
G4 = G3 with [?, ?, ?, ?, ?, 'same'] removed, since that hypothesis no longer covers this positive instance.
S4 = ['sunny','warm', ?, 'strong', ?, ?]
Finally, combining G4 and S4, the algorithm produces the boundaries of the version space as its output.
Output:
G = [['sunny', ?, ?, ?, ?, ?], [?, 'warm', ?, ?, ?, ?]]
S = ['sunny', 'warm', ?, 'strong', ?, ?]
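As a quick check, running the candidate_elimination sketch from earlier on the four training instances reproduces this output (a hypothetical demo; the attribute order follows the table above):

```python
examples = [
    (['sunny', 'warm', 'normal', 'strong', 'warm', 'same'],   'yes'),
    (['sunny', 'warm', 'high',   'strong', 'warm', 'same'],   'yes'),
    (['rainy', 'cold', 'high',   'strong', 'warm', 'change'], 'no'),
    (['sunny', 'warm', 'high',   'strong', 'cool', 'change'], 'yes'),
]
S, G = candidate_elimination(examples)
print('S =', S)  # ['sunny', 'warm', '?', 'strong', '?', '?']
print('G =', G)  # [['sunny', '?', '?', '?', '?', '?'], ['?', 'warm', '?', '?', '?', '?']]
```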
The Candidate Elimination Algorithm (CEA) is an improvement over the Find-S algorithm for classification tasks. While CEA shares some similarities with Find-S, it also has some essential differences that offer advantages and disadvantages. Here are some advantages and disadvantages of CEA in comparison with Find-S:
Advantages of CEA over Find-S:
- Improved accuracy: CEA uses both positive and negative examples to build the version space, so it can produce more reliable hypotheses than Find-S when the data is noisy or incomplete.
- Flexibility: CEA can handle more complex classification tasks, such as those with multiple classes or non-linear decision boundaries.
- More efficient: rather than enumerating every hypothesis, CEA maintains only the boundary sets G and S and eliminates inconsistent candidates as examples arrive, which can improve processing efficiency.
- Better handling of continuous attributes: CEA can be extended to continuous attributes by maintaining boundaries (for example, intervals) for each attribute, making it applicable to a wider range of datasets.
Disadvantages of CEA in comparison with Find-S:
- More complex: CEA is a more complex algorithm than Find-S, which may make it more difficult for beginners or those without a strong background in machine learning to use and understand.
- Higher memory requirements: CEA requires more memory to store the set of hypotheses and boundaries, which may make it less suitable for memory-constrained environments.
- Slower processing for large datasets: CEA may become slower for larger datasets due to the increased number of hypotheses generated.
- Higher potential for overfitting: The increased complexity of CEA may make it more prone to overfitting on the training data, especially if the dataset is small or has a high degree of noise.