SlideShare a Scribd company logo
Indonesian Journal of Electrical Engineering and Informatics (IJEEI)
Vol. 5, No. 3, September 2017, pp. 275~284
ISSN: 2089-3272, DOI: 10.11591/ijeei.v5i3.299  275
Received March 25, 2017; Revised July 10, 2017; Accepted August 11, 2017
An Heterogeneous Population-Based Genetic
Algorithm for Data Clustering
Amina Bedboudi
*1
, Cherif Bouras
2
, Mohamed Tahar Kimour
3
1,3
Departement of Computer Science, University of Badji Mokhtar - Annaba, Algeria
2
Departement of Mathematics, University of Badji Mokhtar - Annaba, Algeria
e-mail: bedboudi.amina@hotmail.fr
Abstract
As a primary data mining method for knowledge discovery, clustering is a technique of
classifying a dataset into groups of similar objects. The most popular method for data clustering
K-means suffers from the drawbacks of requiring the number of clusters and their initial centers,
which should be provided by the user. In the literature, several methods have proposed in a
form of k-means variants, genetic algorithms, or combinations between them for calculating the
number of clusters and finding proper clusters centers. However, none of these solutions has
provided satisfactory results and determining the number of clusters and the initial centers are
still the main challenge in clustering processes. In this paper we present an approach to
automatically generate such parameters to achieve optimal clusters using a modified genetic
algorithm operating on varied individual structures and using a new crossover operator.
Experimental results show that our modified genetic algorithm is a better efficient alternative to
the existing approaches
Keywords: data mining, clustering, K-means, genetic algorithms, crossover
1. Introduction
As one of the most important tasks of spatial data mining, cluster analysis has been
widely used in several domains, such as biology, system engineering and social sciences, in
order to identify natural groups in large amounts of data [1-3]. A clustering algorithm assigns a
large number of data points to a smaller number of groups (or clusters) such that data points in
the same group share the same properties (similar) while, in different groups, they are dissimilar
[4]. Data modeling puts clustering in a historical perspective rooted in mathematics, statistics,
and numerical analysis [5]. Data mining adds to clustering the complications of very large
datasets with very many attributes of different types. This imposes unique computational
requirements on relevant clustering algorithms.
A variety of algorithms have recently emerged that meet these requirements and were
successfully applied to real-life data mining problems. Especially, popular methods such as K-
means, genetic algorithms and their variants are the most notable ones for data clustering [1-9],
[14-21]. Their advantages lie in the simplicity, efficiency and ease convergenc, however, most of
them suffer from the drawbacks of requiring the number of clusters and their initial centers, but
also from the poor clustering quality [9].
In this paper we present an approach to automatically generate such parameters, while
improving the speed and accuracy of the algorithm. It is based on an appropriate data structure
and process that handles heterogeneous populations in a modified genetic algorithm that
operates on varied individual structures .Generally, using parameters such as the crossover and
mutation probabilities that adapt to the evolution of the algorithm is a good choice, since higher
diversity in the population can be achieved, preventing the algorithm to fall in local minima.
Moreover, to accelerate the genetic algorithm process and increase the individual
diversity of the initial population, we generate that initial population in two manners: the first
subpopulation is obtained with a deterministic way and the second one with a random way.
Increasing the population diversity will allow to achieve better quality.
The rest of the article is structured as follows: section 2 gives an analysis of two data
clustering techniques; k-means and genetic algorithms. Section 3 details our data clustering
 ISSN: 2089-3272
IJEEI Vol. 5, No. 3, September 2017 : 275 – 284
276
approach by describing the proposed structure of chromosomes and genetic operators. Section
4 presents the related works and a comparaison with our aproposed approach. Section 5
describes the experimental results and discusses the strength of our approach. Finally, we
draw a conclusion and define some future works in section 6.
2. Clustering
Clustering can be thought of as data partitioning or segmenting into homogeneous
groups. The clustering is usually accomplished by determining the similarity among the data on
predefined attributes. The most similar data are grouped into clusters [2-5], [9].
There is a close relationship between clustering techniques and many other disciplines.
Clustering has always been used in statistics and science [7]. Typical applications include
speech and character recognition. viewed as a density estimation problem, machine learning
clustering algorithms were applied to image segmentation and computer vision [10-13]. In the
literature, various methods have been used to handle the clustering problem. The K-means [4-5]
[17] is considered one of the major algorithms widely used in clustering [1]. Genetic algorithms
[6] are also used in clustering, either in a separated way or combined with-the k-means
algorithm [9].
2.1. K-means
K-means first randomly select the k objects; each object initially represents a cluster
center. For each remaining object according to its distance from the center of each cluster,
assign it to the nearest cluster. Afterward, each cluster center is replaced by the average value
on the respective cluster. This process is repeated, until k centers do not change. K-means
algorithm attempts to provide a partition of the given data such that the centroids of provided
classes are the minima of the following objective function [8-9], [13], [20] :
∑ ∑ ‖ ‖
The typical algorithm is as follows [1]
1. Select k objects as initial centers;
2. Assign each data object to the closest center;
3. Recalculate the centers of each cluster;
4. Repeat steps 2 and 3 until centers do not change;
Due to its practice properties, K-means algorithm is considered as the most popular
technique to cluster information. Its main advantages are: i) easy to implement, ii) the
comparison is conducted only between the observations and the center of classes, and iii) it
detects and isolates the outliers.
However, the algorithm exhibits some drawbacks, especially on the fact that it can be
really slow since in each step the distance between each point to each cluster has to be
calculated, which can be really expensive in the presence of a large dataset. Moreover, it is
sensitive to the initial cluster centers that are provided by the user as well as the k number of
clusters.
IJEEI ISSN: 2089-3272 
An Heterogeneous Population-Based Genetic Algorithm for Data… (Amina Bedboudi)
277
2.2. Genetic algorithms
Genetic algorithms [6] are robust, stochastic optimization algorithms, used to solve a
wide variety of problems. They were developed with the goal of better understanding natural
processes such as adaptation, and it belongs to a type of search techniques that mimic the
principles of natural selection to develop solutions of large optimization problems. A genetic
algorithm finds the optimal value for a particular objective function depending on the problem to
be solved. The standard approach to an optimization problem begins by designing an objective
function that can model the problem’s objectives while incorporating any constraints.
A genetic algorithm consists of: Chromosomal Representation, initial population, fitness
evaluation, selection and reproduction (i.e., crossover and mutation). GA operates by
maintaining and manipulating a population of potential solutions called chromosomes. Each
chromosome has an associated fitness value which is a qualitative measure of the goodness of
the solution encoded in it.
In a genetic algorithm process, we determine firstly an initial solution, composed of a set
of chromosomes (initial population), and iteratively apply reproduction operators(selection,
crossover and mutation) until achieving a certain quality parameter or a predefined number of
iterations. A use of a fitness function guides the stochastic selection of chromosomes which are
then used to generate new candidate solutions through crossover and mutation.
Therefore, basic operations are: i) Crossover, which generates new chromosomes by
combining sections of two or more selected parents, ii) mutation, which acts by randomly
selecting genes which are then altered; thereby preventing suboptimal solutions from persisting
and increases diversity in the population.
There are three main types of selection methods: fitness proportionate selection, ranking
method and tournament selection.
In tournament selection [14], individuals are selected randomly from the population, based on
the fitness function. Tournaments are often held between pairs of individuals, although larger
tournaments can be used. Simple outline of a genetic algorithm is shown in Figure 1.
Figure 1. Outline of the genetic algorithm.
 ISSN: 2089-3272
IJEEI Vol. 5, No. 3, September 2017 : 275 – 284
278
Genetic algorithms are characterized by attributes such as objective function, encoding
of the input data, crossover, mutation, and population size.
(1) Objective function. It is used to assign each individual in the population a fitness
value; an individual with a higher fitness represents a better solution to the problem
than an individual with a lower fitness value;
(2) Encoding. Genetic algorithms operate on an en-coding of the problem’s input data
(which represent in-dependent variables for the objective function);
(3) Elitism. This is a way to ensure that the highly fit-ting chromosomes are not lost and
copied to the new population. Elitism has been found to be very important to the
performance of genetic algorithms;
(4) Crossover. It is a procedure in which a highly fit-ting chromosome is given an
opportunity to reproduce by exchanging pieces of its genetic information with other
highly fitting chromosomes;
(5) Mutation. This is often applied after crossover by randomly altering some genes to
individual parents;
(6) Population size. It is the number of individuals in a population. The larger the
population size, the better the chance that an optimal solution will be found.
Genetic algorithms iterate a fixed number of times. Since the function’s upper bound
(the maximum fitness value possible for an individual) may not be known or cannot be reached,
we must limit the number of generations in order to guarantee the termination of the search
process. This may result in a suboptimal solution. Moreover, combined approaches of genetic
algorithms with other clustering techniques are still not satisfying on some user requirements
especially for the accuracy and execution time
3. The Proposed Approach
The aim of our approach is to give an efficient clustering process leading to better
enhanced clusters’ quality and execution time using a varied structure-based genetic algorithm
with a new crossover operator. Tthe input of such modified genetic algorithm process is a
dataset. The genetic process automatically generates the number of clusters with their initial
centroids
3.1. Chromosome representation
Chromosome representation is a problem encoding performed at the most important
steps in using genetic algorithm to solve a problem. To encode both the number of clusters and
their centers, we propose a chromosome structure that applies a real coded genetic algorithm to
the clustering problem, where crossover and mutation operators are applied directly to real
parameter values.
The use of real parameter values in the GA representation has a number of advantages
over binary coding. The efficiency of the GA is increased as there is no need to convert the
solution variables to the binary type, less memory is required, there is no loss in precision by
discretization to binary or other values, and there is greater freedom to use different genetic
operators. In this work, the existing population is processed in the form of several
subpopulations with different size. In other words, the number of genes in every cluster differs.
A chromosome in a subpopulations may contain 2 genes, 3 genes to Kmax genes, where
Kmaxis the maximum number of clusters. These genes hold corresponding information about the
cluster centers. Here, each gene in the chromosome represents cluster center, and number K of
genes in a chromosome stands for number of clusters. The value of K is assumed to lie in the
interval [2; Kmax]. Thus, each solution i is a fixed-length string represented by cluster centers cij ;
j = 2,…,Kmax. Then, solution i of the population is represented via a vector as follows:
IJEEI ISSN: 2089-3272 
An Heterogeneous Population-Based Genetic Algorithm for Data… (Amina Bedboudi)
279
ci1
… cij … ciKmax
Figure.2. An example of chromosomes as defined by our approach.
3.2. Fitness function
The fitness function f is a measure of profit we need during optimization. It is an
objective function that is used to summarize how close a given solution is to achieving the aims
and has an important effect on success of a genetic algorithm. Fitness is proportional to the
utility or ability of individual which the chromosome represents. Measure of fitness helps in
evolving good solutions and implementing natural selection. In this work, the fitness of a
chromosome is computed using Mean Square Error (MSE) [18]. The MSE calculate the error
between the clusters.
MSE= ∑ (1)
Fitness =
After the definitions of the fitness function the different parameters of the genetic
algorithm operators are fixed.
3.3. Generation of the Initial Population
To accelerate the genetic algorithm process and increase the individual diversity of the
initial population, we generate that initial population in two phases. The first phase consists in a
deterministic producing manner, and the second one consists in a randomly generation. We
proceed such generation so to obtain 20% of the initial population from the first phase and 80%
from the second phase. his phase, we deterministically generate the first sub-population that
constitutes 20% the entire population. In From the input dataset, we produce a sorted dataset.
Then, the sorted dataset is divided into k equal segments and the means, modal, and min-max
values (see Algorithm1) of each segment are determined and considered asinitial cluster
centers. In doing so, we increase the diversity of individuals leading to enhanced genetic
reproduction.
Algorithm1: generate20%population
 Sort the datset;
 For (k=2, k<=Kmax-1, k++){
- Divide the sorted data int k equal segments: {Sk};
- Modal step: take the modal value of each segment k as a center Ck;
- Means step: take the means value of each segment k as a center Ck;
- Min-Max step:
For (i=0, i<k, k++){
Calculate c = (max-min)/2
Consider c as a ith center
3.4. Reproduction
Genetic Algorithms aids to look for the best solution among a number of possible
solutions throw reproduction. Reproduction can be implemented in an algorithmic form, based
on reproduction operators, suited to the encoding scheme.The objective of the reproduction
operators is to ensure diversity in the population, such that the fittest solutions can be derived
through the evolutionary process. Such evolution process is composed of crossover and
mutation operations.
 ISSN: 2089-3272
IJEEI Vol. 5, No. 3, September 2017 : 275 – 284
280
During such process, invalid chromosomes may be produced. In the proposed
chromosome representation, repetition of genes produces invalid chromosomes because a data
point cannot be center point of more than one cluster. So to be able to detect production of
invalid chromosomes, we sort the chromosome genes in ascending order. In doing so, we aim
to achieve small length, fast detection of invalid chromosomes, and faster crossover and
mutation operations. Figure 3. illustrates our crossover process using a mask.
Figure 3. The crossover process using a mask
3.4.1. Crossover
It is a reproduction process that the children chromosomes are generated according to
the fitness function values of their parents. We define a crossover operator in such a way that it
could accept parent chromosomes with different number of genes. It occurs with probability
crossp. For each two selected parents, crossover is applied, producing two offsprings. To this
end, a binary mask is created. Its length is equal to that of the shorter parent. In this mask, digit
“1” occurs with a certain probability p.
C
(1)
j=C
(1)
j+α C
(2)
j (2)
C
(2)
j=C
(2)
j+(1-α) C
(1)
j (3)
Figure 4. Depicts the crossover formulas with α as a uniformly distributed random
number such that
α ∈[-1, 1]. Offspring O1 is created by applying the formula (1) and offspring O2 is created
by applying the formula (2). These formulas in Figure 4 are applied on genes in the parents
corresponding to the positions in the mask with a digit “1”. Also, we should check the validity of
the resulting chromosomes to avoid redundant centers in one chromosome. In the case of
redundancy, we replace the redundant gene with a non redundant one, extracted from a parent.
IJEEI ISSN: 2089-3272 
An Heterogeneous Population-Based Genetic Algorithm for Data… (Amina Bedboudi)
281
Algorithm2: Clustering with GA
Input: data points, kmax
Output: clusters
Procedure:
(1) Use algorithm 1 to generate 20% of the initial population;
(2) Randomly generate the remaining 80% of initial populations;
(3) Calculate the fitness values of the chromosomes in the initial population;
(4) Select the two fittest chromosomes (P1, P2) and randomly choose a target one P3 from
the population;
(5) Apply the crossover and mutation operators on (P1, P2) to generate two offsprings (O1,
O2);
(6) Calculate the fitness values of the offsprings
(7) Replace P1 and P2 with O1 or O2 , if they are not better than such offsprings;
(8) Repeat from step (3) if there is a significant change between generations, or the
number of iteration has not been reached;
(9) Final clusters are derived from the fittest chromosome using the one step K-means
algorithm [21].
Figure 5. The main steps of the proposed genetic algorithm to clustering.
3.4.2. Mutation
Mutation is intended to prevent falling of all solutions in the population into a local
optimum of the solved problem or enhancing the obtained chromosome. Mutation takes place
with a lower probability than that of the crossover. We define two types of mutation to be taken
place after the crossover. Topological mutation is intended to add or delete genes from an
offspring. Gene mutation is intended to modify a selected gene from an offspring. Gene
mutation is applied by randomly selecting a gene from an offspring and replacing it with a data
point randomly extract from a dataset. Afterward, a validity check is performed to avoid
redundant centers and to keep the cluster number in [2, Kmax].
Our overall genetic process organizes operations as depicted by Figure 5. To find the closest
cluster to every point, we apply a k-means operator (KMO), which is a one step of the classical
k-means algorithm. In other words, we assign data points to their clusters for each new
chromosome using the KMO operator, in order to compute its fitness.
4. Related works
In the k-means algorithm, the most important challenge is related to determining of the
number of clusters and their initial centers. Randomly determining initial centers does not
guarantee good quality of clustering, due to the fact that at each run of K-Means algorithm on
the same dataset may result in a different clusyers. Besides, the process may converge to
suboptimal partitions.
In [3], the author proposed to sort the dateset according to their distances from each
other, and then divide the sorted data into k equal segments. He takes the first data item of
each segment as a corresponding cluster center. In [12], the authors proposed a solution based
on a statistical test, with the assymption that a subset of data follows a Gaussian distribution.
The proposed process begins with a small number k of clusters, and then increments k at each
step based on appropriate condition.
In [17-21] genetic algorithm-based solutions are proposed. They maintain a population
of some coded solutions and apply a one step k-means operator to calculate the fitness of a
chromosome. In [13], the authors proposed to use genetic algorithm to determine the initial
values of the clusters. They use k-means algorithm to evaluate fitness of each candidate center.
At the final iteration, the fittest chromosome will be the k-means initial cluster center. As k-
means algorithm must be executed for many times, the approach is usually computationally
expensive.
These above mentioned methods share the common drawback of searching initial
centers while fixing a given number of clusters. Their aims was to find results in fewer number of
steps, but because of intensively calculating neighbourhood distance between data points and
rearranging them, it would be very slow in partitioning large datasets. When searching for the
 ISSN: 2089-3272
IJEEI Vol. 5, No. 3, September 2017 : 275 – 284
282
initial centroids, they evaluate the concerned criteria separately for different values of k, leading
to exeute K-means algorithm for different numbers of clusters. Such a process is usually very
time-consuming.
5. Experimental results
To evaluate our approach and test its efficiency, we compare it with k-means, and the
closest work to ours of [9], using two selected datasets from UCI repository of machine learning
database, which are Iris and Lymphoma datasets. It is worth noting that the first difference of
our approach and the two above mentioned methods is that we automatically determine the k
number of clusters and their initial centroids. In the following, we show that our approach
outperforms at least such two methods when considering two important parameters, that is, the
average error and the average execution time. Experiments have been done usin parameters of
the Table 1.
Table 1.
Parameter Value
Maximum number of clusters, Kmax 15
Population size, P 80
Crossover probability 0.6
Mutation probability 0.3
Maximum no. of iterations, 100
In the Iris dataset, each data point has four feature values, which represents the length
and the width of sepal and petal, in centimeters. It has three classes with 50 samples per class.
The value of k is therefore chosen to be three (clusters).
In the Lymphoma dataset, we find 62 samples consisting of 4026 genes spanning three
classes, which include 42 Diffuse Large B-Cell Lymphoma samples, nine Follicular Lymphoma
samples, and 11 B-cell Chronic Lymphocytic Leukemia samples
This dataset is to be partitioned into three clusters. Each dataset was use for each
method for 10 times and then we determined the average time and error as mentioned in Table
2, which shows the efficiently experiments of the proposed method over K-means and the work
of [9], which closest to ours
Table 2. Evaluation results using UCI data sets
Dataset parameters k-means [11]
Proposed
approach
Iris Avg Error 36.53 29.59 19.59
Avg Time 18.34 10.56 9.34
Lymphoma Avg Error 44.12 38.49 22.15
Avg Time 16.15 10.25 8.28
The experiments have been conducted to measure the average error and average
execution time parameters for the three methods. For each dataset, average error and average
time are listed to show the trade-off between them. From the table 2 it is observed that, our
method outperforms k-means and its variants, especially the work in [11], by enhancing the
average error and the average execution time.
IJEEI ISSN: 2089-3272 
An Heterogeneous Population-Based Genetic Algorithm for Data… (Amina Bedboudi)
283
Because of our proposed population model consists of chromosomes with varied
structures, and using improved genetic operators for reproduction, the offsprings will inherit
appropriate information from their parent, leading to find an enhanced solution for the
concerned problem. Moreover, the deterministic determination of important portion of the initial
population will brings high diversity the genetic reproduction process, leading to enhanced
quality of the result
5. Conclusion
In this paper we have introduced a variable structure of genetic algorithms to be applied
on the data clustering problem without requiring the number and initial centers of clusters. The
cluster centroids are calculated by a genetic algorithm with heterogeneous population leading to
better results than using random numbers. Moreover, our approach allowed acceleration of the
search process by reducing the average execution time while obtaining better quality of data
partitions. As a future works, we plan to investigate improved fitness function and other
reproduction operators to achieve better performance
References
[1] Mecca GS, Raunich A, Pappalardo. New Algorithm For Clustering Search Results. Data And
Knowledge Engineering. 2007; 62: 504-522.
[2] H He, Y Tan. A Two-Stage Genetic Algorithm for Automatic Clustering. Neuro-Computing. 2012; 81:
49-59.
[3] C Raposo, C Antunes, J Barreto. Automatic Clustering Using A Genetic Algorithm With New Solution
Encoding And Operators. ICCSA (2). Lecture Notes In Computer Science, Springer. 2004; 8580: 92-
103.
[4] K Jain, MN Murty, PJ Flynn. Data Clustering: A Review, ACM Comput. Surv. 1999; 31(3): 264–323.
[5] S Garc a, uengo, Herrera. Data Pre-Processing In Data Mining, Cham. Springer. 2004: 327.
[6] SN Sivanandam, SN Deepa. Introduction To Genetic Algorithms. Berlin Heidelberg New York:
Springer. 2008.
[7] K Fukunaga. Introduction To Statistical Pattern Recognition. New York Academic. 1990.
[8] KB Sawant. Efficient Determination of Clusters in K-Mean Algorithm Using Neighborhood Distance.
The International Journal of Emerging Engineering Research and Technology. 2015; 3(1): 22-27.
[9] V Chittu, N Sumathi. A Modified Genetic Algorithm Initializing K-Means Clustering. Global Journal of
Computer Science And Technology. 2011; 11(2).
[10] K Fukunaga. Introduction To Statistical Pattern Recognition. Newyork: Academic. 1990.
[11] DW Scott. Multivariate Density Estimation. Wiley, New York. 1992.
[12] G Hamerly, C Elkan. Learning the k in k-means. Advances in Neural Information Processing
Systems. 2004; 17: 281–288.
[13] G Karegowda, VT Shama, MA Jayaram, AS Manjunath. Improving Performance of K-Means
Clustering by Initializing Cluster Centers Using Genetic Algorithm and Entropy Based Fuzzy
Clustering for Categorization of Diabetic Patients. In Proceedings of International Conference on
Advances in Computing. MSRIT, Bangalore: Springer India. 2013; 899-904.
[14] M Ettaouil, E Abdelatif, F Harchli. Improving The Performance of K-Means Algorithm Using An
Automatic Choice of Suitable Code Vectors And Optimal Number of Clusters. Journal of Theoretical
and Applied Information Technology. 2013; 56.
[15] Yugal Kumar, G Sahoo. A New Initialization Method To Originate Initial Cluster Centers For K-Means
Algorithm. International Journal of Advanced Science And Technology. 2014; 62: 43-54.
[16] SK Nandha Kumar, P Renuga. Reactive Power Planning Using Real GA Comparison With
Evolutionary Programming. Intl Journal of Recent Trends In Engineering. 2009; 1(3).
[17] K Krishna, MN Murty. Genetic K-means algorithm. IEEE Transactions on Systems, Man, and
Cybernetics. 1999; 29(3): 433-439.
[18] G Delavar, GH Mohebpour. A New Genetic Center Based Data Clustering Algorithm Based on K-
Means. International Journal of Mechatronics, Electrical And Computer Technology. 2014; 4(13):
1820-1839.
 ISSN: 2089-3272
IJEEI Vol. 5, No. 3, September 2017 : 275 – 284
284
[19] K Valarmathi, D Devaraj, TK Radhakrishnan. Real-Coded Genetic Algorithm For System
Identification and Controller Tuning. Applied Mathematical Modelling. 2009; 33: 3392–3401.
[20] DE Goldberg. Genetic Algorithms In Search Optimization And Machine Learning, Addison Wesley.
1989.
[21] Najmah M, Ghasemzadeh, Ch Meinel. A Variant of Genetic Algorithms For Non-Homogeneous
Population. Intl Conf. on Applied Mathematics. Computational Science and Systems Engineering.
Taly-Rome. 2016.
[22] Asuncion J. Newman, UCI Mach. Learning Repository, Irvine, CA: Dpt Of Inf. And Univ. of California,
Http://Www.Ics.Uci.Edu/~Mlearn/Mlrepository.Html. Computer Science. 2016.
Ad

More Related Content

What's hot (19)

A HYBRID MODEL FOR MINING MULTI DIMENSIONAL DATA SETS
A HYBRID MODEL FOR MINING MULTI DIMENSIONAL DATA SETSA HYBRID MODEL FOR MINING MULTI DIMENSIONAL DATA SETS
A HYBRID MODEL FOR MINING MULTI DIMENSIONAL DATA SETS
Editor IJCATR
 
I0704047054
I0704047054I0704047054
I0704047054
IJERD Editor
 
Particle Swarm Optimization based K-Prototype Clustering Algorithm
Particle Swarm Optimization based K-Prototype Clustering Algorithm Particle Swarm Optimization based K-Prototype Clustering Algorithm
Particle Swarm Optimization based K-Prototype Clustering Algorithm
iosrjce
 
T180203125133
T180203125133T180203125133
T180203125133
IOSR Journals
 
EXTRACTING USEFUL RULES THROUGH IMPROVED DECISION TREE INDUCTION USING INFORM...
EXTRACTING USEFUL RULES THROUGH IMPROVED DECISION TREE INDUCTION USING INFORM...EXTRACTING USEFUL RULES THROUGH IMPROVED DECISION TREE INDUCTION USING INFORM...
EXTRACTING USEFUL RULES THROUGH IMPROVED DECISION TREE INDUCTION USING INFORM...
ijistjournal
 
Applying the big bang-big crunch metaheuristic to large-sized operational pro...
Applying the big bang-big crunch metaheuristic to large-sized operational pro...Applying the big bang-big crunch metaheuristic to large-sized operational pro...
Applying the big bang-big crunch metaheuristic to large-sized operational pro...
IJECEIAES
 
A Preference Model on Adaptive Affinity Propagation
A Preference Model on Adaptive Affinity PropagationA Preference Model on Adaptive Affinity Propagation
A Preference Model on Adaptive Affinity Propagation
IJECEIAES
 
Analysis On Classification Techniques In Mammographic Mass Data Set
Analysis On Classification Techniques In Mammographic Mass Data SetAnalysis On Classification Techniques In Mammographic Mass Data Set
Analysis On Classification Techniques In Mammographic Mass Data Set
IJERA Editor
 
Multi-Cluster Based Approach for skewed Data in Data Mining
Multi-Cluster Based Approach for skewed Data in Data MiningMulti-Cluster Based Approach for skewed Data in Data Mining
Multi-Cluster Based Approach for skewed Data in Data Mining
IOSR Journals
 
A new model for iris data set classification based on linear support vector m...
A new model for iris data set classification based on linear support vector m...A new model for iris data set classification based on linear support vector m...
A new model for iris data set classification based on linear support vector m...
IJECEIAES
 
Automatic Unsupervised Data Classification Using Jaya Evolutionary Algorithm
Automatic Unsupervised Data Classification Using Jaya Evolutionary AlgorithmAutomatic Unsupervised Data Classification Using Jaya Evolutionary Algorithm
Automatic Unsupervised Data Classification Using Jaya Evolutionary Algorithm
aciijournal
 
Automatic Unsupervised Data Classification Using Jaya Evolutionary Algorithm
Automatic Unsupervised Data Classification Using Jaya Evolutionary AlgorithmAutomatic Unsupervised Data Classification Using Jaya Evolutionary Algorithm
Automatic Unsupervised Data Classification Using Jaya Evolutionary Algorithm
aciijournal
 
ROLE OF CERTAINTY FACTOR IN GENERATING ROUGH-FUZZY RULE
ROLE OF CERTAINTY FACTOR IN GENERATING ROUGH-FUZZY RULEROLE OF CERTAINTY FACTOR IN GENERATING ROUGH-FUZZY RULE
ROLE OF CERTAINTY FACTOR IN GENERATING ROUGH-FUZZY RULE
IJCSEA Journal
 
K-Medoids Clustering Using Partitioning Around Medoids for Performing Face Re...
K-Medoids Clustering Using Partitioning Around Medoids for Performing Face Re...K-Medoids Clustering Using Partitioning Around Medoids for Performing Face Re...
K-Medoids Clustering Using Partitioning Around Medoids for Performing Face Re...
ijscmcj
 
Comparative study of various supervisedclassification methodsforanalysing def...
Comparative study of various supervisedclassification methodsforanalysing def...Comparative study of various supervisedclassification methodsforanalysing def...
Comparative study of various supervisedclassification methodsforanalysing def...
eSAT Publishing House
 
Automatic Feature Subset Selection using Genetic Algorithm for Clustering
Automatic Feature Subset Selection using Genetic Algorithm for ClusteringAutomatic Feature Subset Selection using Genetic Algorithm for Clustering
Automatic Feature Subset Selection using Genetic Algorithm for Clustering
idescitation
 
Robust Breast Cancer Diagnosis on Four Different Datasets Using Multi-Classif...
Robust Breast Cancer Diagnosis on Four Different Datasets Using Multi-Classif...Robust Breast Cancer Diagnosis on Four Different Datasets Using Multi-Classif...
Robust Breast Cancer Diagnosis on Four Different Datasets Using Multi-Classif...
ahmad abdelhafeez
 
Predicting of Hosting Animal Centre Outcome Based on Supervised Machine Learn...
Predicting of Hosting Animal Centre Outcome Based on Supervised Machine Learn...Predicting of Hosting Animal Centre Outcome Based on Supervised Machine Learn...
Predicting of Hosting Animal Centre Outcome Based on Supervised Machine Learn...
sushantparte
 
ENHANCED BREAST CANCER RECOGNITION BASED ON ROTATION FOREST FEATURE SELECTIO...
 ENHANCED BREAST CANCER RECOGNITION BASED ON ROTATION FOREST FEATURE SELECTIO... ENHANCED BREAST CANCER RECOGNITION BASED ON ROTATION FOREST FEATURE SELECTIO...
ENHANCED BREAST CANCER RECOGNITION BASED ON ROTATION FOREST FEATURE SELECTIO...
cscpconf
 
A HYBRID MODEL FOR MINING MULTI DIMENSIONAL DATA SETS
A HYBRID MODEL FOR MINING MULTI DIMENSIONAL DATA SETSA HYBRID MODEL FOR MINING MULTI DIMENSIONAL DATA SETS
A HYBRID MODEL FOR MINING MULTI DIMENSIONAL DATA SETS
Editor IJCATR
 
Particle Swarm Optimization based K-Prototype Clustering Algorithm
Particle Swarm Optimization based K-Prototype Clustering Algorithm Particle Swarm Optimization based K-Prototype Clustering Algorithm
Particle Swarm Optimization based K-Prototype Clustering Algorithm
iosrjce
 
EXTRACTING USEFUL RULES THROUGH IMPROVED DECISION TREE INDUCTION USING INFORM...
EXTRACTING USEFUL RULES THROUGH IMPROVED DECISION TREE INDUCTION USING INFORM...EXTRACTING USEFUL RULES THROUGH IMPROVED DECISION TREE INDUCTION USING INFORM...
EXTRACTING USEFUL RULES THROUGH IMPROVED DECISION TREE INDUCTION USING INFORM...
ijistjournal
 
Applying the big bang-big crunch metaheuristic to large-sized operational pro...
Applying the big bang-big crunch metaheuristic to large-sized operational pro...Applying the big bang-big crunch metaheuristic to large-sized operational pro...
Applying the big bang-big crunch metaheuristic to large-sized operational pro...
IJECEIAES
 
A Preference Model on Adaptive Affinity Propagation
A Preference Model on Adaptive Affinity PropagationA Preference Model on Adaptive Affinity Propagation
A Preference Model on Adaptive Affinity Propagation
IJECEIAES
 
Analysis On Classification Techniques In Mammographic Mass Data Set
Analysis On Classification Techniques In Mammographic Mass Data SetAnalysis On Classification Techniques In Mammographic Mass Data Set
Analysis On Classification Techniques In Mammographic Mass Data Set
IJERA Editor
 
Multi-Cluster Based Approach for skewed Data in Data Mining
Multi-Cluster Based Approach for skewed Data in Data MiningMulti-Cluster Based Approach for skewed Data in Data Mining
Multi-Cluster Based Approach for skewed Data in Data Mining
IOSR Journals
 
A new model for iris data set classification based on linear support vector m...
A new model for iris data set classification based on linear support vector m...A new model for iris data set classification based on linear support vector m...
A new model for iris data set classification based on linear support vector m...
IJECEIAES
 
Automatic Unsupervised Data Classification Using Jaya Evolutionary Algorithm
Automatic Unsupervised Data Classification Using Jaya Evolutionary AlgorithmAutomatic Unsupervised Data Classification Using Jaya Evolutionary Algorithm
Automatic Unsupervised Data Classification Using Jaya Evolutionary Algorithm
aciijournal
 
Automatic Unsupervised Data Classification Using Jaya Evolutionary Algorithm
Automatic Unsupervised Data Classification Using Jaya Evolutionary AlgorithmAutomatic Unsupervised Data Classification Using Jaya Evolutionary Algorithm
Automatic Unsupervised Data Classification Using Jaya Evolutionary Algorithm
aciijournal
 
ROLE OF CERTAINTY FACTOR IN GENERATING ROUGH-FUZZY RULE
ROLE OF CERTAINTY FACTOR IN GENERATING ROUGH-FUZZY RULEROLE OF CERTAINTY FACTOR IN GENERATING ROUGH-FUZZY RULE
ROLE OF CERTAINTY FACTOR IN GENERATING ROUGH-FUZZY RULE
IJCSEA Journal
 
K-Medoids Clustering Using Partitioning Around Medoids for Performing Face Re...
K-Medoids Clustering Using Partitioning Around Medoids for Performing Face Re...K-Medoids Clustering Using Partitioning Around Medoids for Performing Face Re...
K-Medoids Clustering Using Partitioning Around Medoids for Performing Face Re...
ijscmcj
 
Comparative study of various supervisedclassification methodsforanalysing def...
Comparative study of various supervisedclassification methodsforanalysing def...Comparative study of various supervisedclassification methodsforanalysing def...
Comparative study of various supervisedclassification methodsforanalysing def...
eSAT Publishing House
 
Automatic Feature Subset Selection using Genetic Algorithm for Clustering
Automatic Feature Subset Selection using Genetic Algorithm for ClusteringAutomatic Feature Subset Selection using Genetic Algorithm for Clustering
Automatic Feature Subset Selection using Genetic Algorithm for Clustering
idescitation
 
Robust Breast Cancer Diagnosis on Four Different Datasets Using Multi-Classif...
Robust Breast Cancer Diagnosis on Four Different Datasets Using Multi-Classif...Robust Breast Cancer Diagnosis on Four Different Datasets Using Multi-Classif...
Robust Breast Cancer Diagnosis on Four Different Datasets Using Multi-Classif...
ahmad abdelhafeez
 
Predicting of Hosting Animal Centre Outcome Based on Supervised Machine Learn...
Predicting of Hosting Animal Centre Outcome Based on Supervised Machine Learn...Predicting of Hosting Animal Centre Outcome Based on Supervised Machine Learn...
Predicting of Hosting Animal Centre Outcome Based on Supervised Machine Learn...
sushantparte
 
ENHANCED BREAST CANCER RECOGNITION BASED ON ROTATION FOREST FEATURE SELECTIO...
 ENHANCED BREAST CANCER RECOGNITION BASED ON ROTATION FOREST FEATURE SELECTIO... ENHANCED BREAST CANCER RECOGNITION BASED ON ROTATION FOREST FEATURE SELECTIO...
ENHANCED BREAST CANCER RECOGNITION BASED ON ROTATION FOREST FEATURE SELECTIO...
cscpconf
 

Similar to An Heterogeneous Population-Based Genetic Algorithm for Data Clustering (20)

A SURVEY ON OPTIMIZATION APPROACHES TO TEXT DOCUMENT CLUSTERING
A SURVEY ON OPTIMIZATION APPROACHES TO TEXT DOCUMENT CLUSTERINGA SURVEY ON OPTIMIZATION APPROACHES TO TEXT DOCUMENT CLUSTERING
A SURVEY ON OPTIMIZATION APPROACHES TO TEXT DOCUMENT CLUSTERING
ijcsa
 
The improved k means with particle swarm optimization
The improved k means with particle swarm optimizationThe improved k means with particle swarm optimization
The improved k means with particle swarm optimization
Alexander Decker
 
I017235662
I017235662I017235662
I017235662
IOSR Journals
 
Classification of medical datasets using back propagation neural network powe...
Classification of medical datasets using back propagation neural network powe...Classification of medical datasets using back propagation neural network powe...
Classification of medical datasets using back propagation neural network powe...
IJECEIAES
 
Multilevel techniques for the clustering problem
Multilevel techniques for the clustering problemMultilevel techniques for the clustering problem
Multilevel techniques for the clustering problem
csandit
 
A Comparative Study Of Various Clustering Algorithms In Data Mining
A Comparative Study Of Various Clustering Algorithms In Data MiningA Comparative Study Of Various Clustering Algorithms In Data Mining
A Comparative Study Of Various Clustering Algorithms In Data Mining
Natasha Grant
 
Clustering heterogeneous categorical data using enhanced mini batch K-means ...
Clustering heterogeneous categorical data using enhanced mini  batch K-means ...Clustering heterogeneous categorical data using enhanced mini  batch K-means ...
Clustering heterogeneous categorical data using enhanced mini batch K-means ...
IJECEIAES
 
The Evaluated Measurement of a Combined Genetic Algorithm and Artificial Immu...
The Evaluated Measurement of a Combined Genetic Algorithm and Artificial Immu...The Evaluated Measurement of a Combined Genetic Algorithm and Artificial Immu...
The Evaluated Measurement of a Combined Genetic Algorithm and Artificial Immu...
IJECEIAES
 
A comparative study of clustering and biclustering of microarray data
A comparative study of clustering and biclustering of microarray dataA comparative study of clustering and biclustering of microarray data
A comparative study of clustering and biclustering of microarray data
ijcsit
 
AN EFFICIENT PSO BASED ENSEMBLE CLASSIFICATION MODEL ON HIGH DIMENSIONAL DATA...
AN EFFICIENT PSO BASED ENSEMBLE CLASSIFICATION MODEL ON HIGH DIMENSIONAL DATA...AN EFFICIENT PSO BASED ENSEMBLE CLASSIFICATION MODEL ON HIGH DIMENSIONAL DATA...
AN EFFICIENT PSO BASED ENSEMBLE CLASSIFICATION MODEL ON HIGH DIMENSIONAL DATA...
ijsc
 
An Efficient PSO Based Ensemble Classification Model on High Dimensional Data...
An Efficient PSO Based Ensemble Classification Model on High Dimensional Data...An Efficient PSO Based Ensemble Classification Model on High Dimensional Data...
An Efficient PSO Based Ensemble Classification Model on High Dimensional Data...
ijsc
 
An Analysis On Clustering Algorithms In Data Mining
An Analysis On Clustering Algorithms In Data MiningAn Analysis On Clustering Algorithms In Data Mining
An Analysis On Clustering Algorithms In Data Mining
Gina Rizzo
 
Enhanced Genetic Algorithm with K-Means for the Clustering Problem
Enhanced Genetic Algorithm with K-Means for the Clustering ProblemEnhanced Genetic Algorithm with K-Means for the Clustering Problem
Enhanced Genetic Algorithm with K-Means for the Clustering Problem
Anders Viken
 
Introduction to Multi-Objective Clustering Ensemble
Introduction to Multi-Objective Clustering EnsembleIntroduction to Multi-Objective Clustering Ensemble
Introduction to Multi-Objective Clustering Ensemble
IJSRD
 
Web Based Fuzzy Clustering Analysis
Web Based Fuzzy Clustering AnalysisWeb Based Fuzzy Clustering Analysis
Web Based Fuzzy Clustering Analysis
inventy
 
Gene Selection for Sample Classification in Microarray: Clustering Based Method
Gene Selection for Sample Classification in Microarray: Clustering Based MethodGene Selection for Sample Classification in Microarray: Clustering Based Method
Gene Selection for Sample Classification in Microarray: Clustering Based Method
IOSR Journals
 
Ensemble based Distributed K-Modes Clustering
Ensemble based Distributed K-Modes ClusteringEnsemble based Distributed K-Modes Clustering
Ensemble based Distributed K-Modes Clustering
IJERD Editor
 
Mine Blood Donors Information through Improved K-Means Clustering
Mine Blood Donors Information through Improved K-Means ClusteringMine Blood Donors Information through Improved K-Means Clustering
Mine Blood Donors Information through Improved K-Means Clustering
ijcsity
 
Comparison Between Clustering Algorithms for Microarray Data Analysis
Comparison Between Clustering Algorithms for Microarray Data AnalysisComparison Between Clustering Algorithms for Microarray Data Analysis
Comparison Between Clustering Algorithms for Microarray Data Analysis
IOSR Journals
 
Clustering in Aggregated User Profiles across Multiple Social Networks
Clustering in Aggregated User Profiles across Multiple Social Networks Clustering in Aggregated User Profiles across Multiple Social Networks
Clustering in Aggregated User Profiles across Multiple Social Networks
IJECEIAES
 
A SURVEY ON OPTIMIZATION APPROACHES TO TEXT DOCUMENT CLUSTERING
A SURVEY ON OPTIMIZATION APPROACHES TO TEXT DOCUMENT CLUSTERINGA SURVEY ON OPTIMIZATION APPROACHES TO TEXT DOCUMENT CLUSTERING
A SURVEY ON OPTIMIZATION APPROACHES TO TEXT DOCUMENT CLUSTERING
ijcsa
 
The improved k means with particle swarm optimization
The improved k means with particle swarm optimizationThe improved k means with particle swarm optimization
The improved k means with particle swarm optimization
Alexander Decker
 
Classification of medical datasets using back propagation neural network powe...
Classification of medical datasets using back propagation neural network powe...Classification of medical datasets using back propagation neural network powe...
Classification of medical datasets using back propagation neural network powe...
IJECEIAES
 
Multilevel techniques for the clustering problem
Multilevel techniques for the clustering problemMultilevel techniques for the clustering problem
Multilevel techniques for the clustering problem
csandit
 
A Comparative Study Of Various Clustering Algorithms In Data Mining
A Comparative Study Of Various Clustering Algorithms In Data MiningA Comparative Study Of Various Clustering Algorithms In Data Mining
A Comparative Study Of Various Clustering Algorithms In Data Mining
Natasha Grant
 
Clustering heterogeneous categorical data using enhanced mini batch K-means ...
Clustering heterogeneous categorical data using enhanced mini  batch K-means ...Clustering heterogeneous categorical data using enhanced mini  batch K-means ...
Clustering heterogeneous categorical data using enhanced mini batch K-means ...
IJECEIAES
 
The Evaluated Measurement of a Combined Genetic Algorithm and Artificial Immu...
The Evaluated Measurement of a Combined Genetic Algorithm and Artificial Immu...The Evaluated Measurement of a Combined Genetic Algorithm and Artificial Immu...
The Evaluated Measurement of a Combined Genetic Algorithm and Artificial Immu...
IJECEIAES
 
A comparative study of clustering and biclustering of microarray data
A comparative study of clustering and biclustering of microarray dataA comparative study of clustering and biclustering of microarray data
A comparative study of clustering and biclustering of microarray data
ijcsit
 
AN EFFICIENT PSO BASED ENSEMBLE CLASSIFICATION MODEL ON HIGH DIMENSIONAL DATA...
AN EFFICIENT PSO BASED ENSEMBLE CLASSIFICATION MODEL ON HIGH DIMENSIONAL DATA...AN EFFICIENT PSO BASED ENSEMBLE CLASSIFICATION MODEL ON HIGH DIMENSIONAL DATA...
AN EFFICIENT PSO BASED ENSEMBLE CLASSIFICATION MODEL ON HIGH DIMENSIONAL DATA...
ijsc
 
An Efficient PSO Based Ensemble Classification Model on High Dimensional Data...
An Efficient PSO Based Ensemble Classification Model on High Dimensional Data...An Efficient PSO Based Ensemble Classification Model on High Dimensional Data...
An Efficient PSO Based Ensemble Classification Model on High Dimensional Data...
ijsc
 
An Analysis On Clustering Algorithms In Data Mining
An Analysis On Clustering Algorithms In Data MiningAn Analysis On Clustering Algorithms In Data Mining
An Analysis On Clustering Algorithms In Data Mining
Gina Rizzo
 
Enhanced Genetic Algorithm with K-Means for the Clustering Problem
Enhanced Genetic Algorithm with K-Means for the Clustering ProblemEnhanced Genetic Algorithm with K-Means for the Clustering Problem
Enhanced Genetic Algorithm with K-Means for the Clustering Problem
Anders Viken
 
Introduction to Multi-Objective Clustering Ensemble
Introduction to Multi-Objective Clustering EnsembleIntroduction to Multi-Objective Clustering Ensemble
Introduction to Multi-Objective Clustering Ensemble
IJSRD
 
Web Based Fuzzy Clustering Analysis
Web Based Fuzzy Clustering AnalysisWeb Based Fuzzy Clustering Analysis
Web Based Fuzzy Clustering Analysis
inventy
 
Gene Selection for Sample Classification in Microarray: Clustering Based Method
Gene Selection for Sample Classification in Microarray: Clustering Based MethodGene Selection for Sample Classification in Microarray: Clustering Based Method
Gene Selection for Sample Classification in Microarray: Clustering Based Method
IOSR Journals
 
Ensemble based Distributed K-Modes Clustering
Ensemble based Distributed K-Modes ClusteringEnsemble based Distributed K-Modes Clustering
Ensemble based Distributed K-Modes Clustering
IJERD Editor
 
Mine Blood Donors Information through Improved K-Means Clustering
Mine Blood Donors Information through Improved K-Means ClusteringMine Blood Donors Information through Improved K-Means Clustering
Mine Blood Donors Information through Improved K-Means Clustering
ijcsity
 
Comparison Between Clustering Algorithms for Microarray Data Analysis
Comparison Between Clustering Algorithms for Microarray Data AnalysisComparison Between Clustering Algorithms for Microarray Data Analysis
Comparison Between Clustering Algorithms for Microarray Data Analysis
IOSR Journals
 
Clustering in Aggregated User Profiles across Multiple Social Networks
Clustering in Aggregated User Profiles across Multiple Social Networks Clustering in Aggregated User Profiles across Multiple Social Networks
Clustering in Aggregated User Profiles across Multiple Social Networks
IJECEIAES
 
Ad

More from ijeei-iaes (20)

Development of a Wireless Sensors Network for Greenhouse Monitoring and Control
Development of a Wireless Sensors Network for Greenhouse Monitoring and ControlDevelopment of a Wireless Sensors Network for Greenhouse Monitoring and Control
Development of a Wireless Sensors Network for Greenhouse Monitoring and Control
ijeei-iaes
 
Analysis of Genetic Algorithm for Effective power Delivery and with Best Upsurge
Analysis of Genetic Algorithm for Effective power Delivery and with Best UpsurgeAnalysis of Genetic Algorithm for Effective power Delivery and with Best Upsurge
Analysis of Genetic Algorithm for Effective power Delivery and with Best Upsurge
ijeei-iaes
 
Design for Postplacement Mousing based on GSM in Long-Distance
Design for Postplacement Mousing based on GSM in Long-DistanceDesign for Postplacement Mousing based on GSM in Long-Distance
Design for Postplacement Mousing based on GSM in Long-Distance
ijeei-iaes
 
Investigation of TTMC-SVPWM Strategies for Diode Clamped and Cascaded H-bridg...
Investigation of TTMC-SVPWM Strategies for Diode Clamped and Cascaded H-bridg...Investigation of TTMC-SVPWM Strategies for Diode Clamped and Cascaded H-bridg...
Investigation of TTMC-SVPWM Strategies for Diode Clamped and Cascaded H-bridg...
ijeei-iaes
 
Optimal Power Flow with Reactive Power Compensation for Cost And Loss Minimiz...
Optimal Power Flow with Reactive Power Compensation for Cost And Loss Minimiz...Optimal Power Flow with Reactive Power Compensation for Cost And Loss Minimiz...
Optimal Power Flow with Reactive Power Compensation for Cost And Loss Minimiz...
ijeei-iaes
 
Mitigation of Power Quality Problems Using Custom Power Devices: A Review
Mitigation of Power Quality Problems Using Custom Power Devices: A ReviewMitigation of Power Quality Problems Using Custom Power Devices: A Review
Mitigation of Power Quality Problems Using Custom Power Devices: A Review
ijeei-iaes
 
Comparison of Dynamic Stability Response of A SMIB with PI and Fuzzy Controll...
Comparison of Dynamic Stability Response of A SMIB with PI and Fuzzy Controll...Comparison of Dynamic Stability Response of A SMIB with PI and Fuzzy Controll...
Comparison of Dynamic Stability Response of A SMIB with PI and Fuzzy Controll...
ijeei-iaes
 
Embellished Particle Swarm Optimization Algorithm for Solving Reactive Power ...
Embellished Particle Swarm Optimization Algorithm for Solving Reactive Power ...Embellished Particle Swarm Optimization Algorithm for Solving Reactive Power ...
Embellished Particle Swarm Optimization Algorithm for Solving Reactive Power ...
ijeei-iaes
 
Intelligent Management on the Home Consumers with Zero Energy Consumption
Intelligent Management on the Home Consumers with Zero Energy ConsumptionIntelligent Management on the Home Consumers with Zero Energy Consumption
Intelligent Management on the Home Consumers with Zero Energy Consumption
ijeei-iaes
 
Analysing Transportation Data with Open Source Big Data Analytic Tools
Analysing Transportation Data with Open Source Big Data Analytic ToolsAnalysing Transportation Data with Open Source Big Data Analytic Tools
Analysing Transportation Data with Open Source Big Data Analytic Tools
ijeei-iaes
 
A Pattern Classification Based approach for Blur Classification
A Pattern Classification Based approach for Blur ClassificationA Pattern Classification Based approach for Blur Classification
A Pattern Classification Based approach for Blur Classification
ijeei-iaes
 
Computing Some Degree-Based Topological Indices of Graphene
Computing Some Degree-Based Topological Indices of GrapheneComputing Some Degree-Based Topological Indices of Graphene
Computing Some Degree-Based Topological Indices of Graphene
ijeei-iaes
 
A Lyapunov Based Approach to Enchance Wind Turbine Stability
A Lyapunov Based Approach to Enchance Wind Turbine StabilityA Lyapunov Based Approach to Enchance Wind Turbine Stability
A Lyapunov Based Approach to Enchance Wind Turbine Stability
ijeei-iaes
 
Fuzzy Control of a Large Crane Structure
Fuzzy Control of a Large Crane StructureFuzzy Control of a Large Crane Structure
Fuzzy Control of a Large Crane Structure
ijeei-iaes
 
Site Diversity Technique Application on Rain Attenuation for Lagos
Site Diversity Technique Application on Rain Attenuation for LagosSite Diversity Technique Application on Rain Attenuation for Lagos
Site Diversity Technique Application on Rain Attenuation for Lagos
ijeei-iaes
 
Impact of Next Generation Cognitive Radio Network on the Wireless Green Eco s...
Impact of Next Generation Cognitive Radio Network on the Wireless Green Eco s...Impact of Next Generation Cognitive Radio Network on the Wireless Green Eco s...
Impact of Next Generation Cognitive Radio Network on the Wireless Green Eco s...
ijeei-iaes
 
Music Recommendation System with User-based and Item-based Collaborative Filt...
Music Recommendation System with User-based and Item-based Collaborative Filt...Music Recommendation System with User-based and Item-based Collaborative Filt...
Music Recommendation System with User-based and Item-based Collaborative Filt...
ijeei-iaes
 
A Real-Time Implementation of Moving Object Action Recognition System Based o...
A Real-Time Implementation of Moving Object Action Recognition System Based o...A Real-Time Implementation of Moving Object Action Recognition System Based o...
A Real-Time Implementation of Moving Object Action Recognition System Based o...
ijeei-iaes
 
Wireless Sensor Network for Radiation Detection
Wireless Sensor Network for Radiation DetectionWireless Sensor Network for Radiation Detection
Wireless Sensor Network for Radiation Detection
ijeei-iaes
 
Charge Sharing Suppression in Single Photon Processing Pixel Array
Charge Sharing Suppression in Single Photon Processing Pixel ArrayCharge Sharing Suppression in Single Photon Processing Pixel Array
Charge Sharing Suppression in Single Photon Processing Pixel Array
ijeei-iaes
 
Development of a Wireless Sensors Network for Greenhouse Monitoring and Control
Development of a Wireless Sensors Network for Greenhouse Monitoring and ControlDevelopment of a Wireless Sensors Network for Greenhouse Monitoring and Control
Development of a Wireless Sensors Network for Greenhouse Monitoring and Control
ijeei-iaes
 
Analysis of Genetic Algorithm for Effective power Delivery and with Best Upsurge
Analysis of Genetic Algorithm for Effective power Delivery and with Best UpsurgeAnalysis of Genetic Algorithm for Effective power Delivery and with Best Upsurge
Analysis of Genetic Algorithm for Effective power Delivery and with Best Upsurge
ijeei-iaes
 
Design for Postplacement Mousing based on GSM in Long-Distance
Design for Postplacement Mousing based on GSM in Long-DistanceDesign for Postplacement Mousing based on GSM in Long-Distance
Design for Postplacement Mousing based on GSM in Long-Distance
ijeei-iaes
 
Investigation of TTMC-SVPWM Strategies for Diode Clamped and Cascaded H-bridg...
Investigation of TTMC-SVPWM Strategies for Diode Clamped and Cascaded H-bridg...Investigation of TTMC-SVPWM Strategies for Diode Clamped and Cascaded H-bridg...
Investigation of TTMC-SVPWM Strategies for Diode Clamped and Cascaded H-bridg...
ijeei-iaes
 
Optimal Power Flow with Reactive Power Compensation for Cost And Loss Minimiz...
Optimal Power Flow with Reactive Power Compensation for Cost And Loss Minimiz...Optimal Power Flow with Reactive Power Compensation for Cost And Loss Minimiz...
Optimal Power Flow with Reactive Power Compensation for Cost And Loss Minimiz...
ijeei-iaes
 
Mitigation of Power Quality Problems Using Custom Power Devices: A Review
Mitigation of Power Quality Problems Using Custom Power Devices: A ReviewMitigation of Power Quality Problems Using Custom Power Devices: A Review
Mitigation of Power Quality Problems Using Custom Power Devices: A Review
ijeei-iaes
 
Comparison of Dynamic Stability Response of A SMIB with PI and Fuzzy Controll...
Comparison of Dynamic Stability Response of A SMIB with PI and Fuzzy Controll...Comparison of Dynamic Stability Response of A SMIB with PI and Fuzzy Controll...
Comparison of Dynamic Stability Response of A SMIB with PI and Fuzzy Controll...
ijeei-iaes
 
Embellished Particle Swarm Optimization Algorithm for Solving Reactive Power ...
Embellished Particle Swarm Optimization Algorithm for Solving Reactive Power ...Embellished Particle Swarm Optimization Algorithm for Solving Reactive Power ...
Embellished Particle Swarm Optimization Algorithm for Solving Reactive Power ...
ijeei-iaes
 
Intelligent Management on the Home Consumers with Zero Energy Consumption
Intelligent Management on the Home Consumers with Zero Energy ConsumptionIntelligent Management on the Home Consumers with Zero Energy Consumption
Intelligent Management on the Home Consumers with Zero Energy Consumption
ijeei-iaes
 
Analysing Transportation Data with Open Source Big Data Analytic Tools
Analysing Transportation Data with Open Source Big Data Analytic ToolsAnalysing Transportation Data with Open Source Big Data Analytic Tools
Analysing Transportation Data with Open Source Big Data Analytic Tools
ijeei-iaes
 
A Pattern Classification Based approach for Blur Classification
A Pattern Classification Based approach for Blur ClassificationA Pattern Classification Based approach for Blur Classification
A Pattern Classification Based approach for Blur Classification
ijeei-iaes
 
Computing Some Degree-Based Topological Indices of Graphene
Computing Some Degree-Based Topological Indices of GrapheneComputing Some Degree-Based Topological Indices of Graphene
Computing Some Degree-Based Topological Indices of Graphene
ijeei-iaes
 
A Lyapunov Based Approach to Enchance Wind Turbine Stability
A Lyapunov Based Approach to Enchance Wind Turbine StabilityA Lyapunov Based Approach to Enchance Wind Turbine Stability
A Lyapunov Based Approach to Enchance Wind Turbine Stability
ijeei-iaes
 
Fuzzy Control of a Large Crane Structure
Fuzzy Control of a Large Crane StructureFuzzy Control of a Large Crane Structure
Fuzzy Control of a Large Crane Structure
ijeei-iaes
 
Site Diversity Technique Application on Rain Attenuation for Lagos
Site Diversity Technique Application on Rain Attenuation for LagosSite Diversity Technique Application on Rain Attenuation for Lagos
Site Diversity Technique Application on Rain Attenuation for Lagos
ijeei-iaes
 
Impact of Next Generation Cognitive Radio Network on the Wireless Green Eco s...
Impact of Next Generation Cognitive Radio Network on the Wireless Green Eco s...Impact of Next Generation Cognitive Radio Network on the Wireless Green Eco s...
Impact of Next Generation Cognitive Radio Network on the Wireless Green Eco s...
ijeei-iaes
 
Music Recommendation System with User-based and Item-based Collaborative Filt...
Music Recommendation System with User-based and Item-based Collaborative Filt...Music Recommendation System with User-based and Item-based Collaborative Filt...
Music Recommendation System with User-based and Item-based Collaborative Filt...
ijeei-iaes
 
A Real-Time Implementation of Moving Object Action Recognition System Based o...
A Real-Time Implementation of Moving Object Action Recognition System Based o...A Real-Time Implementation of Moving Object Action Recognition System Based o...
A Real-Time Implementation of Moving Object Action Recognition System Based o...
ijeei-iaes
 
Wireless Sensor Network for Radiation Detection
Wireless Sensor Network for Radiation DetectionWireless Sensor Network for Radiation Detection
Wireless Sensor Network for Radiation Detection
ijeei-iaes
 
Charge Sharing Suppression in Single Photon Processing Pixel Array
Charge Sharing Suppression in Single Photon Processing Pixel ArrayCharge Sharing Suppression in Single Photon Processing Pixel Array
Charge Sharing Suppression in Single Photon Processing Pixel Array
ijeei-iaes
 
Ad

Recently uploaded (20)

sss1.pptxsss1.pptxsss1.pptxsss1.pptxsss1.pptx
sss1.pptxsss1.pptxsss1.pptxsss1.pptxsss1.pptxsss1.pptxsss1.pptxsss1.pptxsss1.pptxsss1.pptx
sss1.pptxsss1.pptxsss1.pptxsss1.pptxsss1.pptx
ajayrm685
 
Design of Variable Depth Single-Span Post.pdf
Design of Variable Depth Single-Span Post.pdfDesign of Variable Depth Single-Span Post.pdf
Design of Variable Depth Single-Span Post.pdf
Kamel Farid
 
Agents chapter of Artificial intelligence
Agents chapter of Artificial intelligenceAgents chapter of Artificial intelligence
Agents chapter of Artificial intelligence
DebdeepMukherjee9
 
Construction Materials (Paints) in Civil Engineering
Construction Materials (Paints) in Civil EngineeringConstruction Materials (Paints) in Civil Engineering
Construction Materials (Paints) in Civil Engineering
Lavish Kashyap
 
Slide share PPT of NOx control technologies.pptx
Slide share PPT of  NOx control technologies.pptxSlide share PPT of  NOx control technologies.pptx
Slide share PPT of NOx control technologies.pptx
vvsasane
 
Personal Protective Efsgfgsffquipment.ppt
Personal Protective Efsgfgsffquipment.pptPersonal Protective Efsgfgsffquipment.ppt
Personal Protective Efsgfgsffquipment.ppt
ganjangbegu579
 
Jacob Murphy Australia - Excels In Optimizing Software Applications
Jacob Murphy Australia - Excels In Optimizing Software ApplicationsJacob Murphy Australia - Excels In Optimizing Software Applications
Jacob Murphy Australia - Excels In Optimizing Software Applications
Jacob Murphy Australia
 
acid base ppt and their specific application in food
acid base ppt and their specific application in foodacid base ppt and their specific application in food
acid base ppt and their specific application in food
Fatehatun Noor
 
ATAL 6 Days Online FDP Scheme Document 2025-26.pdf
ATAL 6 Days Online FDP Scheme Document 2025-26.pdfATAL 6 Days Online FDP Scheme Document 2025-26.pdf
ATAL 6 Days Online FDP Scheme Document 2025-26.pdf
ssuserda39791
 
SICPA: Fabien Keller - background introduction
SICPA: Fabien Keller - background introductionSICPA: Fabien Keller - background introduction
SICPA: Fabien Keller - background introduction
fabienklr
 
Little Known Ways To 3 Best sites to Buy Linkedin Accounts.pdf
Little Known Ways To 3 Best sites to Buy Linkedin Accounts.pdfLittle Known Ways To 3 Best sites to Buy Linkedin Accounts.pdf
Little Known Ways To 3 Best sites to Buy Linkedin Accounts.pdf
gori42199
 
twin tower attack 2001 new york city
twin  tower  attack  2001 new  york citytwin  tower  attack  2001 new  york city
twin tower attack 2001 new york city
harishreemavs
 
22PCOAM16 ML Unit 3 Full notes PDF & QB.pdf
22PCOAM16 ML Unit 3 Full notes PDF & QB.pdf22PCOAM16 ML Unit 3 Full notes PDF & QB.pdf
22PCOAM16 ML Unit 3 Full notes PDF & QB.pdf
Guru Nanak Technical Institutions
 
JRR Tolkien’s Lord of the Rings: Was It Influenced by Nordic Mythology, Homer...
JRR Tolkien’s Lord of the Rings: Was It Influenced by Nordic Mythology, Homer...JRR Tolkien’s Lord of the Rings: Was It Influenced by Nordic Mythology, Homer...
JRR Tolkien’s Lord of the Rings: Was It Influenced by Nordic Mythology, Homer...
Reflections on Morality, Philosophy, and History
 
Modelling of Concrete Compressive Strength Admixed with GGBFS Using Gene Expr...
Modelling of Concrete Compressive Strength Admixed with GGBFS Using Gene Expr...Modelling of Concrete Compressive Strength Admixed with GGBFS Using Gene Expr...
Modelling of Concrete Compressive Strength Admixed with GGBFS Using Gene Expr...
Journal of Soft Computing in Civil Engineering
 
Nanometer Metal-Organic-Framework Literature Comparison
Nanometer Metal-Organic-Framework  Literature ComparisonNanometer Metal-Organic-Framework  Literature Comparison
Nanometer Metal-Organic-Framework Literature Comparison
Chris Harding
 
Smart City is the Future EN - 2024 Thailand Modify V1.0.pdf
Smart City is the Future EN - 2024 Thailand Modify V1.0.pdfSmart City is the Future EN - 2024 Thailand Modify V1.0.pdf
Smart City is the Future EN - 2024 Thailand Modify V1.0.pdf
PawachMetharattanara
 
Generative AI & Large Language Models Agents
Generative AI & Large Language Models AgentsGenerative AI & Large Language Models Agents
Generative AI & Large Language Models Agents
aasgharbee22seecs
 
6th International Conference on Big Data, Machine Learning and IoT (BMLI 2025)
6th International Conference on Big Data, Machine Learning and IoT (BMLI 2025)6th International Conference on Big Data, Machine Learning and IoT (BMLI 2025)
6th International Conference on Big Data, Machine Learning and IoT (BMLI 2025)
ijflsjournal087
 
Autodesk Fusion 2025 Tutorial: User Interface
Autodesk Fusion 2025 Tutorial: User InterfaceAutodesk Fusion 2025 Tutorial: User Interface
Autodesk Fusion 2025 Tutorial: User Interface
Atif Razi
 
sss1.pptxsss1.pptxsss1.pptxsss1.pptxsss1.pptx
sss1.pptxsss1.pptxsss1.pptxsss1.pptxsss1.pptxsss1.pptxsss1.pptxsss1.pptxsss1.pptxsss1.pptx
sss1.pptxsss1.pptxsss1.pptxsss1.pptxsss1.pptx
ajayrm685
 
Design of Variable Depth Single-Span Post.pdf
Design of Variable Depth Single-Span Post.pdfDesign of Variable Depth Single-Span Post.pdf
Design of Variable Depth Single-Span Post.pdf
Kamel Farid
 
Agents chapter of Artificial intelligence
Agents chapter of Artificial intelligenceAgents chapter of Artificial intelligence
Agents chapter of Artificial intelligence
DebdeepMukherjee9
 
Construction Materials (Paints) in Civil Engineering
Construction Materials (Paints) in Civil EngineeringConstruction Materials (Paints) in Civil Engineering
Construction Materials (Paints) in Civil Engineering
Lavish Kashyap
 
Slide share PPT of NOx control technologies.pptx
Slide share PPT of  NOx control technologies.pptxSlide share PPT of  NOx control technologies.pptx
Slide share PPT of NOx control technologies.pptx
vvsasane
 
Personal Protective Efsgfgsffquipment.ppt
Personal Protective Efsgfgsffquipment.pptPersonal Protective Efsgfgsffquipment.ppt
Personal Protective Efsgfgsffquipment.ppt
ganjangbegu579
 
Jacob Murphy Australia - Excels In Optimizing Software Applications
Jacob Murphy Australia - Excels In Optimizing Software ApplicationsJacob Murphy Australia - Excels In Optimizing Software Applications
Jacob Murphy Australia - Excels In Optimizing Software Applications
Jacob Murphy Australia
 
acid base ppt and their specific application in food
acid base ppt and their specific application in foodacid base ppt and their specific application in food
acid base ppt and their specific application in food
Fatehatun Noor
 
ATAL 6 Days Online FDP Scheme Document 2025-26.pdf
ATAL 6 Days Online FDP Scheme Document 2025-26.pdfATAL 6 Days Online FDP Scheme Document 2025-26.pdf
ATAL 6 Days Online FDP Scheme Document 2025-26.pdf
ssuserda39791
 
SICPA: Fabien Keller - background introduction
SICPA: Fabien Keller - background introductionSICPA: Fabien Keller - background introduction
SICPA: Fabien Keller - background introduction
fabienklr
 
Little Known Ways To 3 Best sites to Buy Linkedin Accounts.pdf
Little Known Ways To 3 Best sites to Buy Linkedin Accounts.pdfLittle Known Ways To 3 Best sites to Buy Linkedin Accounts.pdf
Little Known Ways To 3 Best sites to Buy Linkedin Accounts.pdf
gori42199
 
twin tower attack 2001 new york city
twin  tower  attack  2001 new  york citytwin  tower  attack  2001 new  york city
twin tower attack 2001 new york city
harishreemavs
 
Nanometer Metal-Organic-Framework Literature Comparison
Nanometer Metal-Organic-Framework  Literature ComparisonNanometer Metal-Organic-Framework  Literature Comparison
Nanometer Metal-Organic-Framework Literature Comparison
Chris Harding
 
Smart City is the Future EN - 2024 Thailand Modify V1.0.pdf
Smart City is the Future EN - 2024 Thailand Modify V1.0.pdfSmart City is the Future EN - 2024 Thailand Modify V1.0.pdf
Smart City is the Future EN - 2024 Thailand Modify V1.0.pdf
PawachMetharattanara
 
Generative AI & Large Language Models Agents
Generative AI & Large Language Models AgentsGenerative AI & Large Language Models Agents
Generative AI & Large Language Models Agents
aasgharbee22seecs
 
6th International Conference on Big Data, Machine Learning and IoT (BMLI 2025)
6th International Conference on Big Data, Machine Learning and IoT (BMLI 2025)6th International Conference on Big Data, Machine Learning and IoT (BMLI 2025)
6th International Conference on Big Data, Machine Learning and IoT (BMLI 2025)
ijflsjournal087
 
Autodesk Fusion 2025 Tutorial: User Interface
Autodesk Fusion 2025 Tutorial: User InterfaceAutodesk Fusion 2025 Tutorial: User Interface
Autodesk Fusion 2025 Tutorial: User Interface
Atif Razi
 

An Heterogeneous Population-Based Genetic Algorithm for Data Clustering

  • 1. Indonesian Journal of Electrical Engineering and Informatics (IJEEI) Vol. 5, No. 3, September 2017, pp. 275~284 ISSN: 2089-3272, DOI: 10.11591/ijeei.v5i3.299  275 Received March 25, 2017; Revised July 10, 2017; Accepted August 11, 2017 An Heterogeneous Population-Based Genetic Algorithm for Data Clustering Amina Bedboudi *1 , Cherif Bouras 2 , Mohamed Tahar Kimour 3 1,3 Departement of Computer Science, University of Badji Mokhtar - Annaba, Algeria 2 Departement of Mathematics, University of Badji Mokhtar - Annaba, Algeria e-mail: bedboudi.amina@hotmail.fr Abstract As a primary data mining method for knowledge discovery, clustering is a technique of classifying a dataset into groups of similar objects. The most popular method for data clustering K-means suffers from the drawbacks of requiring the number of clusters and their initial centers, which should be provided by the user. In the literature, several methods have proposed in a form of k-means variants, genetic algorithms, or combinations between them for calculating the number of clusters and finding proper clusters centers. However, none of these solutions has provided satisfactory results and determining the number of clusters and the initial centers are still the main challenge in clustering processes. In this paper we present an approach to automatically generate such parameters to achieve optimal clusters using a modified genetic algorithm operating on varied individual structures and using a new crossover operator. Experimental results show that our modified genetic algorithm is a better efficient alternative to the existing approaches Keywords: data mining, clustering, K-means, genetic algorithms, crossover 1. Introduction As one of the most important tasks of spatial data mining, cluster analysis has been widely used in several domains, such as biology, system engineering and social sciences, in order to identify natural groups in large amounts of data [1-3]. A clustering algorithm assigns a large number of data points to a smaller number of groups (or clusters) such that data points in the same group share the same properties (similar) while, in different groups, they are dissimilar [4]. Data modeling puts clustering in a historical perspective rooted in mathematics, statistics, and numerical analysis [5]. Data mining adds to clustering the complications of very large datasets with very many attributes of different types. This imposes unique computational requirements on relevant clustering algorithms. A variety of algorithms have recently emerged that meet these requirements and were successfully applied to real-life data mining problems. Especially, popular methods such as K- means, genetic algorithms and their variants are the most notable ones for data clustering [1-9], [14-21]. Their advantages lie in the simplicity, efficiency and ease convergenc, however, most of them suffer from the drawbacks of requiring the number of clusters and their initial centers, but also from the poor clustering quality [9]. In this paper we present an approach to automatically generate such parameters, while improving the speed and accuracy of the algorithm. It is based on an appropriate data structure and process that handles heterogeneous populations in a modified genetic algorithm that operates on varied individual structures .Generally, using parameters such as the crossover and mutation probabilities that adapt to the evolution of the algorithm is a good choice, since higher diversity in the population can be achieved, preventing the algorithm to fall in local minima. Moreover, to accelerate the genetic algorithm process and increase the individual diversity of the initial population, we generate that initial population in two manners: the first subpopulation is obtained with a deterministic way and the second one with a random way. Increasing the population diversity will allow to achieve better quality. The rest of the article is structured as follows: section 2 gives an analysis of two data clustering techniques; k-means and genetic algorithms. Section 3 details our data clustering
  • 2.  ISSN: 2089-3272 IJEEI Vol. 5, No. 3, September 2017 : 275 – 284 276 approach by describing the proposed structure of chromosomes and genetic operators. Section 4 presents the related works and a comparaison with our aproposed approach. Section 5 describes the experimental results and discusses the strength of our approach. Finally, we draw a conclusion and define some future works in section 6. 2. Clustering Clustering can be thought of as data partitioning or segmenting into homogeneous groups. The clustering is usually accomplished by determining the similarity among the data on predefined attributes. The most similar data are grouped into clusters [2-5], [9]. There is a close relationship between clustering techniques and many other disciplines. Clustering has always been used in statistics and science [7]. Typical applications include speech and character recognition. viewed as a density estimation problem, machine learning clustering algorithms were applied to image segmentation and computer vision [10-13]. In the literature, various methods have been used to handle the clustering problem. The K-means [4-5] [17] is considered one of the major algorithms widely used in clustering [1]. Genetic algorithms [6] are also used in clustering, either in a separated way or combined with-the k-means algorithm [9]. 2.1. K-means K-means first randomly select the k objects; each object initially represents a cluster center. For each remaining object according to its distance from the center of each cluster, assign it to the nearest cluster. Afterward, each cluster center is replaced by the average value on the respective cluster. This process is repeated, until k centers do not change. K-means algorithm attempts to provide a partition of the given data such that the centroids of provided classes are the minima of the following objective function [8-9], [13], [20] : ∑ ∑ ‖ ‖ The typical algorithm is as follows [1] 1. Select k objects as initial centers; 2. Assign each data object to the closest center; 3. Recalculate the centers of each cluster; 4. Repeat steps 2 and 3 until centers do not change; Due to its practice properties, K-means algorithm is considered as the most popular technique to cluster information. Its main advantages are: i) easy to implement, ii) the comparison is conducted only between the observations and the center of classes, and iii) it detects and isolates the outliers. However, the algorithm exhibits some drawbacks, especially on the fact that it can be really slow since in each step the distance between each point to each cluster has to be calculated, which can be really expensive in the presence of a large dataset. Moreover, it is sensitive to the initial cluster centers that are provided by the user as well as the k number of clusters.
  • 3. IJEEI ISSN: 2089-3272  An Heterogeneous Population-Based Genetic Algorithm for Data… (Amina Bedboudi) 277 2.2. Genetic algorithms Genetic algorithms [6] are robust, stochastic optimization algorithms, used to solve a wide variety of problems. They were developed with the goal of better understanding natural processes such as adaptation, and it belongs to a type of search techniques that mimic the principles of natural selection to develop solutions of large optimization problems. A genetic algorithm finds the optimal value for a particular objective function depending on the problem to be solved. The standard approach to an optimization problem begins by designing an objective function that can model the problem’s objectives while incorporating any constraints. A genetic algorithm consists of: Chromosomal Representation, initial population, fitness evaluation, selection and reproduction (i.e., crossover and mutation). GA operates by maintaining and manipulating a population of potential solutions called chromosomes. Each chromosome has an associated fitness value which is a qualitative measure of the goodness of the solution encoded in it. In a genetic algorithm process, we determine firstly an initial solution, composed of a set of chromosomes (initial population), and iteratively apply reproduction operators(selection, crossover and mutation) until achieving a certain quality parameter or a predefined number of iterations. A use of a fitness function guides the stochastic selection of chromosomes which are then used to generate new candidate solutions through crossover and mutation. Therefore, basic operations are: i) Crossover, which generates new chromosomes by combining sections of two or more selected parents, ii) mutation, which acts by randomly selecting genes which are then altered; thereby preventing suboptimal solutions from persisting and increases diversity in the population. There are three main types of selection methods: fitness proportionate selection, ranking method and tournament selection. In tournament selection [14], individuals are selected randomly from the population, based on the fitness function. Tournaments are often held between pairs of individuals, although larger tournaments can be used. Simple outline of a genetic algorithm is shown in Figure 1. Figure 1. Outline of the genetic algorithm.
  • 4.  ISSN: 2089-3272 IJEEI Vol. 5, No. 3, September 2017 : 275 – 284 278 Genetic algorithms are characterized by attributes such as objective function, encoding of the input data, crossover, mutation, and population size. (1) Objective function. It is used to assign each individual in the population a fitness value; an individual with a higher fitness represents a better solution to the problem than an individual with a lower fitness value; (2) Encoding. Genetic algorithms operate on an en-coding of the problem’s input data (which represent in-dependent variables for the objective function); (3) Elitism. This is a way to ensure that the highly fit-ting chromosomes are not lost and copied to the new population. Elitism has been found to be very important to the performance of genetic algorithms; (4) Crossover. It is a procedure in which a highly fit-ting chromosome is given an opportunity to reproduce by exchanging pieces of its genetic information with other highly fitting chromosomes; (5) Mutation. This is often applied after crossover by randomly altering some genes to individual parents; (6) Population size. It is the number of individuals in a population. The larger the population size, the better the chance that an optimal solution will be found. Genetic algorithms iterate a fixed number of times. Since the function’s upper bound (the maximum fitness value possible for an individual) may not be known or cannot be reached, we must limit the number of generations in order to guarantee the termination of the search process. This may result in a suboptimal solution. Moreover, combined approaches of genetic algorithms with other clustering techniques are still not satisfying on some user requirements especially for the accuracy and execution time 3. The Proposed Approach The aim of our approach is to give an efficient clustering process leading to better enhanced clusters’ quality and execution time using a varied structure-based genetic algorithm with a new crossover operator. Tthe input of such modified genetic algorithm process is a dataset. The genetic process automatically generates the number of clusters with their initial centroids 3.1. Chromosome representation Chromosome representation is a problem encoding performed at the most important steps in using genetic algorithm to solve a problem. To encode both the number of clusters and their centers, we propose a chromosome structure that applies a real coded genetic algorithm to the clustering problem, where crossover and mutation operators are applied directly to real parameter values. The use of real parameter values in the GA representation has a number of advantages over binary coding. The efficiency of the GA is increased as there is no need to convert the solution variables to the binary type, less memory is required, there is no loss in precision by discretization to binary or other values, and there is greater freedom to use different genetic operators. In this work, the existing population is processed in the form of several subpopulations with different size. In other words, the number of genes in every cluster differs. A chromosome in a subpopulations may contain 2 genes, 3 genes to Kmax genes, where Kmaxis the maximum number of clusters. These genes hold corresponding information about the cluster centers. Here, each gene in the chromosome represents cluster center, and number K of genes in a chromosome stands for number of clusters. The value of K is assumed to lie in the interval [2; Kmax]. Thus, each solution i is a fixed-length string represented by cluster centers cij ; j = 2,…,Kmax. Then, solution i of the population is represented via a vector as follows:
  • 5. IJEEI ISSN: 2089-3272  An Heterogeneous Population-Based Genetic Algorithm for Data… (Amina Bedboudi) 279 ci1 … cij … ciKmax Figure.2. An example of chromosomes as defined by our approach. 3.2. Fitness function The fitness function f is a measure of profit we need during optimization. It is an objective function that is used to summarize how close a given solution is to achieving the aims and has an important effect on success of a genetic algorithm. Fitness is proportional to the utility or ability of individual which the chromosome represents. Measure of fitness helps in evolving good solutions and implementing natural selection. In this work, the fitness of a chromosome is computed using Mean Square Error (MSE) [18]. The MSE calculate the error between the clusters. MSE= ∑ (1) Fitness = After the definitions of the fitness function the different parameters of the genetic algorithm operators are fixed. 3.3. Generation of the Initial Population To accelerate the genetic algorithm process and increase the individual diversity of the initial population, we generate that initial population in two phases. The first phase consists in a deterministic producing manner, and the second one consists in a randomly generation. We proceed such generation so to obtain 20% of the initial population from the first phase and 80% from the second phase. his phase, we deterministically generate the first sub-population that constitutes 20% the entire population. In From the input dataset, we produce a sorted dataset. Then, the sorted dataset is divided into k equal segments and the means, modal, and min-max values (see Algorithm1) of each segment are determined and considered asinitial cluster centers. In doing so, we increase the diversity of individuals leading to enhanced genetic reproduction. Algorithm1: generate20%population  Sort the datset;  For (k=2, k<=Kmax-1, k++){ - Divide the sorted data int k equal segments: {Sk}; - Modal step: take the modal value of each segment k as a center Ck; - Means step: take the means value of each segment k as a center Ck; - Min-Max step: For (i=0, i<k, k++){ Calculate c = (max-min)/2 Consider c as a ith center 3.4. Reproduction Genetic Algorithms aids to look for the best solution among a number of possible solutions throw reproduction. Reproduction can be implemented in an algorithmic form, based on reproduction operators, suited to the encoding scheme.The objective of the reproduction operators is to ensure diversity in the population, such that the fittest solutions can be derived through the evolutionary process. Such evolution process is composed of crossover and mutation operations.
  • 6.  ISSN: 2089-3272 IJEEI Vol. 5, No. 3, September 2017 : 275 – 284 280 During such process, invalid chromosomes may be produced. In the proposed chromosome representation, repetition of genes produces invalid chromosomes because a data point cannot be center point of more than one cluster. So to be able to detect production of invalid chromosomes, we sort the chromosome genes in ascending order. In doing so, we aim to achieve small length, fast detection of invalid chromosomes, and faster crossover and mutation operations. Figure 3. illustrates our crossover process using a mask. Figure 3. The crossover process using a mask 3.4.1. Crossover It is a reproduction process that the children chromosomes are generated according to the fitness function values of their parents. We define a crossover operator in such a way that it could accept parent chromosomes with different number of genes. It occurs with probability crossp. For each two selected parents, crossover is applied, producing two offsprings. To this end, a binary mask is created. Its length is equal to that of the shorter parent. In this mask, digit “1” occurs with a certain probability p. C (1) j=C (1) j+α C (2) j (2) C (2) j=C (2) j+(1-α) C (1) j (3) Figure 4. Depicts the crossover formulas with α as a uniformly distributed random number such that α ∈[-1, 1]. Offspring O1 is created by applying the formula (1) and offspring O2 is created by applying the formula (2). These formulas in Figure 4 are applied on genes in the parents corresponding to the positions in the mask with a digit “1”. Also, we should check the validity of the resulting chromosomes to avoid redundant centers in one chromosome. In the case of redundancy, we replace the redundant gene with a non redundant one, extracted from a parent.
  • 7. IJEEI ISSN: 2089-3272  An Heterogeneous Population-Based Genetic Algorithm for Data… (Amina Bedboudi) 281 Algorithm2: Clustering with GA Input: data points, kmax Output: clusters Procedure: (1) Use algorithm 1 to generate 20% of the initial population; (2) Randomly generate the remaining 80% of initial populations; (3) Calculate the fitness values of the chromosomes in the initial population; (4) Select the two fittest chromosomes (P1, P2) and randomly choose a target one P3 from the population; (5) Apply the crossover and mutation operators on (P1, P2) to generate two offsprings (O1, O2); (6) Calculate the fitness values of the offsprings (7) Replace P1 and P2 with O1 or O2 , if they are not better than such offsprings; (8) Repeat from step (3) if there is a significant change between generations, or the number of iteration has not been reached; (9) Final clusters are derived from the fittest chromosome using the one step K-means algorithm [21]. Figure 5. The main steps of the proposed genetic algorithm to clustering. 3.4.2. Mutation Mutation is intended to prevent falling of all solutions in the population into a local optimum of the solved problem or enhancing the obtained chromosome. Mutation takes place with a lower probability than that of the crossover. We define two types of mutation to be taken place after the crossover. Topological mutation is intended to add or delete genes from an offspring. Gene mutation is intended to modify a selected gene from an offspring. Gene mutation is applied by randomly selecting a gene from an offspring and replacing it with a data point randomly extract from a dataset. Afterward, a validity check is performed to avoid redundant centers and to keep the cluster number in [2, Kmax]. Our overall genetic process organizes operations as depicted by Figure 5. To find the closest cluster to every point, we apply a k-means operator (KMO), which is a one step of the classical k-means algorithm. In other words, we assign data points to their clusters for each new chromosome using the KMO operator, in order to compute its fitness. 4. Related works In the k-means algorithm, the most important challenge is related to determining of the number of clusters and their initial centers. Randomly determining initial centers does not guarantee good quality of clustering, due to the fact that at each run of K-Means algorithm on the same dataset may result in a different clusyers. Besides, the process may converge to suboptimal partitions. In [3], the author proposed to sort the dateset according to their distances from each other, and then divide the sorted data into k equal segments. He takes the first data item of each segment as a corresponding cluster center. In [12], the authors proposed a solution based on a statistical test, with the assymption that a subset of data follows a Gaussian distribution. The proposed process begins with a small number k of clusters, and then increments k at each step based on appropriate condition. In [17-21] genetic algorithm-based solutions are proposed. They maintain a population of some coded solutions and apply a one step k-means operator to calculate the fitness of a chromosome. In [13], the authors proposed to use genetic algorithm to determine the initial values of the clusters. They use k-means algorithm to evaluate fitness of each candidate center. At the final iteration, the fittest chromosome will be the k-means initial cluster center. As k- means algorithm must be executed for many times, the approach is usually computationally expensive. These above mentioned methods share the common drawback of searching initial centers while fixing a given number of clusters. Their aims was to find results in fewer number of steps, but because of intensively calculating neighbourhood distance between data points and rearranging them, it would be very slow in partitioning large datasets. When searching for the
  • 8.  ISSN: 2089-3272 IJEEI Vol. 5, No. 3, September 2017 : 275 – 284 282 initial centroids, they evaluate the concerned criteria separately for different values of k, leading to exeute K-means algorithm for different numbers of clusters. Such a process is usually very time-consuming. 5. Experimental results To evaluate our approach and test its efficiency, we compare it with k-means, and the closest work to ours of [9], using two selected datasets from UCI repository of machine learning database, which are Iris and Lymphoma datasets. It is worth noting that the first difference of our approach and the two above mentioned methods is that we automatically determine the k number of clusters and their initial centroids. In the following, we show that our approach outperforms at least such two methods when considering two important parameters, that is, the average error and the average execution time. Experiments have been done usin parameters of the Table 1. Table 1. Parameter Value Maximum number of clusters, Kmax 15 Population size, P 80 Crossover probability 0.6 Mutation probability 0.3 Maximum no. of iterations, 100 In the Iris dataset, each data point has four feature values, which represents the length and the width of sepal and petal, in centimeters. It has three classes with 50 samples per class. The value of k is therefore chosen to be three (clusters). In the Lymphoma dataset, we find 62 samples consisting of 4026 genes spanning three classes, which include 42 Diffuse Large B-Cell Lymphoma samples, nine Follicular Lymphoma samples, and 11 B-cell Chronic Lymphocytic Leukemia samples This dataset is to be partitioned into three clusters. Each dataset was use for each method for 10 times and then we determined the average time and error as mentioned in Table 2, which shows the efficiently experiments of the proposed method over K-means and the work of [9], which closest to ours Table 2. Evaluation results using UCI data sets Dataset parameters k-means [11] Proposed approach Iris Avg Error 36.53 29.59 19.59 Avg Time 18.34 10.56 9.34 Lymphoma Avg Error 44.12 38.49 22.15 Avg Time 16.15 10.25 8.28 The experiments have been conducted to measure the average error and average execution time parameters for the three methods. For each dataset, average error and average time are listed to show the trade-off between them. From the table 2 it is observed that, our method outperforms k-means and its variants, especially the work in [11], by enhancing the average error and the average execution time.
  • 9. IJEEI ISSN: 2089-3272  An Heterogeneous Population-Based Genetic Algorithm for Data… (Amina Bedboudi) 283 Because of our proposed population model consists of chromosomes with varied structures, and using improved genetic operators for reproduction, the offsprings will inherit appropriate information from their parent, leading to find an enhanced solution for the concerned problem. Moreover, the deterministic determination of important portion of the initial population will brings high diversity the genetic reproduction process, leading to enhanced quality of the result 5. Conclusion In this paper we have introduced a variable structure of genetic algorithms to be applied on the data clustering problem without requiring the number and initial centers of clusters. The cluster centroids are calculated by a genetic algorithm with heterogeneous population leading to better results than using random numbers. Moreover, our approach allowed acceleration of the search process by reducing the average execution time while obtaining better quality of data partitions. As a future works, we plan to investigate improved fitness function and other reproduction operators to achieve better performance References [1] Mecca GS, Raunich A, Pappalardo. New Algorithm For Clustering Search Results. Data And Knowledge Engineering. 2007; 62: 504-522. [2] H He, Y Tan. A Two-Stage Genetic Algorithm for Automatic Clustering. Neuro-Computing. 2012; 81: 49-59. [3] C Raposo, C Antunes, J Barreto. Automatic Clustering Using A Genetic Algorithm With New Solution Encoding And Operators. ICCSA (2). Lecture Notes In Computer Science, Springer. 2004; 8580: 92- 103. [4] K Jain, MN Murty, PJ Flynn. Data Clustering: A Review, ACM Comput. Surv. 1999; 31(3): 264–323. [5] S Garc a, uengo, Herrera. Data Pre-Processing In Data Mining, Cham. Springer. 2004: 327. [6] SN Sivanandam, SN Deepa. Introduction To Genetic Algorithms. Berlin Heidelberg New York: Springer. 2008. [7] K Fukunaga. Introduction To Statistical Pattern Recognition. New York Academic. 1990. [8] KB Sawant. Efficient Determination of Clusters in K-Mean Algorithm Using Neighborhood Distance. The International Journal of Emerging Engineering Research and Technology. 2015; 3(1): 22-27. [9] V Chittu, N Sumathi. A Modified Genetic Algorithm Initializing K-Means Clustering. Global Journal of Computer Science And Technology. 2011; 11(2). [10] K Fukunaga. Introduction To Statistical Pattern Recognition. Newyork: Academic. 1990. [11] DW Scott. Multivariate Density Estimation. Wiley, New York. 1992. [12] G Hamerly, C Elkan. Learning the k in k-means. Advances in Neural Information Processing Systems. 2004; 17: 281–288. [13] G Karegowda, VT Shama, MA Jayaram, AS Manjunath. Improving Performance of K-Means Clustering by Initializing Cluster Centers Using Genetic Algorithm and Entropy Based Fuzzy Clustering for Categorization of Diabetic Patients. In Proceedings of International Conference on Advances in Computing. MSRIT, Bangalore: Springer India. 2013; 899-904. [14] M Ettaouil, E Abdelatif, F Harchli. Improving The Performance of K-Means Algorithm Using An Automatic Choice of Suitable Code Vectors And Optimal Number of Clusters. Journal of Theoretical and Applied Information Technology. 2013; 56. [15] Yugal Kumar, G Sahoo. A New Initialization Method To Originate Initial Cluster Centers For K-Means Algorithm. International Journal of Advanced Science And Technology. 2014; 62: 43-54. [16] SK Nandha Kumar, P Renuga. Reactive Power Planning Using Real GA Comparison With Evolutionary Programming. Intl Journal of Recent Trends In Engineering. 2009; 1(3). [17] K Krishna, MN Murty. Genetic K-means algorithm. IEEE Transactions on Systems, Man, and Cybernetics. 1999; 29(3): 433-439. [18] G Delavar, GH Mohebpour. A New Genetic Center Based Data Clustering Algorithm Based on K- Means. International Journal of Mechatronics, Electrical And Computer Technology. 2014; 4(13): 1820-1839.
  • 10.  ISSN: 2089-3272 IJEEI Vol. 5, No. 3, September 2017 : 275 – 284 284 [19] K Valarmathi, D Devaraj, TK Radhakrishnan. Real-Coded Genetic Algorithm For System Identification and Controller Tuning. Applied Mathematical Modelling. 2009; 33: 3392–3401. [20] DE Goldberg. Genetic Algorithms In Search Optimization And Machine Learning, Addison Wesley. 1989. [21] Najmah M, Ghasemzadeh, Ch Meinel. A Variant of Genetic Algorithms For Non-Homogeneous Population. Intl Conf. on Applied Mathematics. Computational Science and Systems Engineering. Taly-Rome. 2016. [22] Asuncion J. Newman, UCI Mach. Learning Repository, Irvine, CA: Dpt Of Inf. And Univ. of California, Http://Www.Ics.Uci.Edu/~Mlearn/Mlrepository.Html. Computer Science. 2016.
  翻译: