PyData Amsterdam - Name Matching at Scale (GoDataDriven)
Wendell Kuling works as a Data Scientist at ING in the Wholesale Banking Advanced Analytics team. Their projects aim to provide better services to corporate customers of ING by using innovative data-science techniques. In this talk, Wendell covers key insights from their experience in matching large datasets based on names. After covering the key algorithms and packages ING uses for name matching, Wendell will share his best-practice approach to applying these algorithms at scale… would you bet on a Cruncher (48-CPU/512 GB RAM machine), a Tesla (CUDA Tesla K80 with 4992 cores, 24 GB memory) or a Spark cluster (80 cores/2.5 TB memory)?
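As a minimal illustration of the name-matching problem (a naive pairwise matcher using Python's standard library, not ING's actual approach, and with made-up names):

```python
# Hypothetical sketch: score name pairs after light normalization.
from difflib import SequenceMatcher

def name_similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity score between two company names."""
    norm = lambda s: " ".join(s.lower().replace(".", "").split())
    return SequenceMatcher(None, norm(a), norm(b)).ratio()

candidates = ["ING Bank N.V.", "ABN AMRO Bank", "ING Groep"]
query = "ing bank nv"
best = max(candidates, key=lambda c: name_similarity(query, c))
print(best)  # ING Bank N.V.
```

Exact pairwise scoring like this is quadratic in the number of names, which is precisely why the talk's question about hardware and blocking strategies matters at scale.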
The document provides tips for developing winning federal proposals. It emphasizes focusing on the customer's needs, customizing the proposal to the specific opportunity, and using a consistent and concise writing style. Key recommendations include putting the customer first, demonstrating a commitment to jointly achieving objectives, tailoring the solution and language to the requester, using compelling and creative elements like graphics and examples, and ensuring technical and political correctness.
This document provides an introduction to genetic algorithms. It explains that genetic algorithms are inspired by Darwinian evolution and use processes like selection, crossover and mutation to iteratively improve a population of potential solutions. It discusses how genetic algorithms can be used for optimization problems and classification in data mining. Examples of genetic algorithm applications like the traveling salesman problem are also presented to illustrate genetic algorithm concepts and processes.
Yoav Goldberg: Word Embeddings - What, How and Whither (MLReview)
This document discusses word embeddings and how they work. It begins by explaining how the author became an expert in distributional semantics without realizing it. It then discusses how word2vec works, specifically skip-gram models with negative sampling. The key points are that word2vec is learning word and context vectors such that related words and contexts have similar vectors, and that this is implicitly factorizing the word-context pointwise mutual information matrix. Later sections discuss how hyperparameters are important to word2vec's success and provide critiques of common evaluation tasks like word analogies that don't capture true semantic similarity. The overall message is that word embeddings are fundamentally doing the same thing as older distributional semantic models through matrix factorization.
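The implicit-factorization claim can be made concrete on a toy corpus: the sketch below (illustrative, not from the talk) counts word-context pairs within a window and computes cells of the pointwise mutual information matrix that SGNS implicitly factorizes.

```python
# Toy word-context PMI matrix from a tiny corpus (stdlib only).
import math
from collections import Counter

corpus = "the cat sat on the mat the dog sat on the rug".split()
window = 1

pair_counts, word_counts = Counter(), Counter(corpus)
for i, w in enumerate(corpus):
    for j in range(max(0, i - window), min(len(corpus), i + window + 1)):
        if j != i:
            pair_counts[(w, corpus[j])] += 1

total_pairs = sum(pair_counts.values())
total_words = len(corpus)

def pmi(word, context):
    """PMI(w, c) = log( P(w, c) / (P(w) * P(c)) )."""
    p_wc = pair_counts[(word, context)] / total_pairs
    if p_wc == 0:
        return float("-inf")
    p_w = word_counts[word] / total_words
    p_c = word_counts[context] / total_words
    return math.log(p_wc / (p_w * p_c))

print(pmi("cat", "sat"))  # positive: "cat" and "sat" co-occur more than chance
```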
This document discusses Ask.com's challenge of determining which search queries deserve editorial answers. It presents Ask.com's hybrid approach which first filters out queries that are obviously not suitable for editorial answers. It then uses dedicated classifiers and machine learning to further filter queries, with any low confidence queries sent for human review. This reduces the workload for human reviewers by 97% compared to no filtering. The approach improves the machine learning model's accuracy by focusing its domain and allows it to gradually improve using human ratings as training data. Certain human rater biases are also discussed, showing how pre-filtering data can improve the reliability of human reviews.
Introduction, Terminology and concepts, Introduction to statistics, Central tendencies and distributions, Variance, Distribution properties and arithmetic, Samples/CLT, Basic machine learning algorithms, Linear regression, SVM, Naive Bayes
MLSEV Virtual. Supervised vs Unsupervised (BigML, Inc)
Supervised vs Unsupervised Learning Techniques, by Charles Parker, Vice President of Machine Learning Algorithms at BigML.
MLSEV 2020: Virtual Conference.
How to Determine CLIENT LIFETIME VALUE in Five Minutes (Service Autopilot)
Knowing your Client Lifetime Value will help you:
• Know how much to spend to acquire more clients.
• Know how much to spend to keep existing clients.
• “See” how much your cleaning business is really worth.
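The calculation behind these bullet points can be sketched as a simple recurring-revenue formula (the numbers below are hypothetical, not from Service Autopilot):

```python
# Back-of-envelope Client Lifetime Value under a simple recurring-revenue model.
def client_lifetime_value(avg_ticket, visits_per_year, years_retained, gross_margin):
    """Revenue per visit x visit frequency x retention, scaled by margin."""
    return avg_ticket * visits_per_year * years_retained * gross_margin

# e.g. a $120 cleaning, twice a month, client kept 3 years, at 45% margin
print(client_lifetime_value(120, 24, 3, 0.45))  # 3888.0
```

Once you know a client is worth roughly $3,900, a $200 acquisition cost looks very different than it does against a single $120 ticket.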
The document discusses local search algorithms like hill-climbing for solving optimization problems. It explains that hill-climbing iteratively moves to successor states with improved evaluations until a local optimum is reached. However, hill-climbing often gets stuck in local optima and fails to find global optima. The document proposes methods like allowing sideways moves, random restarts, and stochastic selection to help hill-climbing escape local optima and improve performance.
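The escape strategies above (random restarts in particular) can be sketched in a few lines; the objective function, step size, and restart count here are hypothetical, not taken from the document:

```python
# Hill-climbing with random restarts on a toy 1-D objective.
import random

def objective(x):
    return -(x - 3) ** 2 + 10  # single global maximum at x = 3

def hill_climb(start, step=0.1, max_iters=1000):
    x = start
    for _ in range(max_iters):
        best = max((x - step, x + step), key=objective)
        if objective(best) <= objective(x):
            return x  # no improving neighbor: local optimum reached
        x = best
    return x

random.seed(0)
# Random restarts: run several climbs from random starts, keep the best result.
best = max((hill_climb(random.uniform(-10, 10)) for _ in range(5)), key=objective)
print(best)  # approximately 3
```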
Estimation is associated with Fear, Uncertainty and Death marches. Most of us would rather not estimate. Yet, sometimes we do need estimates and commitments, even on "estimation-less" projects. Play a series of estimation games to experience how different techniques deliver very different results. Learn a few simple rules that turn you into a reliable estimator. But correct estimates aren't enough. See what else is required to deliver on your promises. Learn to deal with the destructive games people play with estimates. Estimating can be Fun: embracing Uncertainty and Delivering.
Why dashboard design should be (but usually never is) based on cognitive scie... (UXPA International)
The document discusses how dashboard design is often not based on principles of cognitive science, which results in dashboards being less effective than they could be. It advocates applying knowledge of human visual perception and quantitative judgment to dashboard design by thinking like a translator to communicate data in a way the human brain can easily understand. The document provides examples of how color, size, and motion influence human perception differently and suggests dashboard designers consider these factors to improve comprehension of data visualizations.
DevOps Enterprise Summit Las Vegas 2018: The Problem of Becoming a 3rd-Line S... (Jon Stevens-Hall)
The document discusses how swarming is a better approach than traditional tiered support structures for DevOps teams. It describes how BMC implemented swarming, including severity 1 swarms for urgent issues and backlog swarms to address long-standing tickets. Swarming improved BMC's key metrics like resolution time and customer satisfaction. The document also notes challenges with swarming and how the approach aligns with DevOps practices like knowledge sharing and preventing burnout.
This document provides an introduction to genetic algorithms. It discusses that genetic algorithms are inspired by Darwinian evolution and use processes like selection, crossover and mutation to evolve solutions to problems. It also provides examples of how genetic algorithms can be used for optimization problems and classification in data mining. The key steps of a genetic algorithm including initializing a population, evaluating fitness, selection, crossover and mutation are outlined.
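The key steps listed (initialize, evaluate fitness, select, cross over, mutate) fit in a short program; the toy "OneMax" objective and the parameter values below are illustrative choices, not from the document:

```python
# Minimal genetic algorithm maximizing the number of 1-bits (OneMax).
import random

random.seed(42)
GENES, POP, GENERATIONS, MUTATION = 20, 30, 60, 0.02

def fitness(ind):
    return sum(ind)  # count of 1-bits

def crossover(a, b):
    cut = random.randrange(1, GENES)   # single-point crossover
    return a[:cut] + b[cut:]

def mutate(ind):
    return [g ^ 1 if random.random() < MUTATION else g for g in ind]

# 1. Initialize a random population.
pop = [[random.randint(0, 1) for _ in range(GENES)] for _ in range(POP)]
for _ in range(GENERATIONS):
    # 2. Evaluate fitness and 3. select: keep the fitter half as parents.
    pop.sort(key=fitness, reverse=True)
    parents = pop[:POP // 2]
    # 4. Crossover and 5. mutation produce the next generation.
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(POP - len(parents))]
    pop = parents + children

print(fitness(max(pop, key=fitness)))  # typically 20 (all ones)
```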
This document discusses different types of data and statistical concepts. It begins by describing the major types of data: numerical, categorical, and ordinal. Numerical data represents quantitative measurements, categorical data has no inherent mathematical meaning, and ordinal data has categorical categories with a mathematical order. It then discusses statistical measures like the mean, median, mode, standard deviation, variance, percentiles, moments, covariance, correlation, conditional probability, and Bayes' theorem. Examples are provided to help explain each concept.
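A few of the listed measures, computed with Python's standard library on made-up numbers:

```python
# Central tendency and spread on a toy sample, plus Bayes' theorem.
import statistics

sample = [2, 4, 4, 4, 5, 5, 7, 9]
print(statistics.mean(sample))    # 5.0
print(statistics.median(sample))  # 4.5
print(statistics.mode(sample))    # 4
print(statistics.pstdev(sample))  # 2.0 (population standard deviation)

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
def bayes(p_b_given_a, p_a, p_b):
    return p_b_given_a * p_a / p_b

# e.g. a test that is 99% sensitive, for a condition with 1% prevalence,
# when the overall positive rate is 5.9% (hypothetical numbers):
print(round(bayes(0.99, 0.01, 0.059), 3))  # 0.168
```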
The document discusses data-oriented design principles for game engine development in C++. It emphasizes understanding how data is represented and used to solve problems, rather than focusing on writing code. It provides examples of how restructuring code to better utilize data locality and cache lines can significantly improve performance by reducing cache misses. Booleans packed into structures are identified as having extremely low information density, wasting cache space.
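The booleans-have-low-information-density point can be illustrated outside C++ as well: a hypothetical sketch packing several flags into a single integer bitmask instead of storing one boolean per field.

```python
# Pack boolean flags into one integer; each flag costs a single bit.
FLAG_VISIBLE, FLAG_ACTIVE, FLAG_DIRTY = 1 << 0, 1 << 1, 1 << 2

state = 0
state |= FLAG_VISIBLE | FLAG_DIRTY   # set two flags at once
print(bool(state & FLAG_ACTIVE))     # False
print(bool(state & FLAG_DIRTY))      # True
state &= ~FLAG_DIRTY                 # clear one flag
print(bool(state & FLAG_DIRTY))      # False
```

In a cache-line-oriented C++ engine the same idea keeps hot data dense; in Python it is only an analogy for the layout principle the document describes.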
This document provides an introduction to learning how to think like a coder. It discusses reasons for learning to code even if you are not a computer expert, such as that it teaches problem solving skills. It then provides examples of coding scenarios and algorithms to illustrate computational thinking. These include a grocery shopping scenario, math word problems, sorting algorithms, stable marriage algorithms, and traveling salesman problems. It also discusses logic structures used in coding like if/then statements. Finally, it proposes some group activities around writing algorithms for tasks like dances, paper planes, and driverless cars.
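As a concrete instance of the sorting-algorithm examples mentioned, a short selection sort, spelled out step by step (illustrative, not taken from the slides):

```python
# Selection sort: repeatedly move the smallest remaining item into place.
def selection_sort(items):
    items = list(items)  # work on a copy
    for i in range(len(items)):
        smallest = min(range(i, len(items)), key=items.__getitem__)
        items[i], items[smallest] = items[smallest], items[i]
    return items

print(selection_sort([34, 7, 23, 32, 5]))  # [5, 7, 23, 32, 34]
```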
Customer satisfaction for Co.opmart's customer service in Ho Chi Minh (Hỗ Trợ SPSS)
The document is a survey that asks customers of Co.opmart supermarket in Ho Chi Minh City about their satisfaction with the customer service. It collects demographic information and asks customers to rate their agreement with statements about various aspects of Co.opmart's customer service, including interactions with employees, reliability, physical design of service areas, problem solving abilities, customer relationship policies, and overall satisfaction. The survey aims to measure customer satisfaction for Co.opmart's customer service in order to assist a student's MBA thesis.
Graph theory could make a big impact on how we conduct business. Imagine the case where you wish to maximize the reach of a promotion by leveraging your customers' influence to advocate your products and bring their friends on board. The same logic of harnessing one's network can be applied to purchase recommendations, customer behavior analysis, and fraud detection.
Running analyses on large graphs was not trivial for many companies until recently. The field has made significant steps in the last five years, and scalable graph computations are now the norm. You can run graph computations out-of-core (no memory constraints) and in parallel (across multiple machines), especially in Spark, which is spreading like wildfire.
A lot of people are familiar with GraphX, a solid implementation of scalable graphs in Spark. GraphX is interesting, but the project seems to be orphaned. The good news is that there is now an alternative: GraphFrames, a new data structure that takes the best parts of DataFrames and graphs.
In this talk, I will explain how to use GraphFrames from Python in Spark 2.0, with an example using personalized PageRank for recommendations.
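GraphFrames ships personalized PageRank built in; the idea behind it can be sketched in plain Python as a toy power iteration (this is the concept only, not the Spark implementation, and the example graph is made up):

```python
# Toy personalized PageRank: random walk with teleport back to one source node.
def personalized_pagerank(graph, source, alpha=0.15, iters=50):
    nodes = list(graph)
    rank = {n: (1.0 if n == source else 0.0) for n in nodes}
    for _ in range(iters):
        new = {n: (alpha if n == source else 0.0) for n in nodes}
        for n, out in graph.items():
            if out:
                share = (1 - alpha) * rank[n] / len(out)
                for m in out:
                    new[m] += share
            else:
                # dangling node: return its mass to the source
                new[source] += (1 - alpha) * rank[n]
        rank = new
    return rank

graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = personalized_pagerank(graph, "a")
print(max(ranks, key=ranks.get))  # the source dominates its own ranking
```

For recommendations, the source node is a user, and the highest-ranked product nodes become candidates.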
Data Con LA 2022 - Real world consumer segmentation (Data Con LA)
Jaysen Gillespie, Head of Analytics and Data Science at RTB House
1. Shopkick has over 30M downloads, but the userbase is very heterogeneous. Anecdotal evidence indicated a wide variety of users for whom the app holds long-term appeal.
2. Marketing and other teams challenged Analytics to get beyond basic summary statistics and develop a holistic segmentation of the userbase.
3. Shopkick's data science team used SQL and python to gather data, clean data, and then perform a data-driven segmentation using a k-means algorithm.
4. Interpreting the results is more work -- and more fun -- than running the algo itself. We'll discuss how we transform from "segment 1", "segment 2", etc. to something that non-analytics users (Marketing, Operations, etc.) could actually benefit from.
5. So what? How did teams across Shopkick change their approach given what Analytics had discovered?
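The k-means step in point 3 can be shown in miniature; the sketch below runs on hypothetical 1-D "spend" values with naive initialization, not Shopkick's real features or pipeline:

```python
# Toy k-means: assign points to the nearest center, then recompute centers.
def kmeans(points, k, iters=20):
    centers = points[:k]  # naive init; real code would use k-means++ or restarts
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[min(range(k), key=lambda i: abs(p - centers[i]))].append(p)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return sorted(centers)

spend = [5, 6, 7, 8, 50, 55, 60, 200, 210]
print(kmeans(spend, 3))  # [6.5, 55.0, 205.0]
```

The "more work, more fun" part is exactly what the code does not do: deciding that 6.5 means "browsers", 55 means "regulars", and 205 means "power users".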
Territory Assignment Innovation: High-Velocity Techniques to Maximize Sales with Gusto’s CRO and Head of GTM Ops
Speakers: Tolithia Kornweibel, CRO @ Gusto and Jamie Edwards, Head of Go-to-Market Operations and Tools @ Gusto
1. The document discusses best practices for estimating projects and tasks. It emphasizes using ranges rather than specific numbers for estimates since estimation involves uncertainty.
2. Ten key principles of estimation are outlined, including always asking how the estimate will be used, not negotiating estimates, and using measured past performance to calibrate estimates. Aggregating independent estimates and decomposing work into around 15 tasks can improve accuracy by reducing risk.
3. Two short exercises are presented where participants estimate values and dates. Correct answers are then provided along with commentary on estimation techniques. The document promotes solving problems collaboratively and being transparent about assumptions in estimates.
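The benefit of ranges and of aggregating independent estimates can be demonstrated with a small Monte Carlo simulation; the task ranges and uniform distributions below are hypothetical, not from the document:

```python
# Sum per-task (low, high) day estimates by simulation to get total percentiles.
import random

random.seed(1)
tasks = [(2, 8), (1, 5), (3, 9), (2, 6)]  # hypothetical (low, high) in days

def simulate_total(tasks, trials=10_000):
    totals = sorted(sum(random.uniform(lo, hi) for lo, hi in tasks)
                    for _ in range(trials))
    return totals[len(totals) // 2], totals[int(len(totals) * 0.9)]

p50, p90 = simulate_total(tasks)
print(round(p50, 1), round(p90, 1))  # median vs 90th-percentile total, in days
```

The spread between the 50th and 90th percentile is the honest answer a single point estimate hides: commit to the p90, not the median.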
Introductory presentation to Explainable AI, defending its main motivations and importance. We briefly describe the main techniques available as of March 2020 and share many references to allow the reader to continue their studies.
AI-proof your career by Olivier Vroom and David Williamson (UXPA Boston)
This talk explores the evolving role of AI in UX design and the ongoing debate about whether AI might replace UX professionals. The discussion will explore how AI is shaping workflows, where human skills remain essential, and how designers can adapt. Attendees will gain insights into the ways AI can enhance creativity, streamline processes, and create new challenges for UX professionals.
AI’s influence on UX is growing, from automating research analysis to generating design prototypes. While some believe AI could make most workers (including designers) obsolete, AI can also be seen as an enhancement rather than a replacement. This session, featuring two speakers, will examine both perspectives and provide practical ideas for integrating AI into design workflows, developing AI literacy, and staying adaptable as the field continues to change.
The session will include a relatively long guided Q&A and discussion section, encouraging attendees to philosophize, share reflections, and explore open-ended questions about AI’s long-term impact on the UX profession.
Dark Dynamism: drones, dark factories and deurbanization (Jakub Šimek)
Startup villages are the next frontier on the road to network states. This book aims to serve as a practical guide to bootstrap a desired future that is both definite and optimistic, to quote Peter Thiel’s framework.
Dark Dynamism is my second book, a kind of sequel to Bespoke Balajisms, which I published on Kindle in 2024. The first book covered about 90 ideas of Balaji Srinivasan and 10 of my own concepts that I built on top of his thinking.
In Dark Dynamism, I focus on ideas of my own that I have played with over the last 8 years, inspired by Balaji Srinivasan, Alexander Bard and many people from the Game B and IDW scenes.
Slides of Limecraft Webinar on May 8th 2025, where Jonna Kokko and Maarten Verwaest discuss the latest release.
This release includes major enhancements and improvements of the Delivery Workspace, as well as provisions against unintended exposure of Graphic Content, and rolls out the third iteration of dashboards.
Customer cases include Scripted Entertainment (continuing drama) for Warner Bros, as well as AI integration in Avid for ITV Studios Daytime.
Title: Securing Agentic AI: Infrastructure Strategies for the Brains Behind the Bots
As AI systems evolve toward greater autonomy, the emergence of Agentic AI—AI that can reason, plan, recall, and interact with external tools—presents both transformative potential and critical security risks.
This presentation explores:
> What Agentic AI is and how it operates (perceives → reasons → acts)
> Real-world enterprise use cases: enterprise co-pilots, DevOps automation, multi-agent orchestration, and decision-making support
> Key risks based on the OWASP Agentic AI Threat Model, including memory poisoning, tool misuse, privilege compromise, cascading hallucinations, and rogue agents
> Infrastructure challenges unique to Agentic AI: unbounded tool access, AI identity spoofing, untraceable decision logic, persistent memory surfaces, and human-in-the-loop fatigue
> Reference architectures for single-agent and multi-agent systems
> Mitigation strategies aligned with the OWASP Agentic AI Security Playbooks, covering: reasoning traceability, memory protection, secure tool execution, RBAC, HITL protection, and multi-agent trust enforcement
> Future-proofing infrastructure with observability, agent isolation, Zero Trust, and agent-specific threat modeling in the SDLC
> Call to action: enforce memory hygiene, integrate red teaming, apply Zero Trust principles, and proactively govern AI behavior
Presented at the Indonesia Cloud & Datacenter Convention (IDCDC) 2025, this session offers actionable guidance for building secure and trustworthy infrastructure to support the next generation of autonomous, tool-using AI agents.
Longitudinal Benchmark: A Real-World UX Case Study in Onboarding by Linda Bor... (UXPA Boston)
This is a case study of a three-part longitudinal research study with 100 prospects to understand their onboarding experiences. In part one, we performed a heuristic evaluation of the websites and the getting started experiences of our product and six competitors. In part two, prospective customers evaluated the website of our product and one other competitor (best performer from part one), chose one product they were most interested in trying, and explained why. After selecting the one they were most interested in, we asked them to create an account to understand their first impressions. In part three, we invited the same prospective customers back a week later for a follow-up session with their chosen product. They performed a series of tasks while sharing feedback throughout the process. We collected both quantitative and qualitative data to make actionable recommendations for marketing, product development, and engineering, highlighting the value of user-centered research in driving product and service improvements.
In-App Guidance: Save Enterprises Millions in Training & IT Costs (aptyai)
Discover how in-app guidance empowers employees, streamlines onboarding, and reduces IT support needs, helping enterprises save millions on training and support costs while boosting productivity.
Build with AI events are community-led, hands-on activities hosted by Google Developer Groups and Google Developer Groups on Campus across the world from February 1 to July 31, 2025. These events aim to help developers acquire and apply Generative AI skills to build and integrate applications using the latest Google AI technologies, including AI Studio, the Gemini and Gemma family of models, and Vertex AI. This particular event series includes Thematic Hands-on Workshops: guided learning on specific AI tools or topics, as well as a prequel to the Hackathon to foster innovation using Google AI tools.
Ivanti’s Patch Tuesday breakdown goes beyond patching your applications and brings you the intelligence and guidance needed to prioritize where to focus your attention first. Catch early analysis on our Ivanti blog, then join industry expert Chris Goettl for the Patch Tuesday Webinar Event. There we’ll do a deep dive into each of the bulletins and give guidance on the risks associated with the newly-identified vulnerabilities.
OpenAI Just Announced Codex: A cloud engineering agent that excels in handlin... (SOFTTECHHUB)
The world of software development is constantly evolving. New languages, frameworks, and tools appear at a rapid pace, all aiming to help engineers build better software, faster. But what if there was a tool that could act as a true partner in the coding process, understanding your goals and helping you achieve them more efficiently? OpenAI has introduced something that aims to do just that.
Crazy Incentives and How They Kill Security. How Do You Turn the Wheel? (Christian Folini)
Everybody is driven by incentives. Good incentives persuade us to do the right thing and patch our servers. Bad incentives make us eat unhealthy food and follow stupid security practices.
There is a huge resource problem in IT, especially in the IT security industry. Therefore, you would expect people to pay attention to the existing incentives and the ones they create with their budget allocation, their awareness training, their security reports, etc.
But reality paints a different picture: bad incentives all around! We see insane security practices eating up valuable time and online training that annoys corporate users.
But it's even worse. I've come across incentives that lure companies into creating bad products, and I've seen companies create products that incentivize their customers to waste their time.
It takes people like you and me to say "NO" and stand up for real security!
Refactoring meta-rauc-community: Cleaner Code, Better Maintenance, More Machines (Leon Anavi)
RAUC is a widely used open-source solution for robust and secure software updates on embedded Linux devices. In 2020, the Yocto/OpenEmbedded layer meta-rauc-community was created to provide demo RAUC integrations for a variety of popular development boards. The goal was to support the embedded Linux community by offering practical, working examples of RAUC in action - helping developers get started quickly.
Since its inception, the layer has tracked and supported the Long Term Support (LTS) releases of the Yocto Project, including Dunfell (April 2020), Kirkstone (April 2022), and Scarthgap (April 2024), alongside active development in the main branch. Structured as a collection of layers tailored to different machine configurations, meta-rauc-community has delivered demo integrations for a wide variety of boards, utilizing their respective BSP layers. These include widely used platforms such as the Raspberry Pi, NXP i.MX6 and i.MX8, Rockchip, Allwinner, STM32MP, and NVIDIA Tegra.
Five years into the project, a significant refactoring effort was launched to address increasing duplication and divergence in the layer’s codebase. The new direction involves consolidating shared logic into a dedicated meta-rauc-community base layer, which will serve as the foundation for all supported machines. This centralization reduces redundancy, simplifies maintenance, and ensures a more sustainable development process.
The ongoing work, currently taking place in the main branch, targets readiness for the upcoming Yocto Project release codenamed Wrynose (expected in 2026). Beyond reducing technical debt, the refactoring will introduce unified testing procedures and streamlined porting guidelines. These enhancements are designed to improve overall consistency across supported hardware platforms and make it easier for contributors and users to extend RAUC support to new machines.
The community's input is highly valued: What best practices should be promoted? What features or improvements would you like to see in meta-rauc-community in the long term? Let’s start a discussion on how this layer can become even more helpful, maintainable, and future-ready - together.
How Top Companies Benefit from OutsourcingNascenture
Explore how leading companies leverage outsourcing to streamline operations, cut costs, and stay ahead in innovation. By tapping into specialized talent and focusing on core strengths, top brands achieve scalability, efficiency, and faster product delivery through strategic outsourcing partnerships.
🔍 Top 5 Qualities to Look for in Salesforce Partners in 2025
Choosing the right Salesforce partner is critical to ensuring a successful CRM transformation in 2025.
Harmonizing Multi-Agent Intelligence | Open Data Science Conference | Gary Ar...Gary Arora
This deck from my talk at the Open Data Science Conference explores how multi-agent AI systems can be used to solve practical, everyday problems — and how those same patterns scale to enterprise-grade workflows.
I cover the evolution of AI agents, when (and when not) to use multi-agent architectures, and how to design, orchestrate, and operationalize agentic systems for real impact. The presentation includes two live demos: one that books flights by checking my calendar, and another showcasing a tiny local visual language model for efficient multimodal tasks.
Key themes include:
✅ When to use single-agent vs. multi-agent setups
✅ How to define agent roles, memory, and coordination
✅ Using small/local models for performance and cost control
✅ Building scalable, reusable agent architectures
✅ Why personal use cases are the best way to learn before deploying to the enterprise
2. CS 561 2
How do you find a solution in a large complex space?
• Ask an expert?
• Adapt existing designs?
• Trial and error?
3. Example: Traveling Salesperson (TSP)
• Classic example: you have N cities; find the shortest route such that your salesperson visits each city once and returns.
• This problem is known to be NP-hard.
• As a new city is added to the problem, computation time of the classic exact solution increases exponentially, O(2^n) … (as far as we know)
[Map: a Texas salesperson's route through Dallas, Houston, San Antonio, Austin, and Mos Eisley. Is this the shortest path?]
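To see why exhaustive search blows up, here is a tiny brute-force sketch. The city coordinates are invented for illustration; only the growth rate of the search space matters.

```python
# Brute-force TSP over all permutations -- feasible only for tiny n.
from itertools import permutations
from math import dist, factorial

# Made-up planar coordinates for the cities from the slides.
cities = {
    "Dallas": (0.0, 5.0),
    "Houston": (4.0, 0.0),
    "San Antonio": (0.0, 0.0),
    "Austin": (1.0, 2.0),
    "Mos Eisley": (9.0, 9.0),
}

def tour_length(order):
    """Total distance of the closed tour visiting cities in `order`."""
    return sum(dist(cities[a], cities[b])
               for a, b in zip(order, order[1:] + order[:1]))

names = list(cities)
best = min(permutations(names), key=tour_length)
print(best, round(tour_length(best), 2))

# (n-1)! distinct tours after fixing the start city -- this explodes fast:
for n in (5, 10, 15):
    print(n, "cities ->", factorial(n - 1), "tours")
```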
4. What if………
• Let's create a whole bunch of random salespeople, see how well they do, and pick the best one(s).
• Salesperson A
• Houston -> Dallas -> Austin -> San Antonio -> Mos Eisley
• Distance traveled: 780 km
• Salesperson B
• Houston -> Mos Eisley -> Austin -> San Antonio -> Dallas
• Distance traveled: 820 km
• Salesperson A is better (more fit) than salesperson B.
• Perhaps we would like salespeople to be more like A and less like B.
• Question: do we want to just keep picking random salespeople like this and keep testing them?
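The "pick random salespeople and keep the best" idea is pure random search; a minimal sketch, with invented coordinates (this is the baseline a GA will improve on):

```python
import random
from math import dist

# Made-up coordinates, matching the cities named in the slides.
cities = {
    "Dallas": (0.0, 5.0), "Houston": (4.0, 0.0),
    "San Antonio": (0.0, 0.0), "Austin": (1.0, 2.0), "Mos Eisley": (9.0, 9.0),
}

def tour_length(order):
    """Length of the closed tour through `order`."""
    return sum(dist(cities[a], cities[b])
               for a, b in zip(order, order[1:] + order[:1]))

rng = random.Random(42)
names = list(cities)

# Generate 1000 random salespeople and keep the fittest (shortest tour).
best = min((rng.sample(names, len(names)) for _ in range(1000)),
           key=tour_length)
```

Random search wastes most of its samples; the rest of the lecture is about steering the sampling instead.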
5. We can get a little closer to the solution in polynomial time
• We might use one or more heuristics to guide us in creating new salespeople.
• For instance, we might use the triangle inequality to help pick better potential salespeople.
• One can create an initial 2-approximation (at worst, the distance is twice the optimal) to metric TSP with an efficient polynomial-time method, or simply seed with Nearest Neighbor tours.
• This detail is somewhat unimportant: you can use all kinds of heuristics to help you create a better initial set of salespeople [e.g. Match Twice and Stitch (Kahng & Reda, 2004)].
• Use some sort of incremental improvement to make them successively better.
• The idea is that you start with result(s) closer to where you think the solution is than you would obtain at random, so that the search converges more quickly.
• Be careful: an initial approximation may be too close to a local extremum, which might actually slow down convergence or throw the solution off.
6. However…………
• Salesperson A is better than salesperson B, but we can imagine that it would be easy to create a salesperson C who is even better.
• We don't want to create 2^n salespeople!
• This is a lecture about genetic algorithms (GA), <sarcasm> what
kind of solution will we use?</sarcasm>
• Should we try a genetic algorithm solution???
• Really? Are you sure? Maybe we should try something else
• It might be that you would prefer another solution
• I mean it might not be a bad idea
- You might learn something new
- However it might not be all that exciting
- I’m kind of not sure
- My mother suggested that I should do something else
- But at any rate I suppose you would like to get on with it
- Ok, if you insist, but it's all in your hands!
[Randomly inserted image, for no reason at all]
7. Represent the problem like a DNA sequence
Each DNA sequence is a possible solution to the problem. The order of the cities in the genes is the order in which the salesperson visits them.

DNA - Salesperson A: San Antonio -> Dallas -> Mos Eisley -> Houston -> Austin
DNA - Salesperson B: Dallas -> Houston -> Mos Eisley -> San Antonio -> Austin
8. Ranking by Fitness
Here we’ve created three different salespeople, then checked how far each one has to travel. This gives us a measure of “fitness”: the one that travels the shortest distance is the fittest.

Note: we need to be able to measure fitness in polynomial time, otherwise we are in trouble.
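A fitness check of this kind can be sketched in a few lines; the coordinates are invented for illustration, and evaluating each individual is O(n), so ranking the whole population stays polynomial:

```python
from math import dist

# Made-up coordinates for the cities from the slides.
cities = {
    "Dallas": (0.0, 5.0), "Houston": (4.0, 0.0),
    "San Antonio": (0.0, 0.0), "Austin": (1.0, 2.0), "Mos Eisley": (9.0, 9.0),
}

def tour_length(order):
    # O(n) per individual: one pass around the closed tour.
    return sum(dist(cities[a], cities[b])
               for a, b in zip(order, order[1:] + order[:1]))

population = [
    ["Houston", "Dallas", "Austin", "San Antonio", "Mos Eisley"],
    ["Houston", "Mos Eisley", "Austin", "San Antonio", "Dallas"],
    ["Dallas", "Houston", "Mos Eisley", "San Antonio", "Austin"],
]

# Shorter tour = higher fitness, so sort ascending by length.
ranked = sorted(population, key=tour_length)
```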
9. Let’s breed them!
• We have a population of traveling sales people. We also know their
fitness based on how long their trip is. We want to create more, but
we don’t want to create too many.
• We take the notion that the salespeople who perform better are closer to the optimal salesperson than the ones that performed more poorly. Could the optimal salesperson be a “combination” of the better salespeople?
• We create a population of sales people as solutions to the problem.
• How do we actually mate a population of data???
10. Crossover
Exchanging information through some part of the representation.
Once we have found the best salespeople, we will in a sense mate them. We can do this in several ways. Better salespeople should mate more often and poor salespeople should mate less often.
Sales People              City DNA
Parent 1 (Salesperson A)  F A B | E C G D
Parent 2 (Salesperson B)  D E A | C G B F
Child 1  (Salesperson C)  F A B | C G B F
Child 2  (Salesperson D)  D E A | E C G D
11. Crossover Bounds (Houston, we have a problem)
• Not all crossed pairs are viable. We can only visit a city once.
• Different GA problems may have different bounds.
Parents:
San Antonio -> Dallas -> Mos Eisley -> Houston -> Austin
Dallas -> Houston -> Austin -> San Antonio -> Mos Eisley

Children:
Dallas -> Houston -> Mos Eisley -> Houston -> Austin (visits Houston twice: not viable!)
San Antonio -> Dallas -> Austin -> San Antonio -> Mos Eisley (visits San Antonio twice: not viable!)
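The viability problem is easy to reproduce with a naive one-point crossover. This sketch uses hypothetical helper names (`one_point_crossover`, `viable`) and the parents from the slide:

```python
# Naive one-point crossover on tours: cut both parents at the same point
# and swap tails. For permutations this usually duplicates cities.
def one_point_crossover(p1, p2, cut):
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

def viable(tour):
    """A tour is viable only if it visits each city exactly once."""
    return len(set(tour)) == len(tour)

parent1 = ["San Antonio", "Dallas", "Mos Eisley", "Houston", "Austin"]
parent2 = ["Dallas", "Houston", "Austin", "San Antonio", "Mos Eisley"]

child1, child2 = one_point_crossover(parent1, parent2, cut=2)
print(child1, viable(child1))  # duplicate cities appear -> not viable
print(child2, viable(child2))
```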
12. TSP needs some special rules for crossover
• Many GA problems also need special crossover rules.
• Since each genetic sequence contains all the cities in the travel,
crossover is a swapping of travel order.
• Remember that crossover also needs to be efficient.
Parents:
San Antonio -> Dallas -> Mos Eisley -> Houston -> Austin
Dallas -> Mos Eisley -> Houston -> San Antonio -> Austin

Children (viable):
San Antonio -> Dallas -> Houston -> Austin -> Mos Eisley
Dallas -> Houston -> Austin -> San Antonio -> Mos Eisley
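One standard repair-free operator for permutations is order crossover (OX). This is my own illustrative sketch, not necessarily the exact rule the slide depicts, but it shows the idea: every child is guaranteed to be a valid tour, and it runs in polynomial time.

```python
# Order crossover (OX): copy a slice from parent 1, then fill the
# remaining positions with the missing cities in parent 2's order.
def order_crossover(p1, p2, start, stop):
    child = [None] * len(p1)
    child[start:stop] = p1[start:stop]           # inherit a slice from parent 1
    fill = [c for c in p2 if c not in child]     # remaining cities, p2's order
    for i in range(len(child)):
        if child[i] is None:
            child[i] = fill.pop(0)
    return child

parent1 = ["San Antonio", "Dallas", "Mos Eisley", "Houston", "Austin"]
parent2 = ["Dallas", "Mos Eisley", "Houston", "San Antonio", "Austin"]

child = order_crossover(parent1, parent2, 1, 3)
assert sorted(child) == sorted(parent1)          # always a valid tour
```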
13. What about local extrema?
• With just crossover breeding, we are constrained to gene sequences that are crosses of our current population.
• Introduce random effects into our population.
• Mutation – Randomly twiddle the genes with some probability.
• Cataclysm – Kill off n% of your population and create fresh new
salespeople if it looks like you are reaching a local minimum.
• Annealing of Mating Pairs – Accept the mating of suboptimal pairs with
some probability.
• Etc…
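Two of the randomization tricks above can be sketched directly; the function names, rates, and fractions here are illustrative choices, not fixed parts of the technique:

```python
import random

def mutate(tour, rate=0.1, rng=random):
    """Swap mutation: with probability `rate`, exchange two cities."""
    tour = tour[:]
    if rng.random() < rate:
        i, j = rng.sample(range(len(tour)), 2)
        tour[i], tour[j] = tour[j], tour[i]
    return tour

def cataclysm(population, cities, fraction, key, rng=random):
    """Kill off the worst `fraction` of the population and refill it
    with fresh random tours (used when stuck near a local minimum)."""
    keep = int(len(population) * (1 - fraction))
    survivors = sorted(population, key=key)[:keep]
    while len(survivors) < len(population):
        survivors.append(rng.sample(cities, len(cities)))
    return survivors
```

Note that swap mutation preserves viability: the result is still a permutation of the same cities.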
14. In summation: The GA Cycle
Fitness -> Selection -> Crossover -> Mutation -> New Population -> (back to Fitness)
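The whole cycle can be sketched end to end. The operator choices here (truncation selection, order crossover, swap mutation) and all parameters are illustrative, not the only options:

```python
import random
from math import dist

# Made-up coordinates for the cities from the slides.
CITIES = {
    "Dallas": (0.0, 5.0), "Houston": (4.0, 0.0),
    "San Antonio": (0.0, 0.0), "Austin": (1.0, 2.0), "Mos Eisley": (9.0, 9.0),
}

def tour_length(order):
    return sum(dist(CITIES[a], CITIES[b])
               for a, b in zip(order, order[1:] + order[:1]))

def order_crossover(p1, p2, rng):
    """Copy a random slice from p1, fill the rest in p2's order."""
    start, stop = sorted(rng.sample(range(len(p1)), 2))
    child = [None] * len(p1)
    child[start:stop] = p1[start:stop]
    fill = [c for c in p2 if c not in child]
    return [fill.pop(0) if g is None else g for g in child]

def evolve(generations=50, pop_size=30, mutation_rate=0.2, seed=0):
    rng = random.Random(seed)
    names = list(CITIES)
    population = [rng.sample(names, len(names)) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=tour_length)        # fitness
        parents = population[: pop_size // 2]   # (truncation) selection
        children = []
        while len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            child = order_crossover(p1, p2, rng)    # crossover
            if rng.random() < mutation_rate:        # mutation
                i, j = rng.sample(range(len(child)), 2)
                child[i], child[j] = child[j], child[i]
            children.append(child)
        population = children                   # new population
    return min(population, key=tour_length)

best = evolve()
```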
15. GA and TSP: the claims
• Can solve for over 3,500 cities (though it still took over 1 CPU-year).
• Maybe holds the record.
• Will get within 2% of the optimal solution.
• This means it is not an exact solution per se, but an approximation.
16. GA Discussion
• We can apply the GA approach to any problem where we can represent the problem's solution (even very abstractly) as a string.
• We can create strings of:
• Digits
• Labels
• Pointers
• Code Blocks – This creates new programs from blocks of code strung together. The key is to make sure the code can run.
• Whole Programs – Modules or complete programs can be strung
together in a series. We can also re-arrange the linkages between
programs.
• The last two are examples of Genetic Programming
17. Things to consider
• How large is your population?
• A large population will take more time to run (you have to test each
member for fitness!).
• A large population will cover more bases at once.
• How do you select your initial population?
• You might create a population of approximate solutions. However, some
approximations might start you in the wrong position with too much
bias.
• How will you crossbreed your population?
• You want to crossbreed and select for your best specimens.
• Too strict: You will tend towards local minima
• Too lax: Your problem will converge slower
• How will you mutate your population?
• Too little: your problem will tend to get stuck in local minima
• Too much: your population will fill with noise and not settle.
18. GA is a good "no clue" approach to problem solving
• GA is superb if:
• Your space is loaded with lots of weird bumps and local minima.
• GA tends to spread out and test a larger subset of your space than many
other types of learning/optimization algorithms.
• You don’t quite understand the underlying process of your problem
space.
• NO I DON'T: What makes the stock market work? Don't know? Me neither! Stock market prediction might thus be good for a GA.
• YES I DO: Want to make a program to predict people's height from personality factors? This might be a Gaussian process and a good candidate for statistical methods, which are more efficient.
• You have lots of processors
• GA’s parallelize very easily!
19. Why not use GA?
• Creating generations of samples and crossbreeding them can be resource intensive.
• Some problems may be better solved by a general gradient descent method that uses fewer resources.
• However, resource-wise, a GA is still quite efficient (no computation of derivatives, etc.).
• In general if you know the mathematics, shape or underlying
process of your problem space, there may be a better solution
designed for your specific need.
• Consider Kernel Based Learning and Support Vector Machines?
• Consider Neural Networks?
• Consider Traditional Polynomial Time Algorithms?
• Etc.
Editor's Notes
#2: What has been traditionally done in the heat exchanger world is to take a best “guess” at a design with the help of an expert. Traditional heat exchanger designs are usually fully mathematically described.
This initial guess can then be used as a basis, and various parameters are “tweaked”. The performance of the heat exchanger can be recalculated to see if the modified design is an improvement on the original one.
Surely there must be a better way? There is - we don’t need to look very far to find examples of optimisation in Nature.
#7: Once the fitness has been assigned, pairs of chromosomes representing heat exchanger designs can be chosen for mating.
The higher the fitness, the greater the probability of the design being selected. Consequently, some of the weaker population members do not mate at all, whilst superior ones are chosen many times.
It is even statistically possible for a member to be chosen to mate with itself. This has no advantage, as the offspring will be identical to the parent.
#8: Once the population has been formed (either randomly in the initial generation, or by mating in subsequent generations), each population member needs to be assessed against the desired properties - such a rating is called a “fitness”.
The design parameters represented by the zeros and ones in the binary code of each chromosome are fed into the mathematical model describing the heat exchanger. The output parameters for each design are used to give the fitness rating. A good design has a high fitness value, and a poor design a lower value.
#10: The mating process is analogous to crossover carried out in living cells.
A pair of binary strings are used. A site along the length of the string is chosen randomly. In this example it is shown between the 6th and 7th bits, but it could be anywhere.
Both members of the pair are severed at that site, and their latter portions are exchanged. Two parents form two children, and these two “daughter” designs become members of the population for the next generation.
This process takes place for each pair selected, so the new population has the same number of members as the previous generation.
#14: Summary of the previous steps to the model.
Populations are continuously produced, going round the outer loop of this diagram, until the desired amount of optimisation has been achieved.