An Algorithm for Bayesian Network Construction from Data

by: Jie Cheng
David A. Bell
Weiru Liu
University of Ulster, UK

Presented by: Jian Xu

Outline
• Introduction
• Some basic concepts
• The proposed algorithm for BN
construction
• Experiment results
• Discussions & comments

4/26/2010 Machine Learning 2

What is a Bayesian Network?
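Cancer BN Example (figure)
• Nodes: Metastatic Cancer (M), Serum Calcium (S), Brain Tumor (B),
Coma (C), Headaches (H)
• Arcs: M → S, M → B, S → C, B → C, B → H
• P(M) = .20
• P(S | M): M = + .80; M = - .20
• P(B | M): M = + .20; M = - .05
• P(C | S, B): S = +, B = + .80; S = +, B = - .80;
S = -, B = + .80; S = -, B = - .05
• P(H | B): B = + .80; B = - .60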

Bayesian Network (BN)
• A Bayesian network is a compact
graphical representation of a probability
distribution over a set of domain
random variables X = {X1, X2, …, Xn}
• Two components
– Structure: a directed acyclic graph (DAG) over
the nodes, which encodes causal relations in
the domain
– CPD: each node has a conditional
probability distribution associated with it
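
As a minimal sketch of how the two components combine, the joint probability of a full assignment is the product of one CPD entry per node. The Python below assumes the Cancer BN example with arcs M → S, M → B, S → C, B → C, B → H and the CPT values as read from that slide; the dictionary layout and the name `joint` are illustrative only.

```python
from itertools import product

# Parents of each node (assumed structure of the Cancer BN example).
PARENTS = {"M": (), "S": ("M",), "B": ("M",), "C": ("S", "B"), "H": ("B",)}

# P(node = True | parent values), keyed by the tuple of parent values.
CPT = {
    "M": {(): 0.20},
    "S": {(True,): 0.80, (False,): 0.20},
    "B": {(True,): 0.20, (False,): 0.05},
    "C": {(True, True): 0.80, (True, False): 0.80,
          (False, True): 0.80, (False, False): 0.05},
    "H": {(True,): 0.80, (False,): 0.60},
}

def joint(assignment):
    """P(assignment) as the product of one CPD entry per node."""
    p = 1.0
    for node, parents in PARENTS.items():
        p_true = CPT[node][tuple(assignment[q] for q in parents)]
        p *= p_true if assignment[node] else 1.0 - p_true
    return p

# Sanity check: the factorized joint sums to 1 over all 2^5 assignments.
total = sum(joint(dict(zip(PARENTS, values)))
            for values in product([True, False], repeat=5))
```

The compactness shows up in the parameter count: this network stores 1 + 2 + 2 + 4 + 2 = 11 numbers, whereas an unstructured joint table over five binary variables needs 2^5 - 1 = 31.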


BN Learning
• Structure learning
– To identify the topology of the network
– Score based methods
– Dependency analysis methods
• Parameter learning
– To learn the conditional probabilities for a
given network topology
– MLE, Bayesian approach, etc.
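
A minimal sketch of the MLE option, assuming complete discrete records stored as dictionaries: each CPD entry is estimated as a ratio of counts. The function name `mle_cpd` and the toy records are hypothetical.

```python
from collections import Counter

def mle_cpd(records, child, parents):
    """Estimate P(child | parents) by relative frequencies.

    Returns a dict {(parent_values, child_value): probability}.
    Parent configurations that never occur in the data get no entry.
    """
    joint_counts = Counter()
    parent_counts = Counter()
    for r in records:
        pa = tuple(r[p] for p in parents)
        joint_counts[(pa, r[child])] += 1
        parent_counts[pa] += 1
    return {(pa, x): n / parent_counts[pa]
            for (pa, x), n in joint_counts.items()}

# Toy data (hypothetical): M is the parent of S.
records = [{"M": 1, "S": 1}, {"M": 1, "S": 1},
           {"M": 1, "S": 0}, {"M": 0, "S": 0}]
cpd = mle_cpd(records, "S", ("M",))
```

A Bayesian approach would typically add prior pseudo-counts to both counters before dividing, which also avoids zero estimates for rare configurations.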


BN Structure Learning
• Search & scoring methods:
– Search for the structure most likely to have
generated the data
– Use a heuristic search method to construct a model
and evaluate it with a scoring method, such as
MDL or a Bayesian score
– May not find the best solution
– Random restarts help avoid getting stuck in a local
maximum
– Lower time complexity in the worst case, i.e., when
the underlying DAG is fully connected


BN Learning Algorithms (Cont’d)
• Dependency analysis methods:
– Use conditional independence (CI) tests to analyze
dependency relationships among nodes
– Usually asymptotically correct when the data is
DAG-faithful
– Work efficiently when the underlying network is
sparse
– CI tests with large condition sets may be
unreliable unless the volume of data is enormous
– Used in the proposed algorithm


Basic Concepts
• D-separation: two nodes X and Y are d-separated
given C if and only if there exists no
adjacency path P between X and Y such that:
– every collider on P is in C or has a descendant in C
– no other node on P is in C
– C is called a condition-set
• Open path: a path between X and Y is open if every
node on the path is active (a collider is active if it or
one of its descendants is in C; a non-collider is active
if it is not in C).
• Closed path: a path is closed if any node on it is inactive.
• Collider node: a node on a path at which two arcs
meet head-to-head (→ v ←)
• Non-collider node: any other node on the path
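
The definition above can be checked mechanically. The sketch below does not enumerate paths; it uses the equivalent moralized-ancestral-graph test (keep only the ancestors of X, Y and C, connect parents that share a child, drop arc directions, delete C, and check whether X can still reach Y). The function name and the `parents` dictionary layout are assumptions for illustration.

```python
def d_separated(parents, x, y, cond):
    """True if x and y are d-separated given cond.

    parents: dict mapping every node to an iterable of its parents.
    Assumes x and y are not themselves in cond.
    """
    cond = set(cond)
    # 1. Ancestral set of {x, y} and cond.
    keep, stack = set(), [x, y, *cond]
    while stack:
        n = stack.pop()
        if n not in keep:
            keep.add(n)
            stack.extend(parents[n])
    # 2. Moralize: undirected parent-child links plus links between co-parents.
    nbrs = {n: set() for n in keep}
    for n in keep:
        ps = list(parents[n])
        for p in ps:
            nbrs[n].add(p)
            nbrs[p].add(n)
        for i, p in enumerate(ps):
            for q in ps[i + 1:]:
                nbrs[p].add(q)
                nbrs[q].add(p)
    # 3. Search from x without entering any node of cond.
    seen, stack = {x}, [x]
    while stack:
        n = stack.pop()
        if n == y:
            return False
        for m in nbrs[n] - cond - seen:
            seen.add(m)
            stack.append(m)
    return True

# Cancer BN structure as assumed from the example slide.
cancer = {"M": [], "S": ["M"], "B": ["M"], "C": ["S", "B"], "H": ["B"]}
```

On this structure, S and B are d-separated given {M}, but not given {M, C}, because conditioning on the collider C opens the path S → C ← B.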

Basic Concepts (Cont’d)
• DAG-faithful: a distribution is DAG-faithful when there
exists a DAG that can represent all the conditional
independence relations of the underlying distribution.
• D-map: a graph G is a dependency map (D-map) of M
if every independence relationship in M is true in G
(e.g., a BN with no edges).
• I-map: a graph G is an independency map (I-map) of
M if every independence relationship in G is true in M
(e.g., a fully-connected BN).
• Minimal I-map: a graph G is an I-map of M, but the
removal of any arc from G yields a graph that is not an
I-map of M.
• P-map: a graph G is a perfect map of M if it is both a
D-map and an I-map of M.

Mutual Information
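• The mutual information of two nodes Xi,
Xj is defined as:

I(X_i, X_j) = \sum_{x_i, x_j} P(x_i, x_j) \log \frac{P(x_i, x_j)}{P(x_i) P(x_j)}

• The conditional mutual information is
defined as:

I(X_i, X_j \mid C) = \sum_{x_i, x_j, c} P(x_i, x_j, c) \log \frac{P(x_i, x_j \mid c)}{P(x_i \mid c) P(x_j \mid c)}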
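
A minimal sketch of how both quantities can be estimated from a table of discrete records, substituting relative frequencies for the probabilities. The function name, the row-as-tuple layout, and the use of natural logarithms are assumptions for illustration.

```python
from collections import Counter
from math import log

def mutual_information(data, i, j, cond=()):
    """Empirical I(X_i, X_j | C) in nats; cond=() gives plain I(X_i, X_j).

    data: list of rows (tuples); i, j and the entries of cond are column indices.
    """
    n = len(data)
    n_xyc, n_xc, n_yc, n_c = Counter(), Counter(), Counter(), Counter()
    for row in data:
        c = tuple(row[k] for k in cond)
        n_xyc[(row[i], row[j], c)] += 1
        n_xc[(row[i], c)] += 1
        n_yc[(row[j], c)] += 1
        n_c[c] += 1
    # P(x,y,c) * log[ P(x,y|c) / (P(x|c) P(y|c)) ] written in terms of counts.
    return sum((cnt / n) * log(cnt * n_c[c] / (n_xc[(x, c)] * n_yc[(y, c)]))
               for (x, y, c), cnt in n_xyc.items())

copied = [(0, 0), (0, 0), (1, 1), (1, 1)]        # second column copies the first
unrelated = [(0, 0), (0, 1), (1, 0), (1, 1)]     # columns are independent
```

Two nodes are then treated as (conditionally) independent when the estimate falls below a small threshold ε.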


Assumptions
• All attributes are discrete
• No missing values in any record
• All the records are drawn independently from a
single probability model
• The dataset is large enough for
reliable CI tests
• The ordering of the attributes is
available before the network
construction

An Algorithm for BN Construction
• Drafting
– Computes the mutual information of each pair
of nodes and creates a draft of the model
• Thickening
– Adds arcs when pairs of nodes cannot
be d-separated, yielding an I-map of the model
• Thinning
– Examines each arc of the I-map using CI
tests and removes it if the two nodes
of the arc are conditionally independent

Drafting Phase
1. Initialize a graph G(V, E) where V = {all nodes}, E = { }. Initialize two
empty ordered sets S, R.
2. For each pair of nodes (vi, vj), i≠j, compute I(vi, vj). Sort all pairs with
I(vi, vj) ≥ ε from large to small, and put the corresponding pairs of
nodes into the ordered set S.
3. Get the first two pairs of nodes in S, and remove them from S. Add
the corresponding arcs to E. (The direction of the arcs is
determined by the available node ordering.)
4. Get the first pair of nodes remaining in S and remove it from S. If
there is no open path between the two nodes (they are d-separated
given the empty set), add the corresponding arc to E.
Otherwise add the pair of nodes to the end of the ordered set R.
5. Repeat step 4 until S is empty.
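
The five steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: with an empty condition set an open path cannot contain a collider, so the open-path check is done by testing whether the two nodes share an ancestor (counting each node as its own ancestor). The function names, the `mi` dictionary layout, the node ordering A, B, C, D, E and the numeric MI values are all hypothetical; only the ranking of the pairs is taken from the drafting example.

```python
def ancestors_or_self(parents, v):
    """All ancestors of v in the current graph, plus v itself."""
    seen, stack = {v}, [v]
    while stack:
        for p in parents[stack.pop()]:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def draft(order, mi, eps):
    """Drafting phase. order: node ordering; mi: {frozenset({a, b}): I(a, b)}.

    Returns (arcs, R): arcs added to the draft and pairs deferred to phase II.
    """
    rank = {v: k for k, v in enumerate(order)}
    parents = {v: set() for v in order}
    # Step 2: keep pairs with I >= eps, sorted from large to small.
    S = sorted((p for p, val in mi.items() if val >= eps),
               key=lambda p: mi[p], reverse=True)
    arcs, R = [], []
    for idx, pair in enumerate(S):
        a, b = sorted(pair, key=rank.get)   # arc direction follows the ordering
        open_path = bool(ancestors_or_self(parents, a)
                         & ancestors_or_self(parents, b))
        if idx < 2 or not open_path:        # step 3 (first two pairs) or step 4
            parents[b].add(a)
            arcs.append((a, b))
        else:
            R.append((a, b))
    return arcs, R

# Ranking from the drafting example; the numeric values are made up.
ranking = ["BD", "CE", "BE", "AB", "BC", "CD", "DE", "AD", "AE", "AC"]
mi = {frozenset(p): 1.0 - 0.05 * k for k, p in enumerate(ranking)}
arcs, R = draft(list("ABCDE"), mi, eps=0.003)
```

Under these assumptions the sketch adds the arcs (B,D), (C,E), (B,E), (A,B), (B,C) and defers (C,D), (D,E), (A,D), (A,E), (A,C) to R.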


Drafting Example
• Figure (a) is the
underlying BN structure
• I(B,D) ≥ I(C,E) ≥ I(B,E)
≥ I(A,B) ≥ I(B,C) ≥
I(C,D) ≥ I(D,E) ≥ I(A,D)
≥ I(A,E) ≥ I(A,C) ≥ ε
• Figure (b) is the draft
graph


Thickening Phase

6. Get the first pair of nodes in R and remove it
from R.
7. Find a block set, using as few nodes as possible,
that blocks every open path between these two
nodes. Conduct a CI test; if the two
nodes are still dependent on each other given
the block set, connect them by an arc.
8. Repeat from step 6 until R is empty.
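
A minimal sketch of steps 6 to 8, with two loudly stated simplifications. First, instead of a minimum block set it uses the current parents of the later node, which is a valid block set whenever all arcs follow the node ordering but is not necessarily the smallest one. Second, the CI test is passed in as a hypothetical callable `dependent(a, b, block)`; here it is stubbed with an oracle that knows an assumed true structure (A → B, B → C, B → D, C → E, D → E).

```python
def thicken(parents, R, dependent):
    """Thickening phase: add an arc for each deferred pair a CI test cannot separate.

    parents:   dict node -> set of parents (the draft graph, modified in place)
    R:         list of (earlier, later) pairs left over from drafting
    dependent: hypothetical CI-test callable, dependent(a, b, block) -> bool
    """
    added = []
    for a, b in R:
        # Assumption: parents of the later node serve as the block set.
        block = set(parents[b])
        if dependent(a, b, block):
            parents[b].add(a)
            added.append((a, b))
    return added

# Assumed draft graph and deferred pairs for the A..E example.
draft_parents = {"A": set(), "B": {"A"}, "C": {"B"}, "D": {"B"}, "E": {"B", "C"}}
deferred = [("C", "D"), ("D", "E"), ("A", "D"), ("A", "E"), ("A", "C")]

# Stub oracle (assumption): for these queries a pair is dependent exactly
# when it is an arc of the assumed true structure.
TRUE_ARCS = {("A", "B"), ("B", "C"), ("B", "D"), ("C", "E"), ("D", "E")}
added = thicken(draft_parents, deferred, lambda a, b, block: (a, b) in TRUE_ARCS)
```

With this stub only the arc (D, E) is added. A real run would replace the oracle with a thresholded conditional mutual information test on the data.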


Thickening Example
• Figure (b) is the draft
graph
• Examine the pair (D,E): find
the minimum set that
blocks all the open paths
between D and E, which is {B}
• The CI test reveals that D and E
are dependent given {B},
so arc (D,E) is added
• (A,C) is not added because
A and C are independent
given {B}


Thinning Phase

9. For each arc in E, if there are open paths
between the two nodes besides this arc,
remove this arc from E temporarily, and
call procedure find_block_set(current
graph, node1, node2). Conduct a CI test
conditioned on the block set. If the
two nodes are dependent, add this arc
back to E; otherwise remove the arc
permanently.
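
Step 9 can be sketched as below, under the same kind of assumptions as any illustration of this phase: `find_block_set` is replaced by the simpler (not necessarily minimum) choice of the remaining parents of the later node, the "other open paths" check is done with a shared-ancestor test, and the CI test is a hypothetical callable `dependent(a, b, block)` stubbed with an oracle for an assumed true structure.

```python
def ancestors_or_self(parents, v):
    """All ancestors of v in the current graph, plus v itself."""
    seen, stack = {v}, [v]
    while stack:
        for p in parents[stack.pop()]:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def thin(parents, dependent):
    """Thinning phase: drop arcs whose endpoints a CI test can separate.

    parents is modified in place; returns the list of removed arcs.
    """
    removed = []
    for b in parents:
        for a in sorted(parents[b]):
            parents[b].discard(a)                  # remove the arc temporarily
            other_path = (ancestors_or_self(parents, a)
                          & ancestors_or_self(parents, b))
            # Assumption: remaining parents of b stand in for find_block_set.
            if other_path and not dependent(a, b, set(parents[b])):
                removed.append((a, b))             # independent: remove for good
            else:
                parents[b].add(a)                  # keep the arc
    return removed

# Assumed I-map after thickening, and a stub oracle for the true structure.
imap = {"A": set(), "B": {"A"}, "C": {"B"}, "D": {"B"}, "E": {"B", "C", "D"}}
TRUE_ARCS = {("A", "B"), ("B", "C"), ("B", "D"), ("C", "E"), ("D", "E")}
removed = thin(imap, lambda a, b, block: (a, b) in TRUE_ARCS)
```

With this stub the only arc removed is (B, E), tested against the block set {C, D}.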


Thinning Example
• Figure (c) is the I-map
of the underlying BN
• Arc (B,E) is removed
because B and E are
independent of each
other given {C,D}.
• Figure (d) is the
perfect map (P-map) of the
underlying dependency
model (a).


Finding Minimum Block Set


Complexity Analysis
• For a dataset with N attributes, each with at most r
possible values and at most k
parents:
– Phase I: N^2 mutual information
computations, each of which requires O(r^2)
basic operations: O(N^2 r^2)
– Phase II: at most N^2 CI tests, each with at
most O(r^(k+2)) basic operations: O(N^2 r^(k+2)),
worst case O(N^2 r^N)
– Phase III: same as Phase II.
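
As a rough worked example of the Phase I count, assume the 37-node ALARM network used in the experiments and count each unordered pair of nodes once:

```latex
\binom{N}{2} = \frac{N(N-1)}{2} = \frac{37 \cdot 36}{2} = 666
```

This is about half of N^2 = 1369, so the count is still quadratic in N; the per-pair cost is what grows with r.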


ALARM Network Structure


Experiment setup
• ALARM BN (A Logical Alarm Reduction
Mechanism): a medical diagnosis system for
patient monitoring
– 37 nodes, 46 arcs
– 3 versions: same structure, different CPDs
• 10,000 cases for each dataset
• The conditional mutual information
calculation was modified to take each variable's degrees of
freedom into account, making CI tests
more reliable
• ε = 0.003

Result on ALARM BN


Discussions & Comments
• About the assumptions
– All attributes are discrete
– No missing values in any record
– The dataset is large enough for
reliable CI tests
– The ordering of the attributes is available
before the network construction


Discussions & Comments (Cont’d)
• Threshold ε
– ε = 0.003
– How do we pick an appropriate ε?
– How does the choice of ε affect accuracy and
running time?
• Modification in the experiment part
– The conditional mutual information calculation was
modified to take each variable's degrees of
freedom into account, making CI tests more
reliable
– Does this modification affect the result in any way
other than increasing the accuracy?
