
Conditional independence test based structure learning

Conditional independence tests in Weka are slightly different from the standard tests described in the literature. To test whether variables $x$ and $y$ are conditionally independent given a set of variables $Z$, a network structure with arrows $z \to y$ for all $z\in Z$ is compared with one that has the additional arrow $x \to y$. The test is performed using any of the score metrics described in Section 2.1.
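
In other words, writing $Q(y\vert P)$ for the local score of node $y$ with parent set $P$ (a shorthand not used elsewhere in this manual) and assuming the convention of Section 2.1 that larger scores are better, the test concludes that $x$ is independent of $y$ given $Z$ precisely when $Q(y\vert Z) \ge Q(y\vert Z\cup\{x\})$, that is, when adding the arrow $x\to y$ does not improve the score.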

\epsfig{file=images/ci.algorithms.eps,height=8cm}

At the moment, only the ICS [11] and CI algorithms are implemented.

The ICS algorithm proceeds in two steps: first, find a skeleton (the undirected graph that has an edge between two nodes iff the network structure has an arrow between them), and second, direct all the edges in the skeleton to obtain a DAG.

Starting with a complete undirected graph, we try to find conditional independencies $\langle x,y\vert Z\rangle$ in the data. For each pair of nodes $x$, $y$, we consider conditioning sets $Z$ of cardinality $0$, then $1$, and so on up to a user-defined maximum. Furthermore, $Z$ is restricted to subsets of the nodes that are neighbors of both $x$ and $y$. If an independency is identified, the edge between $x$ and $y$ is removed from the skeleton and the separating set $Z$ is cached for the next step.
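
The following sketch illustrates this skeleton phase. It is not Weka's implementation; the class and helper names (SkeletonSketch, CITest, findSkeleton, subsetsOfSize) are made up for illustration, and the score-based test of Section 2.1 is abstracted away behind the CITest interface.

   import java.util.*;

   /* Sketch of the skeleton phase; names are illustrative, not Weka's API. */
   public class SkeletonSketch {

       /** Placeholder for the score-based test of Section 2.1. */
       interface CITest {
           boolean independent(int x, int y, Set<Integer> z);
       }

       /**
        * Start from a complete undirected graph (edge[i][j] == true for i != j)
        * and delete the edge x--y as soon as some subset Z of the common
        * neighbours of x and y, with |Z| <= maxCardinality, makes x and y
        * conditionally independent.  The separating set is cached because the
        * orientation phase needs it later.
        */
       static Map<List<Integer>, Set<Integer>> findSkeleton(
               boolean[][] edge, int maxCardinality, CITest test) {
           int n = edge.length;
           Map<List<Integer>, Set<Integer>> sepset = new HashMap<>();
           for (int x = 0; x < n; x++) {
               for (int y = x + 1; y < n; y++) {
                   // candidate conditioning variables: common neighbours of x and y
                   List<Integer> common = new ArrayList<>();
                   for (int z = 0; z < n; z++) {
                       if (z != x && z != y && edge[x][z] && edge[y][z]) {
                           common.add(z);
                       }
                   }
                   search:
                   for (int card = 0; card <= maxCardinality; card++) {
                       for (Set<Integer> z : subsetsOfSize(common, card)) {
                           if (test.independent(x, y, z)) {
                               edge[x][y] = edge[y][x] = false;     // remove the edge
                               sepset.put(Arrays.asList(x, y), z);  // remember why
                               break search;
                           }
                       }
                   }
               }
           }
           return sepset;
       }

       /** All subsets of the given size, by simple recursive enumeration. */
       static List<Set<Integer>> subsetsOfSize(List<Integer> items, int size) {
           List<Set<Integer>> result = new ArrayList<>();
           collect(items, size, 0, new ArrayDeque<Integer>(), result);
           return result;
       }

       private static void collect(List<Integer> items, int size, int start,
                                   Deque<Integer> current, List<Set<Integer>> out) {
           if (current.size() == size) {
               out.add(new HashSet<>(current));
               return;
           }
           for (int i = start; i < items.size(); i++) {
               current.addLast(items.get(i));
               collect(items, size, i + 1, current, out);
               current.removeLast();
           }
       }
   }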

The first step in directing arrows is to check, for every configuration $x - z - y$ where $x$ and $y$ are not connected in the skeleton, whether $z$ is in the set $Z$ that justified removing the link between $x$ and $y$ (cached in the first step). If $z$ is not in $Z$, we can assign the directions $x\to z\leftarrow y$.
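
A sketch of this v-structure step, continuing the illustrative (non-Weka) representation used above, where sepset holds the separating sets cached during the skeleton phase:

   import java.util.*;

   /* Sketch of the v-structure step; names are illustrative, not Weka's API. */
   public class VStructureSketch {

       /**
        * For every configuration x - z - y in which z is adjacent to both x and y
        * while x and y themselves are not adjacent, look up the separating set
        * cached during the skeleton phase.  If z is not in that set, direct the
        * two edges as the v-structure x -> z <- y.
        *
        * edge[i][j]  : undirected skeleton (symmetric)
        * arrow[i][j] : true iff the edge between i and j has been directed as i -> j
        */
       static void orientVStructures(boolean[][] edge, boolean[][] arrow,
                                     Map<List<Integer>, Set<Integer>> sepset) {
           int n = edge.length;
           for (int z = 0; z < n; z++) {
               for (int x = 0; x < n; x++) {
                   for (int y = x + 1; y < n; y++) {
                       if (x == z || y == z) {
                           continue;
                       }
                       if (edge[x][z] && edge[y][z] && !edge[x][y]) {
                           Set<Integer> s = sepset.getOrDefault(
                                   Arrays.asList(x, y), Collections.<Integer>emptySet());
                           if (!s.contains(z)) {   // z did not separate x and y
                               arrow[x][z] = true; // x -> z
                               arrow[y][z] = true; // y -> z
                           }
                       }
                   }
               }
           }
       }
   }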

Finally, a set of graphical rules [11] is applied to direct the remaining edges; a code sketch of the first two rules is given after the list.

   Rule 1: i->j--k & i-/-k  =>  j->k

   Rule 2: i->j->k & i--k   =>  i->k

   Rule 3:      m
               /|\
              i | k    =>  m->j
    i->j<-k    \|/
                j

   Rule 4:      m
               / \
              i---k    =>  i->m & k->m
    i->j       \ /
                j

   Rule 5: if no edges are directed then take a random one (first we can find)
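
As an illustration, Rules 1 and 2 could be applied as follows (again a sketch with made-up names, not Weka's code); Rules 3 and 4 would be added in the same loop, and Rule 5 breaks a tie when no rule fires:

   /* Sketch of the first two graphical rules; names are illustrative. */
   public class OrientationRulesSketch {

       /**
        * Repeatedly apply Rule 1 and Rule 2 until no further edge can be directed.
        *
        * edge[i][j]  : skeleton (symmetric)
        * arrow[i][j] : true iff the edge between i and j has been directed as i -> j;
        *               an edge i--j is still undirected when edge[i][j] holds but
        *               neither arrow[i][j] nor arrow[j][i] does.
        */
       static void applyRules(boolean[][] edge, boolean[][] arrow) {
           int n = edge.length;
           boolean changed = true;
           while (changed) {
               changed = false;
               for (int i = 0; i < n; i++) {
                   for (int j = 0; j < n; j++) {
                       for (int k = 0; k < n; k++) {
                           if (i == j || j == k || i == k) {
                               continue;
                           }
                           boolean jkUndirected =
                                   edge[j][k] && !arrow[j][k] && !arrow[k][j];
                           boolean ikUndirected =
                                   edge[i][k] && !arrow[i][k] && !arrow[k][i];
                           // Rule 1: i->j--k and i not adjacent to k  =>  j->k
                           if (arrow[i][j] && jkUndirected && !edge[i][k]) {
                               arrow[j][k] = true;
                               changed = true;
                           }
                           // Rule 2: i->j->k and i--k                 =>  i->k
                           if (arrow[i][j] && arrow[j][k] && ikUndirected) {
                               arrow[i][k] = true;
                               changed = true;
                           }
                       }
                   }
               }
           }
       }
   }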

The ICS algorithm comes with the following options.

\epsfig{file=images/ci.ics.eps,height=5cm}

Since the ICS algorithm is focused on recovering causal structure rather than on finding the optimal classifier, the Markov blanket correction can be applied afterwards, so that every attribute ends up in the Markov blanket of the classifier node.

Specific options:
The maxCardinality option determines the maximum size of the conditioning set $Z$ considered in the conditional independence tests $\langle x,y\vert Z\rangle$.
The scoreType option selects the score metric (Section 2.1) used for these tests.
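
For example, the ICS algorithm could be configured programmatically along the following lines. This is only a sketch: the data file name is arbitrary, and the setter names are assumed to follow Weka's usual option-to-method naming (e.g. setMaxCardinality), so they should be checked against the API of the Weka version in use.

   import java.io.BufferedReader;
   import java.io.FileReader;

   import weka.classifiers.bayes.BayesNet;
   import weka.classifiers.bayes.net.search.ci.ICSSearchAlgorithm;
   import weka.core.Instances;

   public class ICSExample {
       public static void main(String[] args) throws Exception {
           // load a data set (the file name is just an example)
           Instances data = new Instances(
                   new BufferedReader(new FileReader("weather.nominal.arff")));
           data.setClassIndex(data.numAttributes() - 1);

           // configure the ICS search algorithm
           ICSSearchAlgorithm ics = new ICSSearchAlgorithm();
           ics.setMaxCardinality(2);   // largest conditioning set Z considered
           // the score metric (scoreType) can be chosen analogously if desired

           // plug the search algorithm into the BayesNet classifier and learn
           BayesNet net = new BayesNet();
           net.setSearchAlgorithm(ics);
           net.buildClassifier(data);
           System.out.println(net);
       }
   }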

