Score-based Structure Learning BNs

Posted on Fri 24 June 2016 in score_based

With a bit of delay, I am now working on a basic PR for score-based structure estimation for Bayesian Networks. It comes with two ingredients:

The StructureScore-Class and its subclasses BayesianScore and BICScore. They are initialized with a data set and provide a score-method to compute how well a given BayesianModel can be fitted to the data, according to different criteria. Since those scores are decomposable for BNs, a local_score-method is also exposed for node-by-node computation.
The StructureSearch-Class and its subclasses ExhaustiveSearch and HCSearch. They are initialized with a StructureScore-instance and optimize that score over all BayesianModels. The latter subclass has a number of optional search enhancements.

So far BayesianScore supports BDeu and K2 priors, I’ll think for a good interface to specify other prior weights. With K2 priors the score is given by the following form:

$$score^{K2}_D(m) = \log(P(m)) + \sum_{X\in nodes(m)} local\_score^{K2}_D(X, parents_m(X))$$

where $P(m)$ is an optional structure prior that is quite negligible in practice. $local\_score^{K2}$ is computed for each node as follows:

$$local\_score^{K2}_D(X, P_X) = \sum_{j=1}^{q(P_X)} (\log(\frac{(r-1)!}{(N_j+r-1)!}) + \sum_{k=1}^r \log(N_{jk}!))$$

Where $r$ is the cardinality of the variable $X$, $q(P_X)$ is the product of the cardinalities of the parents of $X$ (= the possible states of $P_X$) and $N_{jk}$ is the number of times that variable $X$ is in state $k$ while parents are in state $j$ in the data sample. Finally, $N_j:=\sum_{k=1}^r N_{jk}$.

PR will follow shortly.