Score-based Structure Learning BNs

Posted on Fri 24 June 2016 in score_based

With a bit of delay, I am now working on a basic PR for score-based structure estimation for Bayesian Networks. It comes with two ingredients:

  • The StructureScore-Class and its subclasses BayesianScore and BICScore. They are initialized with a data set and provide a score-method to compute how well a given BayesianModel can be fitted to the data, according to different criteria. Since those scores are decomposable for BNs, a local_score-method is also exposed for node-by-node computation.
  • The StructureSearch-Class and its subclasses ExhaustiveSearch and HCSearch. They are initialized with a StructureScore-instance and optimize that score over all BayesianModels. The latter subclass has a number of optional search enhancements.

So far BayesianScore supports BDeu and K2 priors, I’ll think for a good interface to specify other prior weights. With K2 priors the score is given by the following form:

$$score^{K2}_D(m) = \log(P(m)) + \sum_{X\in nodes(m)} local\_score^{K2}_D(X, parents_m(X))$$

where \(P(m)\) is an optional structure prior that is quite negligible in practice. \(local\_score^{K2}\) is computed for each node as follows:

$$local\_score^{K2}_D(X, P_X) = \sum_{j=1}^{q(P_X)} (\log(\frac{(r-1)!}{(N_j+r-1)!}) + \sum_{k=1}^r \log(N_{jk}!))$$

Where \(r\) is the cardinality of the variable \(X\), \(q(P_X)\) is the product of the cardinalities of the parents of \(X\) (= the possible states of \(P_X\)) and \(N_{jk}\) is the number of times that variable \(X\) is in state \(k\) while parents are in state \(j\) in the data sample. Finally, \(N_j:=\sum_{k=1}^r N_{jk}\).

PR will follow shortly.