BN structure score cross-check
Posted on Tue 02 August 2016 in score_based
After some doubts about the BN structure learning performance on a few test cases, I cross-checked my structure scoring functions against the implementation in the bnlearn R package. After minor changes, the scores I implemented now agree with bnlearn's results to the precision that R prints:
pgmpy:
    ~$ python
    Python 3.5.2 (default, Jun 28 2016, 08:46:01)
    [GCC 6.1.1 20160602] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import pandas as pd
    >>> from pgmpy.models import BayesianModel
    >>> from pgmpy.estimators import K2Score, BdeuScore, BicScore
    >>>
    >>> testdata = pd.DataFrame(data={'A': ["a", "b", "b", "b", "b", "b", "c", "d"],
    ...                               'B': ["f", "f", "f", "f", "f", "g", "h", "h"]})
    >>>
    >>> testmodel = BayesianModel([['A', 'B']])
    >>> K2Score(testdata).score(testmodel)
    -18.872853850025642
    >>> BdeuScore(testdata, equivalent_sample_size=10).score(testmodel)
    -18.623529328925628
    >>> BdeuScore(testdata, equivalent_sample_size=25).score(testmodel)
    -18.944492057037543
    >>> BicScore(testdata).score(testmodel)
    -22.527283368198223
bnlearn:
    ~$ R
    R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
    Copyright (C) 2016 The R Foundation for Statistical Computing
    Platform: x86_64-pc-linux-gnu (64-bit)
    > library(bnlearn)
    > A = c("a", "b", "b", "b", "b", "b", "c", "d")
    > B = c("f", "f", "f", "f", "f", "g", "h", "h")
    > testdata = data.frame(A,B)
    >
    > testmodel = empty.graph(names(testdata))
    > modelstring(testmodel) = "[A][B|A]"
    > score(testmodel, testdata, type = "k2")
    [1] -18.87285
    > score(testmodel, testdata, type = "bde", iss=10)
    [1] -18.62353
    > score(testmodel, testdata, type = "bde", iss=25)
    [1] -18.94449
    > score(testmodel, testdata, type = "bic")
    [1] -22.52728
So at least the scoring part is now somewhat validated against an independent implementation.
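For a model this small, the three scores can also be recomputed by hand as a third, library-free check. The sketch below is neither pgmpy's nor bnlearn's code. It assumes the standard decomposable forms of the K2, BDeu and BIC scores, takes each variable's cardinality from the states observed in the data, and uses helper names (`family_counts`, `k2`, `bdeu`, `bic`, `total_score`) that are made up for illustration.

```python
from collections import Counter
from math import lgamma, log

# Same test data and structure ([A][B|A]) as in the sessions above.
data = {
    "A": ["a", "b", "b", "b", "b", "b", "c", "d"],
    "B": ["f", "f", "f", "f", "f", "g", "h", "h"],
}
parents = {"A": [], "B": ["A"]}


def family_counts(var):
    """Return (r, q, counts): r child states, q parent configurations,
    counts maps each observed parent configuration to a Counter of child states."""
    r = len(set(data[var]))
    q = 1
    for p in parents[var]:
        q *= len(set(data[p]))
    counts = {}
    for i, state in enumerate(data[var]):
        config = tuple(data[p][i] for p in parents[var])
        counts.setdefault(config, Counter())[state] += 1
    return r, q, counts


def k2(var):
    """Local K2 score: Dirichlet prior with all hyperparameters equal to 1."""
    r, _, counts = family_counts(var)
    result = 0.0
    for c in counts.values():
        n_j = sum(c.values())
        result += lgamma(r) - lgamma(n_j + r)
        result += sum(lgamma(n + 1) for n in c.values())
    return result


def bdeu(var, ess):
    """Local BDeu score with equivalent sample size `ess`."""
    r, q, counts = family_counts(var)
    alpha, beta = ess / q, ess / (q * r)
    result = 0.0
    for c in counts.values():
        n_j = sum(c.values())
        result += lgamma(alpha) - lgamma(n_j + alpha)
        result += sum(lgamma(n + beta) - lgamma(beta) for n in c.values())
    return result


def bic(var):
    """Local BIC score: log-likelihood minus 0.5 * log(N) * free parameters."""
    r, q, counts = family_counts(var)
    n_samples = len(data[var])
    loglik = 0.0
    for c in counts.values():
        n_j = sum(c.values())
        loglik += sum(n * log(n / n_j) for n in c.values())
    return loglik - 0.5 * log(n_samples) * q * (r - 1)


def total_score(local, **kwargs):
    """Sum a local score over all variables of the network."""
    return sum(local(var, **kwargs) for var in parents)


print("K2      ", total_score(k2))
print("BDeu 10 ", total_score(bdeu, ess=10))
print("BDeu 25 ", total_score(bdeu, ess=25))
print("BIC     ", total_score(bic))
```

Parent configurations that never occur in the data contribute zero to all three scores, so iterating only over observed configurations is enough here. The number of configurations `q` still has to come from the parents' cardinalities, because it sets the BDeu hyperparameters and the BIC parameter count.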