BN structure score cross-check

Posted on Tue 02 August 2016 in score_based

After some doubts about the Bayesian network (BN) structure learning performance on a few test cases, I cross-checked my structure scoring functions against the implementations in the bnlearn R package. After minor changes, the pgmpy scores now agree with bnlearn to every digit that R prints:

pgmpy:

~$ python
Python 3.5.2 (default, Jun 28 2016, 08:46:01)
[GCC 6.1.1 20160602] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> from pgmpy.models import BayesianModel
>>> from pgmpy.estimators import K2Score, BdeuScore, BicScore
>>>
>>> testdata = pd.DataFrame(data={'A': ["a", "b", "b", "b", "b", "b", "c", "d"],
...                               'B': ["f", "f", "f", "f", "f", "g", "h", "h"]})
>>>
>>> testmodel = BayesianModel([['A', 'B']])
>>> K2Score(testdata).score(testmodel)
-18.872853850025642
>>> BdeuScore(testdata, equivalent_sample_size=10).score(testmodel)
-18.623529328925628
>>> BdeuScore(testdata, equivalent_sample_size=25).score(testmodel)
-18.944492057037543
>>> BicScore(testdata).score(testmodel)
-22.527283368198223

bnlearn:

~$ R
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
> library(bnlearn)
> A = c("a", "b", "b", "b", "b", "b", "c", "d")
> B = c("f", "f", "f", "f", "f", "g", "h", "h")
> testdata = data.frame(A,B)
>
> testmodel = empty.graph(names(testdata))
> modelstring(testmodel) = "[A][B|A]"
> score(testmodel, testdata, type = "k2")
[1] -18.87285
> score(testmodel, testdata, type = "bde", iss=10)
[1] -18.62353
> score(testmodel, testdata, type = "bde", iss=25)
[1] -18.94449
> score(testmodel, testdata, type = "bic")
[1] -22.52728

So at least the scoring functions (K2, BDeu and BIC) are now validated against an independent implementation, if only on this small test case.
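As a third, library-free check, the BIC and K2 numbers can be recomputed by hand from the counts. The following is a minimal sketch, not the pgmpy or bnlearn code: the helper names (`family_counts`, `bic_family`, `k2_family`) are made up for illustration, it handles at most one parent per node, and it assumes that variable cardinalities are taken from the states observed in the data. It uses the textbook definitions: BIC as the maximised log-likelihood minus 0.5 * log(N) per free parameter, and K2 as the log marginal likelihood under uniform Dirichlet priors.

```python
from collections import Counter
from math import lgamma, log

# Same toy data as in the pgmpy and bnlearn sessions.
A = ["a", "b", "b", "b", "b", "b", "c", "d"]
B = ["f", "f", "f", "f", "f", "g", "h", "h"]


def family_counts(child, parent=None):
    """Map each observed parent state to a Counter of child states."""
    parents = parent if parent is not None else [None] * len(child)
    counts = {}
    for p, c in zip(parents, child):
        counts.setdefault(p, Counter())[c] += 1
    return counts


def bic_family(child, parent=None):
    """Maximised log-likelihood minus 0.5 * log(N) * (free parameters)."""
    n = len(child)
    r = len(set(child))                                  # child cardinality
    q = len(set(parent)) if parent is not None else 1    # parent states
    loglik = 0.0
    for state_counts in family_counts(child, parent).values():
        n_ij = sum(state_counts.values())
        loglik += sum(k * log(k / n_ij) for k in state_counts.values())
    return loglik - 0.5 * log(n) * q * (r - 1)


def k2_family(child, parent=None):
    """Log marginal likelihood with all Dirichlet hyperparameters set to 1."""
    r = len(set(child))
    score = 0.0
    for state_counts in family_counts(child, parent).values():
        n_ij = sum(state_counts.values())
        score += lgamma(r) - lgamma(n_ij + r)
        score += sum(lgamma(k + 1) for k in state_counts.values())
    return score


# Structure [A][B|A]: the score decomposes into one term per node.
bic = bic_family(A) + bic_family(B, A)
k2 = k2_family(A) + k2_family(B, A)
print("BIC:", bic)
print("K2: ", k2)
```

On the test data this should reproduce the BIC and K2 values from the two sessions up to floating-point rounding. The BDeu score is left out of the sketch, since it additionally depends on how the equivalent sample size is spread over parent configurations.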