How can I incorporate prior information in pomegranate? In other words: does pomegranate support incremental learning?

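Question

Suppose I fit a pomegranate model using the data available at the time. Once more data comes in, I would like to update the model accordingly. In other words, is it possible in pomegranate to update an existing model with new data without overriding the previous parameters? To be clear: I am not referring to out-of-core learning, because my problem concerns data that becomes available at different points in time, not data that is available at a single point in time but is too large to fit in memory.

Here is what I tried: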

>>> from pomegranate.distributions import BetaDistribution

>>> # suppose a coin generated the following data, where 1 is head and 0 is tail
>>> data1 = [0, 0, 0, 1, 0, 1, 0, 1, 0, 0]

>>> # as usual, we fit a Beta distribution to infer the bias of the coin
>>> model = BetaDistribution(1, 1)
>>> model.summarize(data1)  # compute sufficient statistics

>>> # presume we have seen all the data available so far,
>>> # we can now estimate the parameters
>>> model.from_summaries()

>>> # this results in the following model (so far so good)
>>> model
{
    "class" :"Distribution",
    "name" :"BetaDistribution",
    "parameters" :[
        3.0,
        7.0
    ],
    "frozen" :false
}

>>> # now suppose the coin is flipped a few more times, getting the following data
>>> data2 = [0, 1, 0, 0, 1]

>>> # we would like to update the model parameters accordingly
>>> model.summarize(data2)

>>> # but this fits only data2, overriding the previous parameters
>>> model.from_summaries()
>>> model
{
    "class" :"Distribution",
    "name" :"BetaDistribution",
    "parameters" :[
        2.0,
        3.0
    ],
    "frozen" :false
}


>>> # however I want to get the result that corresponds to the following,
>>> # but ideally without having to "drag along" data1
>>> data3 = data1 + data2
>>> model.fit(data3)
>>> model  # this should be the final model
{
    "class" :"Distribution",
    "name" :"BetaDistribution",
    "parameters" :[
        5.0,
        10.0
    ],
    "frozen" :false
}

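Edit

Another way to pose the question: does pomegranate support incremental learning or online learning? Basically, I am looking for something similar to scikit-learn's partial_fit().

Given that pomegranate supports out-of-core learning, I feel I am overlooking something. Any help?

Tags: python, pomegranate

Answer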


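It is actually from_summaries that is the problem. In the case of the Beta distribution, it does: self.summaries = [0, 0]. All of the from_summaries methods are destructive: they replace the summaries with the parameters in the distribution. The summaries can always be updated to take in additional observations, whereas the parameters cannot.

I think this is a poor design. It would be better to treat the summaries as accumulators of observations, and the parameters as cached values derived from them.

If you do this: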

model = BetaDistribution(1, 1)
model.summarize(data1)
model.summarize(data2)
model.from_summaries()
model

you will find that it does indeed produce the same result as if model.summarize(data1 + data2) had been used.
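
The accumulator design suggested above can be sketched in plain Python. This is only an illustration under stated assumptions: BetaCounter is a hypothetical class, not part of pomegranate, and it simply mirrors the head/tail counts that appear as parameters in the question's output. The summaries are never reset, so the parameters can be read at any time and further batches can still be added.

```python
class BetaCounter:
    """Hypothetical accumulator for 0/1 coin flips (not pomegranate's API)."""

    def __init__(self):
        # Running sufficient statistics; they are never reset.
        self.heads = 0
        self.tails = 0

    def summarize(self, data):
        # Add a new batch of observations to the running counts.
        for x in data:
            if x == 1:
                self.heads += 1
            else:
                self.tails += 1

    @property
    def parameters(self):
        # Derived from the counts on demand, so reading them is non-destructive.
        return [float(self.heads), float(self.tails)]


data1 = [0, 0, 0, 1, 0, 1, 0, 1, 0, 0]
data2 = [0, 1, 0, 0, 1]

counter = BetaCounter()
counter.summarize(data1)
first = counter.parameters   # [3.0, 7.0], as after the first fit in the question

counter.summarize(data2)
second = counter.parameters  # [5.0, 10.0], same as fitting data1 + data2
```

Because only the two counts have to be stored between batches, data1 itself does not need to be kept around once it has been summarized.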

