首页 > 解决方案 > 使用 PyMC3 的分层贝叶斯线性回归非常慢

问题描述

我正在尝试使用UCI 存储库中的成人数据集在逻辑回归的情况下编写一些代码来实现 HBM 。

我已经编写了代码,但是采样速度非常慢,每个样本大约 107 秒,甚至对于 64 个维度或特征。难道我做错了什么?

我附上代码以供参考。由于建议尝试加快速度,我还重新调整了数据,但无济于事。

我很感激任何反馈。

代码混合了此处此处所写的内容。

#re loading the dataset this time without converting the country into one-hot vector rather for hierarchical modeling
adult_df = pd.read_csv('adult.data', header=None, sep=', ', )

adult_df.columns = ["Age", "WorkClass", "fnlwgt", "Education", "EducationNum",
    "MaritalStatus", "Occupation", "Relationship", "Race", "Gender",
    "CapitalGain", "CapitalLoss", "HoursPerWeek", "NativeCountry", "Income"]


adult_df["Income"] = adult_df["Income"].map({ "<=50K": 0, ">50K": 1 })

adult_df.drop("CapitalGain", axis=1, inplace=True,)
adult_df.drop("CapitalLoss", axis=1, inplace=True,)

adult_df.Age = adult_df.Age.astype(float)
adult_df.fnlwgt = adult_df.fnlwgt.astype(float)
adult_df.EducationNum = adult_df.EducationNum.astype(float)
adult_df.HoursPerWeek = adult_df.HoursPerWeek.astype(float)


# dropping native country here!!
adult_df = pd.get_dummies(adult_df, columns=[
    "WorkClass", "Education", "MaritalStatus", "Occupation", "Relationship",
    "Race", "Gender",
])

standard_scaler_cols = ["Age", "fnlwgt", "EducationNum", "HoursPerWeek",]
other_cols = list(set(adult_df.columns) - set(standard_scaler_cols))
mapper = DataFrameMapper(
    [([col,], StandardScaler(),) for col in standard_scaler_cols] +
    [(col, None,) for col in other_cols]
)



le = preprocessing.LabelEncoder()
country_idx = le.fit_transform(adult_df['NativeCountry'])

pd.value_counts(pd.Series(y_all))
y_all = adult_df["Income"].values
adult_df.drop("Income", axis=1, inplace=True,)

adult_df.drop("NativeCountry", axis=1, inplace=True,)
n_countries = len(set(country_idx))
n_features = len(adult_df.columns) 

min_max_scaler = preprocessing.MinMaxScaler()

adult_df = min_max_scaler.fit_transform(adult_df)

X_train, X_test, y_train, y_test, country_idx_train, country_idx_test = train_test_split(adult_df, y_all, country_idx, train_size=0.1, test_size=0.25, stratify=y_all, random_state=rs)

with pm.Model() as multilevel_model:

    # Hyperiors for intercept      
    mu_theta = pm.MvNormal(name='mu_a', mu=np.zeros(n_features), cov=np.eye(n_features), shape=n_features)

   packed_L_theta = pm.LKJCholeskyCov('packed_L', n=n_features,
                                 eta=2., sd_dist=pm.HalfCauchy.dist(2.5))
    L_theta = pm.expand_packed_triangular(n_features, packed_L_theta)
    theta = pm.MvNormal(mu=mu_theta, name='mu_theta', chol=L_theta, shape=[n_countries, n_features])


    # Hyperiors for intercept (Comment 1)
    mu_b = pm.StudentT('mu_b', nu=3, mu=0., sd=1.0)
    sigma_b = pm.HalfNormal('sigma_b', sd=1.0)

    b = pm.Normal('b', mu=mu_b, sd=sigma_b, shape=[n_countries, 1])
    # Calculate predictions given values
    # for intercept and slope 
    yhat = pm.invlogit(b[country_idx_train] +  pm.math.dot(theta[country_idx_train], np.asarray(X_train).T))

    #Make predictions fit reality

    y = pm.Binomial('y', n=np.ones(y_train.shape[0]), p=yhat, observed=y_train)

标签: bayesianpythonmcmchierarchical-bayesianpymc

解决方案


您可能会在我们关于 pymc3 问题的讨论中取得更大的成功:https ://discourse.pymc.io/我邀请您将您的问题移到那里。

我要检查的第一件事是您的 Theano 是否正在针对 MKL 库进行编译,或者甚至可能使用 Python 模式。如果您通过 conda 安装了东西,那应该会给您 MKL,如果您使用的是 pip,它可能会更困难。http://deeplearning.net/software/theano/troubleshooting.html#test-blas


推荐阅读