Multivariate correlation filter

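Problem description

How can I identify whether the combination of two categorical features is correlated with the target variable?

For example:

Suppose I have three variables: 2 categorical features and 1 target variable. When I use a chi-squared test to check each feature's correlation with the target variable individually, I cannot find a strong relationship. So I want to use the combination of these two features and check whether it is correlated with the target variable, but I am confused about whether a chi-squared test can be used in this case, or whether some other method should be used instead.

For example: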

import pandas as pd
from scipy.stats import chi2, chi2_contingency

# Draw one sample of rows (with replacement) so the three columns stay aligned.
sampled = df_offer_details.sample(frac=0.5, replace=True, random_state=1).reset_index(drop=True)

# Rows: hike bin. Columns: every (relocation status, acceptance status) pair.
ct_reloc_status = pd.crosstab(
    sampled['percentage_hike_offered_bin'],
    [sampled['Candidate relocation status'], sampled['Acceptance status']])
ct_reloc_status

# we carry out a contingency test to check whether there is a correlation
# between the target variable and relocation status
H0 = "There is no relationship between Relocation status and Acceptance status"
Ha = "There is a relationship between Relocation status and Acceptance status"

stat, p, dof, expected = chi2_contingency(ct_reloc_status)
print('p-value: ', p)

prob = 0.95
critical = chi2.ppf(prob, dof)
print('probability=%.3f, critical=%.3f, stat=%.3f' % (prob, critical, stat))

# stat >= critical at prob = 0.95 is the same decision as p <= 0.05
if stat >= critical:
    print(f'Since p-value {p} < 0.05 we reject the null hypothesis: {H0}.Thus alternate hypothesis: {Ha} holds good ')
else:
    print(f'Fail to reject null hypothesis {H0}')

Result:


p-value:  0.019814129159194147
probability=0.950, critical=28.869, stat=32.380
Since p-value 0.019814129159194147 < 0.05 we reject the null hypothesis: There is no relationship between Relocation status and Acceptance status.Thus alternate hypothesis: There is a relationship between Relocation status and Acceptance status holds good 

But I am not sure whether this is the right approach.

Tags: python, stat, chi-squared, eda

Solution
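One possible way to test the combination described in the question is to merge the two categorical features into a single interaction feature and run the chi-squared test of independence between that feature and the target. The sketch below is only an illustration under assumptions: the helper `chi2_for_combined_features`, the toy column names (`hike_bin`, `relocation`, `accepted`), and the synthetic data are all hypothetical and not taken from the original dataset. Only `pd.crosstab` and `scipy.stats.chi2_contingency` are real library calls.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency


def chi2_for_combined_features(df, feature_a, feature_b, target, alpha=0.05):
    """Chi-squared test of independence between the pair
    (feature_a, feature_b), treated as one feature, and target."""
    # Merge the two categorical features into one interaction feature.
    combined = df[feature_a].astype(str) + " | " + df[feature_b].astype(str)
    # Rows: feature combinations. Columns: target classes.
    table = pd.crosstab(combined, df[target])
    stat, p, dof, expected = chi2_contingency(table)
    return {
        "table": table,
        "stat": float(stat),
        "p_value": float(p),
        "dof": int(dof),
        # Small expected counts (rule of thumb: below 5) make the test unreliable.
        "min_expected": float(expected.min()),
        "reject_h0": bool(p < alpha),
    }


# Hypothetical toy data: the target depends on the two features only jointly.
rng = np.random.default_rng(0)
n = 400
toy = pd.DataFrame({
    "hike_bin": rng.choice(["low", "high"], size=n),
    "relocation": rng.choice(["yes", "no"], size=n),
})
exactly_one = (toy["hike_bin"] == "high") ^ (toy["relocation"] == "yes")
# 80% acceptance when exactly one condition holds, 20% otherwise.
accept_prob = np.where(exactly_one, 0.8, 0.2)
toy["accepted"] = np.where(rng.random(n) < accept_prob, "yes", "no")

result = chi2_for_combined_features(toy, "hike_bin", "relocation", "accepted")
print(result["table"])
print(f"stat={result['stat']:.3f}, dof={result['dof']}, p={result['p_value']:.3g}")
```

With this layout the target sits on one axis of the table and the feature combinations on the other, so the null hypothesis being tested is "the target is independent of the joint feature". A significant result does not by itself show that the interaction matters, since it could be driven by one feature alone, so it is worth comparing against the single-feature tests.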

