python - pd.DataFrame: how to count a variable's values and compute probabilities
Problem description
Here is my data:
import pandas as pd
df1 = pd.DataFrame()
df1['a1'] = ['ABC','ACC','BCC','ABC','ABC','ACC','BCC']
df1['b1'] = ['ACC','AAC','BAC','ACC','ACC','AAC','BAC']
df1['group'] = ['A1','A2','A1','A3','A2','A1','A1']
df1['names'] = ['n1','n2','n3','n4','n1','n3','n3']
df2 = pd.DataFrame()
df2['a2'] = ['ACC','BCC','ABC']
df2['b2'] = ['AAC','BAC','ACC']
df2['types'] = ['t1','t2','t3']
DF = pd.merge(df1, df2, left_on=['a1','b1'], right_on=['a2','b2'])
>>> DF.sort_values('group')
    a1   b1 group names   a2   b2 types
0  ABC  ACC    A1    n1  ABC  ACC    t3
4  ACC  AAC    A1    n3  ACC  AAC    t1
5  BCC  BAC    A1    n3  BCC  BAC    t2
6  BCC  BAC    A1    n3  BCC  BAC    t2
2  ABC  ACC    A2    n1  ABC  ACC    t3
3  ACC  AAC    A2    n2  ACC  AAC    t1
1  ABC  ACC    A3    n4  ABC  ACC    t3
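For each name, I want to compute the probability of each type as its number of occurrences divided by the total number of rows (the nrow of the df), and then sum these probabilities within each group.
For example, for group A1: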
for n1:
P_1 = P(t1_n1)+P(t2_n1)+P(t3_n1) = 0+0+1/7 = 1/7
for n2:
P_2 = P(t1_n2)+P(t2_n2)+P(t3_n2) = 0
for n3:
P_3 = P(t1_n3)+P(t2_n3)+P(t3_n3) = 1/7+2/7+0 = 3/7
for n4:
P_4 = P(t1_n4)+P(t2_n4)+P(t3_n4) = 0
P_total = P_1+P_2+P_3+P_4
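The arithmetic for group A1 can be checked with exact fractions. This is only a sketch: the (name, type) pairs are copied by hand from the four A1 rows of the merged table above, and the variable names are made up for illustration.

```python
from collections import Counter
from fractions import Fraction

# (name, type) pairs of the four A1 rows in the merged table (copied by hand).
a1_rows = [('n1', 't3'), ('n3', 't1'), ('n3', 't2'), ('n3', 't2')]
total_rows = 7  # number of rows in the merged DataFrame

pair_counts = Counter(a1_rows)
names = ['n1', 'n2', 'n3', 'n4']
types = ['t1', 't2', 't3']

# P_k = sum over all types of count(name, type) / total_rows
p_name = {
    n: sum(Fraction(pair_counts[(n, t)], total_rows) for t in types)
    for n in names
}
p_total = sum(p_name.values())

for n in names:
    print(n, p_name[n])
print('total', p_total)
```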
Expected output:
  groups  P_n1  P_n2  P_n3  P_n4  P_total
0     A1   1/7     0   3/7     0      4/7
1     A2  ....
2     A3
3     A4
How can I do this elegantly, without a lot of loops? Thanks.
Solution
You can use pd.crosstab with normalize=True:
pd.crosstab(DF['group'],DF['names'],normalize=True)
names        n1        n2        n3        n4
group
A1     0.142857  0.000000  0.428571  0.000000
A2     0.142857  0.142857  0.000000  0.000000
A3     0.000000  0.000000  0.000000  0.142857
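To see what normalize=True is doing, it may help to compare it with dividing the raw count table by the total number of rows. The sketch below rebuilds DF from the question's data (using a dict constructor for brevity); counts, by_hand and normalized are illustrative names only.

```python
import pandas as pd

# Rebuild the merged frame from the question's data.
df1 = pd.DataFrame({
    'a1': ['ABC', 'ACC', 'BCC', 'ABC', 'ABC', 'ACC', 'BCC'],
    'b1': ['ACC', 'AAC', 'BAC', 'ACC', 'ACC', 'AAC', 'BAC'],
    'group': ['A1', 'A2', 'A1', 'A3', 'A2', 'A1', 'A1'],
    'names': ['n1', 'n2', 'n3', 'n4', 'n1', 'n3', 'n3'],
})
df2 = pd.DataFrame({
    'a2': ['ACC', 'BCC', 'ABC'],
    'b2': ['AAC', 'BAC', 'ACC'],
    'types': ['t1', 't2', 't3'],
})
DF = pd.merge(df1, df2, left_on=['a1', 'b1'], right_on=['a2', 'b2'])

# Raw number of rows for every (group, name) combination.
counts = pd.crosstab(DF['group'], DF['names'])

# Dividing by the total row count gives the same table as normalize=True,
# which normalizes over the sum of all cells.
by_hand = counts / len(DF)
normalized = pd.crosstab(DF['group'], DF['names'], normalize=True)

print(counts)
print((by_hand - normalized).abs().to_numpy().max())
```

Passing normalize='index' or normalize='columns' instead would divide by row or column sums, which is not what the question asks for.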
To add the per-group total as well:
pd.crosstab(DF['group'],DF['names'],normalize=True)\
.assign(total = lambda x : x.sum(axis=1)).reset_index()
names group        n1        n2        n3        n4     total
0        A1  0.142857  0.000000  0.428571  0.000000  0.571429
1        A2  0.142857  0.142857  0.000000  0.000000  0.285714
2        A3  0.000000  0.000000  0.000000  0.142857  0.142857
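If the individual P(type, name) terms from the question are also of interest, one possible sketch is to cross-tabulate group against both names and types, then sum over the type level. The column names P_n1 ... P_total follow the question's expected output; per_type, per_name and result are hypothetical names, and this is one way of doing it rather than the only one.

```python
import pandas as pd

# Rebuild the merged frame from the question's data.
df1 = pd.DataFrame({
    'a1': ['ABC', 'ACC', 'BCC', 'ABC', 'ABC', 'ACC', 'BCC'],
    'b1': ['ACC', 'AAC', 'BAC', 'ACC', 'ACC', 'AAC', 'BAC'],
    'group': ['A1', 'A2', 'A1', 'A3', 'A2', 'A1', 'A1'],
    'names': ['n1', 'n2', 'n3', 'n4', 'n1', 'n3', 'n3'],
})
df2 = pd.DataFrame({
    'a2': ['ACC', 'BCC', 'ABC'],
    'b2': ['AAC', 'BAC', 'ACC'],
    'types': ['t1', 't2', 't3'],
})
DF = pd.merge(df1, df2, left_on=['a1', 'b1'], right_on=['a2', 'b2'])

# Share of all rows for every observed (name, type) pair, per group.
# The columns form a MultiIndex with levels 'names' and 'types'.
per_type = pd.crosstab(DF['group'], [DF['names'], DF['types']], normalize=True)

# Sum over the 'types' level to get one column per name.
per_name = per_type.T.groupby(level='names').sum().T

# Rename to the layout of the expected output and add the group total.
result = per_name.add_prefix('P_')
result['P_total'] = result.sum(axis=1)
result = result.rename_axis(columns=None).reset_index()

print(per_type)
print(result)
```

Only (name, type) pairs that actually occur appear as columns of per_type; the missing pairs correspond to the zero terms in the hand calculation.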