python - Create multiple new columns based on sets of existing columns (more efficiently)
Problem description
For my dataset, I want to create some new columns. Each new column is a ratio based on two other columns. Here is an example of what I mean:
import pandas as pd

col1 = [0, 0, 0, 0, 2, 4, 6, 0, 0, 0, 100, 200, 300, 400]
col2 = [0, 0, 0, 0, 4, 6, 8, 0, 0, 0, 200, 900, 400, 500]
d = {'Unit': [1, 1, 1, 1, 2, 2, 2, 3, 4, 5, 6, 6, 6, 6],
     'Year': [2014, 2015, 2016, 2017, 2015, 2016, 2017, 2017, 2014, 2015, 2014, 2015, 2016, 2017],
     'col1': col1, 'col2': col2}
df = pd.DataFrame(data=d)
new_df = df.groupby(['Unit', 'Year']).sum()
# Within each Unit, divide col1 by the previous row's (previous year's) col2.
new_df['col1/col2'] = (new_df.groupby(level=0, group_keys=False)
                             .apply(lambda x: x.col1 / x.col2.shift()))
           col1  col2  col1/col2
Unit Year
1    2014     0     0        NaN
     2015     0     0        NaN
     2016     0     0        NaN
     2017     0     0        NaN
2    2015     2     4        NaN
     2016     4     6   1.000000
     2017     6     8   1.000000
3    2017     0     0        NaN
4    2014     0     0        NaN
5    2015     0     0        NaN
6    2014   100   200        NaN
     2015   200   900   1.000000
     2016   300   400   0.333333
     2017   400   500   1.000000
However, this is a heavily simplified df. In reality, I have columns 1 through 50. What I am doing now feels very inefficient:
import pandas as pd

col1 = [0, 0, 0, 0, 2, 4, 6, 0, 0, 0, 100, 200, 300, 400]
col2 = [0, 0, 0, 0, 4, 6, 8, 0, 0, 0, 200, 900, 400, 500]
col3 = [0, 0, 0, 0, 4, 6, 8, 0, 0, 0, 200, 900, 400, 500]
col4 = [0, 0, 0, 0, 4, 6, 8, 0, 0, 0, 200, 900, 400, 500]
col5 = [0, 0, 0, 0, 4, 6, 8, 0, 0, 0, 200, 900, 400, 500]
col6 = [0, 0, 0, 0, 4, 6, 8, 0, 0, 0, 200, 900, 400, 500]
# The data in these columns is repeated; it is just for the example.
d = {'Unit': [1, 1, 1, 1, 2, 2, 2, 3, 4, 5, 6, 6, 6, 6],
     'Year': [2014, 2015, 2016, 2017, 2015, 2016, 2017, 2017, 2014, 2015, 2014, 2015, 2016, 2017],
     'col1': col1, 'col2': col2, 'col3': col3, 'col4': col4, 'col5': col5, 'col6': col6}
df = pd.DataFrame(data=d)
new_df = df.groupby(['Unit', 'Year']).sum()
# One groupby/apply per ratio column; this is the part that gets repeated.
new_df['col1/col2'] = (new_df.groupby(level=0, group_keys=False)
                             .apply(lambda x: x.col1 / x.col2.shift()))
new_df['col3/col4'] = (new_df.groupby(level=0, group_keys=False)
                             .apply(lambda x: x.col3 / x.col4.shift()))
new_df['col5/col6'] = (new_df.groupby(level=0, group_keys=False)
                             .apply(lambda x: x.col5 / x.col6.shift()))
I repeat this column-creation step 25 times. Can this be done more efficiently?
Thanks in advance,
仁
Solution
The idea is to use DataFrameGroupBy.shift on all columns in the list cols2, and to divide the DataFrame filtered by the list cols1 by the shifted values:
import pandas as pd

col1 = [0, 0, 0, 0, 2, 4, 6, 0, 0, 0, 100, 200, 300, 400]
col2 = [0, 0, 0, 0, 4, 6, 8, 0, 0, 0, 200, 900, 400, 500]
d = {'Unit': [1, 1, 1, 1, 2, 2, 2, 3, 4, 5, 6, 6, 6, 6],
     'Year': [2014, 2015, 2016, 2017, 2015, 2016, 2017, 2017, 2014, 2015, 2014, 2015, 2016, 2017],
     'col1': col1, 'col2': col2,
     'col3': col1, 'col4': col2,
     'col5': col1, 'col6': col2}
df = pd.DataFrame(data=d)
new_df = df.groupby(['Unit', 'Year']).sum()

cols1 = ['col1', 'col3', 'col5']  # numerators
cols2 = ['col2', 'col4', 'col6']  # denominators, shifted by one row within each Unit

# .values drops the column labels, so the division is done by position
# (col1 with col2, col3 with col4, ...) instead of by matching column names.
new_df = new_df[cols1] / new_df.groupby(level=0)[cols2].shift().values
new_df.columns = [f'{a}/{b}' for a, b in zip(cols1, cols2)]
print(new_df)
           col1/col2  col3/col4  col5/col6
Unit Year
1    2014        NaN        NaN        NaN
     2015        NaN        NaN        NaN
     2016        NaN        NaN        NaN
     2017        NaN        NaN        NaN
2    2015        NaN        NaN        NaN
     2016   1.000000   1.000000   1.000000
     2017   1.000000   1.000000   1.000000
3    2017        NaN        NaN        NaN
4    2014        NaN        NaN        NaN
5    2015        NaN        NaN        NaN
6    2014        NaN        NaN        NaN
     2015   1.000000   1.000000   1.000000
     2016   0.333333   0.333333   0.333333
     2017   1.000000   1.000000   1.000000
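
With around 50 columns, writing cols1 and cols2 out by hand gets tedious. The following is a minimal sketch of one way to extend the approach, not part of the original answer. It assumes the columns are named col1 ... colN and that every odd-numbered column is divided by the even-numbered column that follows it; n_cols, shifted, ratios and result are names introduced here for illustration. It also joins the ratios back onto the aggregated frame, so the original columns are kept alongside the new ones.

```python
import pandas as pd

# Same sample data as in the answer (col3/col5 repeat col1, col4/col6 repeat col2).
col1 = [0, 0, 0, 0, 2, 4, 6, 0, 0, 0, 100, 200, 300, 400]
col2 = [0, 0, 0, 0, 4, 6, 8, 0, 0, 0, 200, 900, 400, 500]
d = {'Unit': [1, 1, 1, 1, 2, 2, 2, 3, 4, 5, 6, 6, 6, 6],
     'Year': [2014, 2015, 2016, 2017, 2015, 2016, 2017,
              2017, 2014, 2015, 2014, 2015, 2016, 2017],
     'col1': col1, 'col2': col2,
     'col3': col1, 'col4': col2,
     'col5': col1, 'col6': col2}
df = pd.DataFrame(data=d)
new_df = df.groupby(['Unit', 'Year']).sum()

# Assumption: columns are named col1..colN, odd = numerator, even = denominator.
n_cols = 6  # hypothetical; would be 50 for the real data described in the question
cols1 = [f'col{i}' for i in range(1, n_cols + 1, 2)]  # col1, col3, col5
cols2 = [f'col{i}' for i in range(2, n_cols + 1, 2)]  # col2, col4, col6

# Previous row's denominator within each Unit (NaN for the first row of a Unit).
shifted = new_df.groupby(level=0)[cols2].shift()

# to_numpy() drops the labels so the division pairs columns by position.
ratios = new_df[cols1] / shifted.to_numpy()
ratios.columns = [f'{a}/{b}' for a, b in zip(cols1, cols2)]

# Keep the original columns and append the ratio columns next to them.
result = new_df.join(ratios)
print(result)
```

If the real column names do not follow this numbering, the two lists can be built in any other way, as long as they have the same length and the same pair order.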