python - How can I group by several columns and merge the results into a single DataFrame?
Problem description
Here is my current code:
import pandas as pd

d = {}
for stage in ['doggo', 'floofer', 'puppo', 'pupper']:
    # d[stage] = df.groupby([stage]).agg({'retweet_count': 'sum'})
    d[stage] = df.groupby(stage)['retweet_count'].sum()
stage_retweets = pd.DataFrame.from_dict(d)
It produces this:
               doggo    floofer      puppo     pupper
None       1387471.0  1517639.0  1472697.0  1444766.0
doggo       159188.0        NaN        NaN        NaN
floofer          NaN    29020.0        NaN        NaN
puppo            NaN        NaN    73962.0        NaN
pupper           NaN        NaN        NaN   101893.0
What I actually want to produce is:
               doggo    floofer      puppo     pupper
None       1387471.0  1517639.0  1472697.0  1444766.0
stage       159188.0    29020.0    73962.0   101893.0
Does anyone know how to do this?
Solution
import numpy as np
import pandas as pd

d = {}
# 1 - Put your stages in a list variable
stages = ['doggo', 'floofer', 'puppo', 'pupper']
for stage in stages:
    d[stage] = df.groupby(stage)['retweet_count'].sum()
stage_retweets = pd.DataFrame.from_dict(d)
print(stage_retweets)
# 2 - Add a column that flags whether each index label is in the stages list
# Important: make sure the index has only one level, otherwise
# stage_retweets.index.isin(stages) won't work
stage_retweets['is_stage'] = np.where(stage_retweets.index.isin(stages), 'Stage', 'None')
print(stage_retweets)
# 3 - Group by this new column; sum() skips the NaN cells, so each stage keeps its own total
stage_retweets = stage_retweets.groupby('is_stage').sum().reset_index()
print(stage_retweets)
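To try the approach without the original dataset, here is a minimal sketch on made-up data. The toy `df`, its values, and the names `result` and `alt` are assumptions for illustration only. It also assumes, as the output in the question suggests, that each stage column holds either the stage name or the string 'None'. The second half shows a possible variation that groups on a boolean mask, so the intermediate frame full of NaN never appears.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: each stage column is either the stage name or the string 'None'
df = pd.DataFrame({
    'doggo':   ['doggo', 'None', 'None', 'None', 'None'],
    'floofer': ['None', 'floofer', 'None', 'None', 'None'],
    'puppo':   ['None', 'None', 'puppo', 'None', 'None'],
    'pupper':  ['None', 'None', 'None', 'pupper', 'None'],
    'retweet_count': [10, 20, 30, 40, 50],
})
stages = ['doggo', 'floofer', 'puppo', 'pupper']

# Same steps as the answer: one grouped Series per stage, aligned on the union of labels
d = {stage: df.groupby(stage)['retweet_count'].sum() for stage in stages}
stage_retweets = pd.DataFrame.from_dict(d)

# Collapse every stage label into a single 'Stage' row; sum() ignores the NaN cells
is_stage = np.where(stage_retweets.index.isin(stages), 'Stage', 'None')
result = stage_retweets.groupby(is_stage).sum()
print(result)

# Variation (an assumption, not part of the original answer): group on a boolean mask,
# so every per-stage Series already shares the same two labels (False / True)
alt = pd.DataFrame({
    stage: df.groupby(df[stage].eq(stage))['retweet_count'].sum()
    for stage in stages
}).rename(index={False: 'None', True: 'Stage'})
print(alt)
```

Both frames end up with one 'None' row and one 'Stage' row, with one column per stage. The first keeps the labels as the index rather than calling reset_index(), which matches the layout asked for in the question.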