python - Combining different columns
Problem description
I have a DataFrame like this:
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3, 4, 5, 6, 7],
                   'vote': [5, 4, 5, 1, 10, 1, 9],
                   'doggo': [None, "doggo", None, None, "doggo", None, None],
                   'floofer': ["floofer", None, None, "floofer", None, None, None],
                   'pupper': [None, None, "pupper", None, None, None, None],
                   'puppo': [None, None, None, None, None, None, "puppo"]})
I want to merge the last 4 columns and produce:
df = pd.DataFrame({'id': [1, 2, 3, 4, 5, 6, 7],
                   'vote': [5, 4, 5, 1, 10, 1, 9],
                   'categories': ["floofer", "doggo", "pupper", "floofer", "doggo", None, "puppo"]})
Any guidance is appreciated.
Solution
If each row has at most one non-None value across the category columns, this works:
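cols = ['doggo', 'floofer', 'pupper', 'puppo']
# every column that is not a category column (difference() returns them sorted)
cols1 = df.columns.difference(cols)
# forward-fill along each row so the single non-None value lands in the last column
df2 = df[cols1].join(df[cols].ffill(axis=1).iloc[:, -1].rename('Categories'))
print(df2)
   id  vote Categories
0   1     5    floofer
1   2     4      doggo
2   3     5     pupper
3   4     1    floofer
4   5    10      doggo
5   6     1       None
6   7     9      puppo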
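Explanation:
First select only the category columns and forward-fill the missing values along each row, so the expected value ends up in the last column: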
print(df[cols].ffill(axis=1))
   doggo  floofer   pupper    puppo
0   None  floofer  floofer  floofer
1  doggo    doggo    doggo    doggo
2   None     None   pupper   pupper
3   None  floofer  floofer  floofer
4  doggo    doggo    doggo    doggo
5   None     None     None     None
6   None     None     None    puppo
Then select the last column by position:
print(df[cols].ffill(axis=1).iloc[:, -1])
0    floofer
1      doggo
2     pupper
3    floofer
4      doggo
5       None
6      puppo
Name: puppo, dtype: object
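The forward-fill approach only gives the right answer when no row has more than one non-missing category, because any earlier value in the row is overwritten by a later one. A minimal sketch of how that assumption could be checked first; the helper name `has_single_category` and the small sample frame are illustrative, not part of the original answer:

```python
import pandas as pd

# Small illustrative frame: the last row has two categories set.
df = pd.DataFrame({
    'id': [1, 2, 3],
    'doggo': [None, 'doggo', 'doggo'],
    'floofer': ['floofer', None, 'floofer'],
})
cols = ['doggo', 'floofer']


def has_single_category(frame, columns):
    """Return True if every row has at most one non-missing value in `columns`."""
    return bool(frame[columns].notna().sum(axis=1).le(1).all())


print(has_single_category(df, cols))           # full frame, including the two-value row
print(has_single_category(df.iloc[:2], cols))  # only the first two rows
```

If the check fails, one of the multi-value solutions is the safer choice.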
Solution for rows with multiple values, where the output is built from the names of the category columns:
df = pd.DataFrame({'id': [1, 2, 3, 4, 5, 6, 7],
                   'vote': [5, 4, 5, 1, 10, 1, 9],
                   'doggo': [None, "doggo1", None, "doggo2", "doggo3", None, None],
                   'floofer': ["floofer1", None, None, "floofer2", None, None, None],
                   'pupper': [None, None, "pupper1", None, None, None, None],
                   'puppo': ["puppo1", None, None, None, None, None, "puppo2"]})
print(df)
   id  vote   doggo   floofer   pupper   puppo
0   1     5    None  floofer1     None  puppo1
1   2     4  doggo1      None     None    None
2   3     5    None      None  pupper1    None
3   4     1  doggo2  floofer2     None    None
4   5    10  doggo3      None     None    None
5   6     1    None      None     None    None
6   7     9    None      None     None  puppo2
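import numpy as np

# boolean mask of non-null cells, combined with the column names via dot()
s = (df[cols].notnull()
             .dot(pd.Index(cols) + ', ')
             .str.strip(', ')
             .rename('Categories')
             .replace('', np.nan)
)
# assign to a new name so df keeps its category columns for later use
df2 = df[cols1].join(s)
print(df2)
   id  vote      Categories
0   1     5  floofer, puppo
1   2     4           doggo
2   3     5          pupper
3   4     1  doggo, floofer
4   5    10           doggo
5   6     1             NaN
6   7     9           puppo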
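Conceptually, the `dot` call works because multiplying a string by a boolean either keeps it (`True`) or yields an empty string (`False`), and the per-row products are then concatenated. A plain-Python sketch of the same idea for a single row; the mask values here are illustrative and correspond to a row where only `floofer` and `puppo` are set:

```python
cols = ['doggo', 'floofer', 'pupper', 'puppo']
mask_row = [False, True, False, True]  # which columns are non-null in this row

# bool * str: True keeps the string, False gives ''
pieces = [flag * (name + ', ') for flag, name in zip(mask_row, cols)]

# concatenate the pieces, then remove the trailing separator
joined = ''.join(pieces).strip(', ')
print(joined)
```

This is only a model of what happens per row; the pandas version does the same work for all rows at once.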
Another solution, for when the expected output should come from the cell values rather than the column names:
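# append the separator to every value, blank out missing cells, then concatenate each row
s = pd.Series(df[cols].add(', ').fillna('').values.sum(axis=1),
              index=df.index, name='Categories').str.strip(', ')
# df must still contain the original category columns at this point
df2 = df[cols1].join(s)
print(df2)
   id  vote        Categories
0   1     5  floofer1, puppo1
1   2     4            doggo1
2   3     5           pupper1
3   4     1  doggo2, floofer2
4   5    10            doggo3
5   6     1
6   7     9            puppo2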
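An alternative sketch, not part of the original answer, joins the non-missing cell values row by row with `apply`. It is typically slower on large frames than the vectorized version, but may be easier to read. The sample frame below is a shortened, illustrative variant of the data above:

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'doggo': [None, 'doggo1', 'doggo2', None],
    'floofer': ['floofer1', None, 'floofer2', None],
    'puppo': ['puppo1', None, None, None],
})
cols = ['doggo', 'floofer', 'puppo']

# For each row, drop the missing cells and join what is left with ', '.
# A row with no categories becomes an empty string.
s = (df[cols]
     .apply(lambda row: ', '.join(row.dropna()), axis=1)
     .rename('Categories'))

# Replace the category columns with the combined column.
out = df.drop(columns=cols).join(s)
print(out)
```

As in the vectorized version, rows without any category produce an empty string; it can be replaced with a missing value afterwards if that is preferred.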