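python-3.x - Combine values from several columns into one column
Problem description
I have a DataFrame, and I want to combine the columns whose names start with "Answer" into a single column named "Answers". That column already exists, but it has no values from around row 3061 onward, so I need to fill them in. So far I have tried: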
columns_with_answer = [col for col in df if col.startswith('Answer')]
df['Answers']= df.columns_with_answer.tolist()
But I got:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-6-399b932d5740> in <module>()
1 columns_with_answer = [col for col in df if col.startswith('Answer')]
----> 2 df['Answers']= df.columns_with_answer.tolist()
/usr/local/lib/python3.6/dist-packages/pandas/core/generic.py in __getattr__(self, name)
5272 if self._info_axis._can_hold_identifiers_and_holds_name(name):
5273 return self[name]
-> 5274 return object.__getattribute__(self, name)
5275
5276 def __setattr__(self, name: str, value) -> None:
AttributeError: 'DataFrame' object has no attribute 'columns_with_answer'
So, with sample data:
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({'A':list('abcdefg'),'B':[4,5,4,5,5,4, np.nan],'Answer1':['a','b','d',np.nan,'d',np.nan,'f'],'Answer2':['a','b','d','e','h','d','k'],'Answer3':['a','b',np.nan,'d','r',np.nan, 'l'],'F':list('aaabbbc'),'Answers':['truc', 'machin', 'bidule', np.nan,np.nan,np.nan,np.nan] })
>>> df.head()
A B Answer1 Answer2 Answer3 F Answers
0 a 4 a a a a [truc]
1 b 5 b b b a [machin]
2 c 4 d d NaN a [bidule]
3 d 5 NaN e d b nan
4 e 5 d h r b nan
Starting from row 3, I would like to get:
A B Answer1 Answer2 Answer3 F Answers
0 a 4 a a a a [truc]
1 b 5 b b b a [machin]
2 c 4 d d NaN a [bidule]
3 d 5 NaN e d b [nan, e, d]
4 e 5 d h r b [d, h, r]
Solution
Select the columns by passing the list inside [], then convert to a NumPy array before converting to a list:
df['Answers']= df[columns_with_answer].to_numpy().tolist()
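The original attempt failed because `df.columns_with_answer` asks pandas for an attribute (or column) literally named `columns_with_answer`, whereas `df[columns_with_answer]` passes the list of names held in that variable. A minimal sketch on a small made-up frame (the data below is illustrative, not the question's full sample):

```python
import numpy as np
import pandas as pd

# Illustrative data only; column names mirror the question's sample.
df = pd.DataFrame({
    'A': ['a', 'b', 'c'],
    'Answer1': ['a', np.nan, 'd'],
    'Answer2': ['a', 'e', 'h'],
})

# Iterating over a DataFrame yields its column names.
columns_with_answer = [col for col in df if col.startswith('Answer')]

# Attribute access (df.columns_with_answer) looks for a column or attribute
# with that literal name and raises AttributeError.
# Bracket indexing with a list of names selects those columns instead.
df['Answers'] = df[columns_with_answer].to_numpy().tolist()

print(df)
```

Each cell of `Answers` then holds a Python list with one entry per selected column, and missing values stay in the list as `nan`.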
Or use DataFrame.filter with the regex parameter, where ^ matches the start of the string:
df['Answers']= df.filter(regex='^Answer').to_numpy().tolist()
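Note that the pattern `^Answer` also matches a column named `Answers` if one already exists, as it does in the sample data. A small sketch of a stricter pattern, assuming the source columns are always named `Answer` followed by digits (that naming rule is an assumption based on the sample):

```python
import numpy as np
import pandas as pd

# Illustrative data only.
df = pd.DataFrame({
    'Answer1': ['a', np.nan],
    'Answer2': ['b', 'e'],
    'Answers': ['truc', np.nan],
})

# The loose pattern picks up the target column as well.
loose_cols = df.filter(regex='^Answer').columns.tolist()
print(loose_cols)   # ['Answer1', 'Answer2', 'Answers']

# Anchoring both ends and requiring digits keeps only the numbered columns.
strict_cols = df.filter(regex=r'^Answer\d+$').columns.tolist()
print(strict_cols)  # ['Answer1', 'Answer2']
```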
Edit: a solution that fills only the rows where the Answers column has missing values:
columns_with_answer = [col for col in df if col.startswith('Answer') and col != 'Answers']
mask = df['Answers'].isna()
print(mask)
0 False
1 False
2 False
3 True
4 True
5 True
6 True
Name: Answers, dtype: bool
L = df.loc[mask, columns_with_answer].to_numpy().tolist()
df.loc[mask, 'Answers'] = pd.Series(L, index=df.index[mask])
print(df)
A B Answer1 Answer2 Answer3 F Answers
0 a 4.0 a a a a truc
1 b 5.0 b b b a machin
2 c 4.0 d d NaN a bidule
3 d 5.0 NaN e d b [nan, e, d]
4 e 5.0 d h r b [d, h, r]
5 f 4.0 NaN d NaN b [nan, d, nan]
6 g NaN f k l c [f, k, l]
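The masked assignment above can be wrapped into a reusable function. The sketch below is one way to do that; `fill_missing_answers` and its parameters are hypothetical names introduced here for illustration, and the explicit cast to object dtype is a precaution so the column can hold strings and lists side by side.

```python
import numpy as np
import pandas as pd


def fill_missing_answers(df, target='Answers', prefix='Answer'):
    """Hypothetical helper: fill missing `target` cells with a list of the
    row's values from the other columns whose names start with `prefix`."""
    out = df.copy()
    # Source columns: start with the prefix but are not the target itself.
    cols = [c for c in out.columns if c.startswith(prefix) and c != target]
    # Object dtype lets the column hold both strings and Python lists.
    out[target] = out[target].astype(object)
    mask = out[target].isna()
    rows = out.loc[mask, cols].to_numpy().tolist()
    # Index the new values by the masked rows so assignment aligns by label.
    out.loc[mask, target] = pd.Series(rows, index=out.index[mask], dtype=object)
    return out


# Illustrative data only.
df = pd.DataFrame({
    'Answer1': ['a', np.nan, 'd'],
    'Answer2': ['a', 'e', 'h'],
    'Answers': ['truc', np.nan, np.nan],
})
result = fill_missing_answers(df)
print(result)
```

Rows that already have a value in `Answers` are left untouched, and the input frame is not modified because the function works on a copy.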