Merging values from several columns into one column

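Question

I have a DataFrame, and I want to combine the columns whose names start with "Answer" into a single column named "Answers". That column already exists, but it has no values from around row 3061 onward, so I need to fill them in. So far I have tried: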

columns_with_answer = [col for col in df if col.startswith('Answer')]
df['Answers']= df.columns_with_answer.tolist()

But I got:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-6-399b932d5740> in <module>()
      1 columns_with_answer = [col for col in df if col.startswith('Answer')]
----> 2 df['Answers']= df.columns_with_answer.tolist()

/usr/local/lib/python3.6/dist-packages/pandas/core/generic.py in __getattr__(self, name)
   5272             if self._info_axis._can_hold_identifiers_and_holds_name(name):
   5273                 return self[name]
-> 5274             return object.__getattribute__(self, name)
   5275 
   5276     def __setattr__(self, name: str, value) -> None:

AttributeError: 'DataFrame' object has no attribute 'columns_with_answer'

So, with sample data:

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({'A':list('abcdefg'),'B':[4,5,4,5,5,4, np.nan],'Answer1':['a','b','d',np.nan,'d',np.nan,'f'],'Answer2':['a','b','d','e','h','d','k'],'Answer3':['a','b',np.nan,'d','r',np.nan, 'l'],'F':list('aaabbbc'),'Answers':['truc', 'machin', 'bidule', np.nan,np.nan,np.nan,np.nan]  })
>>> df.head()
    A   B   Answer1 Answer2 Answer3 F   Answers
0   a   4   a   a   a   a   [truc]
1   b   5   b   b   b   a   [machin]
2   c   4   d   d   NaN a   [bidule]
3   d   5   NaN e   d   b   nan
4   e   5   d   h   r   b   nan

Starting from row 3, I would like to get:

A   B   Answer1 Answer2 Answer3 F   Answers
0   a   4   a   a   a   a   [truc]
1   b   5   b   b   b   a   [machin]
2   c   4   d   d   NaN a   [bidule]
3   d   5   NaN e   d   b   [nan, e, d]
4   e   5   d   h   r   b   [d, h, r]

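Tags: python-3.x, pandas, dataframe

Answer

The attempt fails because df.columns_with_answer is attribute access: pandas looks for an attribute or column literally named columns_with_answer rather than using the Python variable. Select the columns by passing the list inside [], then convert to a NumPy array before converting to a list: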

df['Answers']= df[columns_with_answer].to_numpy().tolist()
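
The difference between attribute access and bracket selection can be sketched on a small frame. The data below is illustrative only (not the asker's real dataset), and the names attribute_lookup_failed and row_lists are introduced just for this example:

```python
import numpy as np
import pandas as pd

# Small illustrative frame (hypothetical data).
df = pd.DataFrame({
    'A': list('abc'),
    'Answer1': ['a', np.nan, 'd'],
    'Answer2': ['a', 'e', 'h'],
})

# Iterating over a DataFrame yields its column names.
columns_with_answer = [col for col in df if col.startswith('Answer')]

# Attribute access looks for something literally named
# 'columns_with_answer' on the DataFrame, so it raises AttributeError.
try:
    df.columns_with_answer
    attribute_lookup_failed = False
except AttributeError:
    attribute_lookup_failed = True

# Bracket selection with the list picks those columns; each row then
# becomes one Python list.
row_lists = df[columns_with_answer].to_numpy().tolist()
df['Answers'] = row_lists
```

Here every row of Answers ends up holding a list, with missing inputs kept as NaN inside the list.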

Or use DataFrame.filter with the regex parameter, where ^ anchors the match at the start of the column name:

df['Answers']= df.filter(regex='^Answer').to_numpy().tolist()
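
One thing worth checking with this pattern: '^Answer' also matches a column that is itself named Answers, if one already exists. The sketch below compares it with a tighter pattern, under the assumption (not stated in the question) that the source columns are always named Answer followed by digits:

```python
import numpy as np
import pandas as pd

# Illustrative frame in which an 'Answers' column already exists.
df = pd.DataFrame({
    'Answer1': ['a', np.nan],
    'Answer2': ['b', 'e'],
    'Answers': ['truc', np.nan],
})

# '^Answer' matches every column starting with 'Answer',
# which includes the existing 'Answers' column.
loose = df.filter(regex='^Answer').columns.tolist()

# Assumed naming scheme: 'Answer' followed only by digits.
strict = df.filter(regex=r'^Answer\d+$').columns.tolist()
```

If the real column names follow a different scheme, the stricter pattern would need adjusting accordingly.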

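EDIT: If the Answers column already exists and only its missing values should be filled, assign only to the rows where it is NaN. Answers itself must be excluded from the column list, since its name also starts with Answer: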

columns_with_answer = [col for col in df if col.startswith('Answer') and col != 'Answers']
mask = df['Answers'].isna()
print (mask)
0    False
1    False
2    False
3     True
4     True
5     True
6     True
Name: Answers, dtype: bool

L = df.loc[mask, columns_with_answer].to_numpy().tolist()
df.loc[mask, 'Answers'] = pd.Series(L, index=df.index[mask])
print (df)
   A    B Answer1 Answer2 Answer3  F        Answers
0  a  4.0       a       a       a  a           truc
1  b  5.0       b       b       b  a         machin
2  c  4.0       d       d     NaN  a         bidule
3  d  5.0     NaN       e       d  b    [nan, e, d]
4  e  5.0       d       h       r  b      [d, h, r]
5  f  4.0     NaN       d     NaN  b  [nan, d, nan]
6  g  NaN       f       k       l  c      [f, k, l]
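
The masked fill above could also be wrapped in a small helper so that it works on a copy and can be reused. This is only a sketch: fill_missing_answers and its target and prefix parameters are hypothetical names introduced here, not part of pandas or of the original answer.

```python
import numpy as np
import pandas as pd

def fill_missing_answers(df, target='Answers', prefix='Answer'):
    """Return a copy where NaN rows of `target` hold a list of that
    row's values from the columns starting with `prefix`."""
    # Source columns: same prefix, but not the target column itself.
    cols = [c for c in df.columns if c.startswith(prefix) and c != target]
    mask = df[target].isna()
    lists = df.loc[mask, cols].to_numpy().tolist()
    out = df.copy()
    # Make sure the column can hold Python lists.
    out[target] = out[target].astype(object)
    # The Series index aligns each list with the row it came from.
    out.loc[mask, target] = pd.Series(lists, index=df.index[mask], dtype=object)
    return out

# Sample data from the question.
df = pd.DataFrame({
    'A': list('abcdefg'),
    'B': [4, 5, 4, 5, 5, 4, np.nan],
    'Answer1': ['a', 'b', 'd', np.nan, 'd', np.nan, 'f'],
    'Answer2': ['a', 'b', 'd', 'e', 'h', 'd', 'k'],
    'Answer3': ['a', 'b', np.nan, 'd', 'r', np.nan, 'l'],
    'F': list('aaabbbc'),
    'Answers': ['truc', 'machin', 'bidule', np.nan, np.nan, np.nan, np.nan],
})
result = fill_missing_answers(df)
```

Working on a copy leaves the input frame untouched, which makes it easier to compare the result against the original before overwriting anything.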



    
