python - Conditional generation of new columns - Pandas
Question
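I am trying to create a new column based on conditional logic applied to existing columns. I understand there may be more efficient ways to achieve this, but I have a few conditions that need to be included. This is just the first step.
The overall aim is to create two new columns that are mapped from columns 1 and 2. These are referenced to the Object column, because I can have multiple rows for each time point.
Object2 and Value determine how the new columns are mapped. So if Value == X, I want to match the two columns (Object and Object2) and return the corresponding 1 and 2 for that time point to the new columns. The same process should occur if Value == Y. If Value == Z, I want to insert 0, 0. Everything else should be NaN.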
import numpy as np
import pandas as pd

df = pd.DataFrame({
'Time' : ['2019-08-02 09:50:10.1','2019-08-02 09:50:10.1','2019-08-02 09:50:10.2','2019-08-02 09:50:10.3','2019-08-02 09:50:10.3','2019-08-02 09:50:10.4','2019-08-02 09:50:10.5','2019-08-02 09:50:10.6','2019-08-02 09:50:10.6'],
'Object' : ['A','B','A','C','A','C','C','B','B'],
'1' : [1,3,5,7,9,11,13,15,17],
'2' : [0,1,4,6,8,10,12,14,16],
'Object2' : ['A','A',np.nan,'C','C','C','C','B','A'],
'Value' : ['X','X',np.nan,'Y','Y','Y','Y','Z',np.nan],
})
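def map_12(df):
    # Attempt: loop over Value and assign A1/A2 according to each entry.
    # Note: every assignment below sets the entire column, not a single row,
    # so the result only reflects the last entry of Value.
    for i in df['Value']:
        if i == 'X':
            df['A1'] = df['1']
            df['A2'] = df['2']
        elif i == 'Y':
            df['A1'] = df['1']
            df['A2'] = df['2']
        elif i == 'Z':
            df['A1'] = 0
            df['A2'] = 0
        else:
            df['A1'] = np.nan
            df['A2'] = np.nan
    return df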
Expected output:
                    Time  Object   1   2  Object2  Value    A1    A2
0  2019-08-02 09:50:10.1       A   1   0        A      X   1.0   0.0  # Match A-A at this time point, so output is 1,0
1  2019-08-02 09:50:10.1       B   3   1        A      X   1.0   0.0  # Still at same time point so use 1,0
2  2019-08-02 09:50:10.2       A   5   4      NaN    NaN   NaN   NaN  # No Value so NaN
3  2019-08-02 09:50:10.3       C   7   6        C      Y   7.0   6.0  # Match C-C at this time point, so output is 7,6
4  2019-08-02 09:50:10.3       A   9   8        C      Y   7.0   6.0  # Still at same time point so use 7,6
5  2019-08-02 09:50:10.4       C  11  10        C      Y  11.0  10.0  # Match C-C at this time point, so output is 11,10
6  2019-08-02 09:50:10.5       C  13  12        C      Y  13.0  12.0  # Match C-C at this time point, so output is 13,12
7  2019-08-02 09:50:10.6       B  15  14        B      Z   0.0   0.0  # Z so 0,0
8  2019-08-02 09:50:10.6       B  17  16        A    NaN   NaN   NaN  # No Value so NaN
New sample df:
df = pd.DataFrame({
'Time' : ['2019-08-02 09:50:10.1','2019-08-02 09:50:10.1','2019-08-02 09:50:10.2','2019-08-02 09:50:10.3','2019-08-02 09:50:10.3','2019-08-02 09:50:10.4','2019-08-02 09:50:10.5','2019-08-02 09:50:10.6','2019-08-02 09:50:10.6'],
'Object' : ['B','A','A','A','C','C','C','B','B'],
'1' : [1,3,5,7,9,11,13,15,17],
'2' : [0,1,4,6,8,10,12,14,16],
'Object2' : ['A','A',np.nan,'C','C','C','C','B','A'],
'Value' : ['X','X',np.nan,'Y','Y','Y','Y','Z',np.nan],
})
Expected output:
                    Time  Object   1   2  Object2  Value    A1    A2
0  2019-08-02 09:50:10.1       B   1   0        A      X   3.0   1.0  # Match A-A at this time point, so output is 3,1
1  2019-08-02 09:50:10.1       A   3   1        A      X   3.0   1.0  # Still at same time point so use 3,1
2  2019-08-02 09:50:10.2       A   5   4      NaN    NaN   NaN   NaN  # No Value so NaN
3  2019-08-02 09:50:10.3       A   7   6        C      Y   9.0   8.0  # Match C-C at this time point, so output is 9,8
4  2019-08-02 09:50:10.3       C   9   8        C      Y   9.0   8.0  # Still at same time point so use 9,8
5  2019-08-02 09:50:10.4       C  11  10        C      Y  11.0  10.0  # Match C-C at this time point, so output is 11,10
6  2019-08-02 09:50:10.5       C  13  12        C      Y  13.0  12.0  # Match C-C at this time point, so output is 13,12
7  2019-08-02 09:50:10.6       B  15  14        B      Z   0.0   0.0  # Z so 0,0
8  2019-08-02 09:50:10.6       B  17  16        A    NaN   NaN   NaN  # No Value so NaN
Solution
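Use DataFrame.where + Series.eq to create a DataFrame like df[['1','2']] that keeps values only in the rows where Object and Object2 match (matches==True) and holds NaN in the remaining rows. Then group by time point with DataFrame.groupby and fill the missing data in each group with the existing values from the matching row. Use DataFrame.where to discard the values where df['Value'] is NaN. Finally, use DataFrame.mask to set the values to 0 where the Value column is Z.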
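# Rows where Object equals Object2
matches = df['Object'].eq(df['Object2'])
# Conditions on Value
condition_z = df['Value'] == 'Z'
not_null = df['Value'].notnull()
# Keep '1' and '2' only on matching rows, then fill within each time point
# (group_keys=False keeps the original index on pandas 2.0 and later)
df12 = (df[['1', '2']].where(matches)
        .groupby(df['Time'], sort=False, group_keys=False)
        .apply(lambda x: x.ffill().bfill()))
# Discard rows where Value is NaN, then set 0 where Value is Z
df[['A1', 'A2']] = df12.where(not_null).mask(condition_z, 0)
print(df)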
Output
                    Time  Object   1   2  Object2  Value    A1    A2
0  2019-08-02 09:50:10.1       B   1   0        A      X   3.0   1.0
1  2019-08-02 09:50:10.1       A   3   1        A      X   3.0   1.0
2  2019-08-02 09:50:10.2       A   5   4      NaN    NaN   NaN   NaN
3  2019-08-02 09:50:10.3       A   7   6        C      Y   9.0   8.0
4  2019-08-02 09:50:10.3       C   9   8        C      Y   9.0   8.0
5  2019-08-02 09:50:10.4       C  11  10        C      Y  11.0  10.0
6  2019-08-02 09:50:10.5       C  13  12        C      Y  13.0  12.0
7  2019-08-02 09:50:10.6       B  15  14        B      Z   0.0   0.0
8  2019-08-02 09:50:10.6       B  17  16        A    NaN   NaN   NaN
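To see what each step contributes, it may help to print the intermediate frames. The following is only a sketch on the new sample df; the names masked and filled are illustrative and not part of the answer, and the per-group fill is written with transform and a lambda as an assumption, to keep the original index regardless of pandas version.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Time': ['2019-08-02 09:50:10.1', '2019-08-02 09:50:10.1', '2019-08-02 09:50:10.2',
             '2019-08-02 09:50:10.3', '2019-08-02 09:50:10.3', '2019-08-02 09:50:10.4',
             '2019-08-02 09:50:10.5', '2019-08-02 09:50:10.6', '2019-08-02 09:50:10.6'],
    'Object': ['B', 'A', 'A', 'A', 'C', 'C', 'C', 'B', 'B'],
    '1': [1, 3, 5, 7, 9, 11, 13, 15, 17],
    '2': [0, 1, 4, 6, 8, 10, 12, 14, 16],
    'Object2': ['A', 'A', np.nan, 'C', 'C', 'C', 'C', 'B', 'A'],
    'Value': ['X', 'X', np.nan, 'Y', 'Y', 'Y', 'Y', 'Z', np.nan],
})

# Step 1: True only where Object and Object2 hold the same label
# (a NaN in Object2 never matches).
matches = df['Object'].eq(df['Object2'])

# Step 2: keep '1' and '2' on matching rows, NaN everywhere else.
masked = df[['1', '2']].where(matches)

# Step 3: within each time point, spread the matching row's values
# to the other rows of that time point.
filled = masked.groupby(df['Time'], sort=False).transform(lambda s: s.ffill().bfill())

print(masked)
print(filled)
```

At this stage rows 7 and 8 still carry 15 and 14; the later where and mask calls are what turn them into 0 (Value is Z) and NaN (Value is missing).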
We can also use GroupBy.transform:
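# Rows where Object equals Object2
matches = df['Object'].eq(df['Object2'])
# Conditions on Value
condition_z = df['Value'] == 'Z'
not_null = df['Value'].notnull()
# Keep '1' and '2' only on matching rows, then broadcast the first
# non-null value of each time point to every row of that time point
df12 = (df[['1', '2']].where(matches)
        .groupby(df['Time'], sort=False)
        .transform('first'))
# Discard rows where Value is NaN, then set 0 where Value is Z
df[['A1', 'A2']] = df12.where(not_null).mask(condition_z, 0)
print(df)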
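The two variants give the same result on the sample data because each time point has at most one matching row. If a time point could contain several matching rows, they would differ; a small sketch on hypothetical toy data (the frame toy and the names nearest and first are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: a single time point with two matching rows (5 and 7).
toy = pd.DataFrame({
    'Time': ['t1', 't1', 't1', 't1'],
    '1': [np.nan, 5.0, np.nan, 7.0],
})

grouped = toy[['1']].groupby(toy['Time'], sort=False)

# ffill then bfill: each gap takes the closest value above it,
# and leading gaps take the closest value below.
nearest = grouped.transform(lambda s: s.ffill().bfill())

# 'first': every row takes the first non-null value of its group.
first = grouped.transform('first')

print(nearest['1'].tolist())  # [5.0, 5.0, 5.0, 7.0]
print(first['1'].tolist())    # [5.0, 5.0, 5.0, 5.0]
```

Which behaviour is wanted depends on the data; with one match per time point, as in the question, either works.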