首页 > 解决方案 > 如何根据条件在数据帧上应用乘数?

问题描述

鉴于第一个数据框df_1

df_1 = pd.DataFrame({'id':[1,2,1,3,1],
                     'symbol':['A','B','C','A','A'],
                     'date':['2021-02-12','2021-02-09','2021-02-14','2021-02-02','2021-02-05'],
                     'value':[1,1,1,1,1]})
   id symbol        date  value
0   1      A  2021-02-12      1
1   2      B  2021-02-09      1
2   1      C  2021-02-14      1
3   3      A  2021-02-02      1
4   1      A  2021-02-05      1

并给出第二个df_2

df_2 = pd.DataFrame({'id_symbol':['1_A', '1_A'],
                     'init_date':['2021-02-01','2021-02-01'],
                     'end_date':['2021-02-05', '2021-02-12'],
                     'multiplier':[5,2]})

我需要将其应用于and 的 concat等于 the 的df_2.multiplierdf_1,并且如果 the位于and内。df_1.valueid_symboldf_2.id_symboldf_1.datedf_2.init_datedf_2.end_date

我的结果应该是这样的,在代码之后:

   id symbol        date   value
0   1      A  2021-02-12       2
1   2      B  2021-02-09       1
2   1      C  2021-02-14       1
3   3      A  2021-02-02       1
4   1      A  2021-02-05      10

5 = 1 * 5 // 10 = 1 * 5 * 2

我的两个数据框都比这大得多。

标签: pythonpandasdataframe

解决方案


I was able to get to your desired outcome with the following process. I will comment on the code to explain each step, but it is fairly simple:

# Our imports
import pandas as pd
import numpy as np

# Create the id_symbol column in df_1 and perform a left merge
df_1['id_symbol']=df_1['id'].astype(str)+"_"+df_1['symbol'].astype(str)
df_1 = pd.merge(df_1,df_2,how='left',on='id_symbol')

# Make sure you convert to datetime, and handle the NaT succesfully
for i in ['init_date','end_date']:
    df_1[i] = pd.to_datetime(df_1[i],errors='coerce')

# Use np.where to check whether the conditions you want are satisfied
df_1['value'] = np.where((df_1['init_date'] <= df_1['date']) & (df_1['date']  <= df_1['end_date']),
                         df_1['value']*df_1['multiplier'],1) 

# Drop the unwanted columns
df_1.drop(['id_symbol','init_date','end_date','multiplier'],axis=1,inplace=True)

Will get back your desired outcome:

   id symbol        date  value
0   1      A  2021-02-12    1.0
1   2      B  2021-02-09    1.0
2   1      C  2021-02-14    1.0
3   3      A  2021-02-02    1.0
4   1      A  2021-02-05    5.0

A bit of an ad-hoc, but seems to be working.


推荐阅读