I want to add a condition when merging pandas DataFrames in Python

Problem description

import pandas as pd
import numpy as np

data_one_list = [[102, '2016-01-01 0:00', '2.5', '2.5'],
                 [102, '2016-01-01 1:00', '3.7', '9.3'],
                 [102, '2016-01-01 2:00', '5.8', '5.2'],
                 [102, '2019-10-31 7:00', '15.9', '14.5'],
                 [102, '2019-10-31 8:00', '17.6', '17.5'],
                 [102, '2019-10-31 9:00', '12.4', '13.5']]

merge_one_df = pd.DataFrame(data = data_one_list, columns=['stn_no', 'datetime', 'no_a', 'no_b'])
print(merge_one_df)

data_two_list = [[102, '2018-05-01', np.nan, '37.9740', '124.7124'],
                 [102, '2000-11-01', '2018-05-01', '37.9661', '124.6305']]

merge_two_df = pd.DataFrame(data = data_two_list, columns=['stn_no', 'start_date', 'end_date', 'latitude', 'longitude'])
print(merge_two_df)

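I want to merge 'merge_one_df' and 'merge_two_df' on 'stn_no' and 'datetime'. In other words, each row's 'datetime' should be matched to the 'merge_two_df' row for the same station whose 'start_date' to 'end_date' period contains it.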

Sample result:

[image]

Tags: python, dataframe, merge

Solution

You can try:

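# Pair every reading with every location record of the same station
result = pd.merge(merge_one_df, merge_two_df, on=['stn_no'])
# A missing end_date means the location is still in use; fill it with a far-future placeholder
result['end_date'] = result['end_date'].fillna('2099-01-01')
# Keep rows whose datetime falls between start_date and end_date
# (the values are strings, but 'YYYY-MM-DD' text sorts in date order)
mask = (result['datetime'] > result['start_date']) & (result['datetime'] <= result['end_date'])
result = result[mask].reset_index(drop=True)
# Turn the placeholder back into a missing value
result['end_date'] = result['end_date'].replace('2099-01-01', np.nan)
print(result)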
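The answer above compares the dates as plain strings, which works here only because every value starts with a zero-padded YYYY-MM-DD date. Below is a minimal sketch of the same merge-then-filter idea on real timestamps. It assumes the formats '%Y-%m-%d %H:%M' and '%Y-%m-%d' seen in the sample data and treats each period as including start_date but excluding end_date. It also treats a missing end_date (NaT) as "still current" instead of substituting a '2099-01-01' placeholder.

```python
import numpy as np
import pandas as pd

# Sample data from the question
merge_one_df = pd.DataFrame(
    [[102, '2016-01-01 0:00', '2.5', '2.5'],
     [102, '2016-01-01 1:00', '3.7', '9.3'],
     [102, '2016-01-01 2:00', '5.8', '5.2'],
     [102, '2019-10-31 7:00', '15.9', '14.5'],
     [102, '2019-10-31 8:00', '17.6', '17.5'],
     [102, '2019-10-31 9:00', '12.4', '13.5']],
    columns=['stn_no', 'datetime', 'no_a', 'no_b'])
merge_two_df = pd.DataFrame(
    [[102, '2018-05-01', np.nan, '37.9740', '124.7124'],
     [102, '2000-11-01', '2018-05-01', '37.9661', '124.6305']],
    columns=['stn_no', 'start_date', 'end_date', 'latitude', 'longitude'])

# Parse the text columns into timestamps (formats assumed from the sample data);
# the missing end_date becomes NaT
left = merge_one_df.assign(
    datetime=pd.to_datetime(merge_one_df['datetime'], format='%Y-%m-%d %H:%M'))
right = merge_two_df.assign(
    start_date=pd.to_datetime(merge_two_df['start_date'], format='%Y-%m-%d'),
    end_date=pd.to_datetime(merge_two_df['end_date'], format='%Y-%m-%d'))

# Pair every reading with every location record of the same station
result = left.merge(right, on='stn_no')

# Keep pairs where start_date <= datetime < end_date; NaT end_date means "no end yet"
in_range = (result['datetime'] >= result['start_date']) & (
    result['end_date'].isna() | (result['datetime'] < result['end_date']))
result = result[in_range].reset_index(drop=True)
print(result[['stn_no', 'datetime', 'start_date', 'end_date', 'latitude', 'longitude']])
```

Checking end_date with isna() keeps the missing value as a real missing value, so there is no placeholder to restore afterwards.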

Output:

   stn_no         datetime  no_a  ...    end_date latitude longitude
0     102  2016-01-01 0:00   2.5  ...  2018-05-01  37.9661  124.6305
1     102  2016-01-01 1:00   3.7  ...  2018-05-01  37.9661  124.6305
2     102  2016-01-01 2:00   5.8  ...  2018-05-01  37.9661  124.6305
3     102  2019-10-31 7:00  15.9  ...         NaN  37.9740  124.7124
4     102  2019-10-31 8:00  17.6  ...         NaN  37.9740  124.7124
5     102  2019-10-31 9:00  12.4  ...         NaN  37.9740  124.7124
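Merging on 'stn_no' alone first builds every reading/record pair for a station and then filters them. When a station has many location records, this can get large. One alternative worth trying is pandas.merge_asof, which matches each reading to the latest record of the same 'stn_no' whose start_date is at or before the reading's datetime. The sketch below assumes a station's records never overlap. It still checks end_date so that readings falling in a gap between records (or before the first record) are dropped.

```python
import numpy as np
import pandas as pd

# Sample data from the question
merge_one_df = pd.DataFrame(
    [[102, '2016-01-01 0:00', '2.5', '2.5'],
     [102, '2016-01-01 1:00', '3.7', '9.3'],
     [102, '2016-01-01 2:00', '5.8', '5.2'],
     [102, '2019-10-31 7:00', '15.9', '14.5'],
     [102, '2019-10-31 8:00', '17.6', '17.5'],
     [102, '2019-10-31 9:00', '12.4', '13.5']],
    columns=['stn_no', 'datetime', 'no_a', 'no_b'])
merge_two_df = pd.DataFrame(
    [[102, '2018-05-01', np.nan, '37.9740', '124.7124'],
     [102, '2000-11-01', '2018-05-01', '37.9661', '124.6305']],
    columns=['stn_no', 'start_date', 'end_date', 'latitude', 'longitude'])

# Parse dates (formats assumed from the sample data)
left = merge_one_df.assign(
    datetime=pd.to_datetime(merge_one_df['datetime'], format='%Y-%m-%d %H:%M'))
right = merge_two_df.assign(
    start_date=pd.to_datetime(merge_two_df['start_date'], format='%Y-%m-%d'),
    end_date=pd.to_datetime(merge_two_df['end_date'], format='%Y-%m-%d'))

# For each reading, take the latest record of the same station with start_date <= datetime;
# merge_asof requires both frames to be sorted on their "on" keys
result = pd.merge_asof(
    left.sort_values('datetime'),
    right.sort_values('start_date'),
    left_on='datetime', right_on='start_date',
    by='stn_no', direction='backward')

# merge_asof ignores end_date: drop unmatched readings and readings past a closed record's end
valid = result['start_date'].notna() & (
    result['end_date'].isna() | (result['datetime'] < result['end_date']))
result = result[valid].reset_index(drop=True)
print(result[['stn_no', 'datetime', 'start_date', 'end_date', 'latitude', 'longitude']])
```

Because each reading is matched to at most one record, the intermediate result has at most one row per reading instead of one row per reading/record pair.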
