首页 > 解决方案 > 根据两个数值列比较两个数据帧,并在彼此的数值阈值内找到匹配的行

问题描述

我有两个遵循这种结构的独立数据集:

import pandas as pd

data_one = {'ID':[281.1, 290.125, 450.123, 980.11,   1200,  130,  12,      500,   401.1,  1025.349], 
        'Code':  [201,   205.1,   900,     1200.54,  52,    44,   111.223, 192.1, 120.99, 11.1]} 

df_one = pd.DataFrame(data_one)
df_one
  ID        Code
0 281.100   201.000
1 290.125   205.100
2 450.123   900.000
3 980.110   1200.540
4 1200.000  52.000
5 130.000   44.000
6 12.000    111.223
7 500.000   192.100
8 401.100   120.990
9 1025.349  11.100
data_two = {'ID':[405.122, 12.125, 11356.12, 85.111,    1025.4069,   112.1111,  2112,      1200.6582,   980.1035,     441.123,  52842.584, 5648.12, 11], 
        'Code':  [1000,    33.1,   222,      101.541,   26.75,       25.2564,   11.08,     192.1,       1220.1258,    11.1,     7894,      1323,    123.123]}

df_two = pd.DataFrame(data_two) 
df_two
      ID          Code
0     405.1220    1000.0000
1     12.1250     33.1000
2     11356.1200  222.0000
3     85.1110     101.5410
4     1025.4069   26.7500
5     112.1111    25.2564
6     2112.0000   11.0800
7     1200.6582   192.1000
8     980.1035    1220.1258
9     441.1230    11.1000
10    52842.5840  7894.0000
11    5648.1200   1323.0000
12    11.0000     123.1230

我想在这两个数据帧之间找到 ID 和 Code 一致但分别在±0.5forID±30for的某个数值阈值内的行Code

因此,当特定行返回匹配项时:

df_one.ID is in this range: [df_two.ID-0.5,df_two.ID+0.5] AND df_one.Code is in the range [df_two.Code-30,df_two.code+30]

例如,df_one 中的第 3 行与 df_two 中的第 8 行匹配,因为 ID 和 Code 都满足条件。

            ID.1      Code.1       ID.2        Code.2
match_1     980.110   1200.540     980.1035    1220.1258

另一场比赛是:

            ID.1      Code.1       ID.2        Code.2
match_2     1025.349  11.100       1025.4069   26.7500

标签: python

解决方案


这可能是一种蛮力的方式,但总能找到您想要的解决方案。

import pandas as pd
import numpy as np

df3 = pd.DataFrame({}, columns=['ID.1', 'Code.1', 'ID.2', 'Code.2'])
id_1 = []
id_2 = []
code_1 = []
code_2 = []
for i, a in enumerate(list(df_one['ID'].values)):
  for i2, a2 in enumerate(list(df_two['ID'].values)):
    if ((((df_one.iloc[i, 0]) >= (df_two.iloc[i2, 0] - 0.5)) and  ((df_one.iloc[i, 0]) <= (df_two.iloc[i2, 0] + 0.5))) and 
        (((df_one.iloc[i, 1]) >= (df_two.iloc[i2, 1] - 30)) and ((df_one.iloc[i, 1]) <= (df_two.iloc[i2, 1] + 30)))):
      id_1.append(df_one.iloc[i, 0])
      id_2.append(df_two.iloc[i2, 0])
      code_1.append(df_one.iloc[i, 1])
      code_2.append(df_two.iloc[i2, 1])

df3['ID.1'] = id_1
df3['Code.1'] = code_1
df3['ID.2'] = id_2
df3['Code.2'] = code_2
df3

推荐阅读