首页 > 解决方案 > 如果条件匹配,则从第二个数据帧中获取数据并将其附加到第一个数据帧

问题描述

我有两个名为emp_crmverient的数据框。如果满足以下条件,则将 emp_crm 数据帧值附加verient数据帧。

问题陈述: 如果匹配AND 等于 AND小于AND 大于then,
我想从中获取记录并附加到 ,以便匹配emp_crm附加到 Verient [所有列]。verientemp_crm[ACD]Verient[PbxId]Date of AudioStartDate of OPEN_DTTime of OPEN_DTTime of AudioStartTime of AudioEnd
[condition: AudioStartTime < Ticket OpenTime < Audio EndTime]
emp_crm[ACD]Verient[PbxId]emp_crm's [all columns]

需要注意的地方:

任何记录都不应从verient中删除,只为匹配的记录追加,并将其他记录留空。

我的方法——

  1. 如果emp_crm[ACD] == Verient[PbxId]AND verient[AudioStartDate] == emp_crm[TicketOpenDate]ANDStartTime_in_Seconds < OpenTime_in_Seconds < EndTime_in_Seconds然后
    1.1。对于匹配emp_crm[ACD] == Verient[PbxId]
    1.1.1。将 emp_crm 的 TICKET_NO、Tier_1 附加到 Verient(所有列)
    1.1.2。删除不必要的列以使 verient_crm 保持简单。
DataFrame: emp_crm

ACD     NID     OPEN_DT             OPEN_ID TICKET_NO       TIER_1  OpenDate    OpenTime    OpenTime_in_Seconds 
6557    n199    2021-05-19 20:13:00 n199    INC000047814    Voice   2021-05-19  20:13:00    72780
7768    n141    2021-04-30 19:39:00 n141    INC000047860    Data    2021-04-30  19:39:00    70740
7768    n141    2021-04-30 15:56:30 n141    INC000047853    Data    2021-04-30  15:56:30    57390
6570    n116    2021-05-19 16:00:27 n116    INC000047848    Ring    2021-05-24  16:10:00    58200
6570    n116    2021-05-19 15:55:02 n116    INC000047784    Voice   2021-05-19  15:55:02    57302


DataFrame: verient

AudioEnd            AudioStart              PbxId   AudioStartDate  StartTime AudioEndDate  EndTime     StartTime_in_Seconds    EndTime_in_Seconds 
2021-04-30 16:07:24 2021-04-30 15:35:27     6576    2021-04-30      15:35:27  2021-04-30    16:07:24    56127                   58044
2021-04-30 15:51:12 2021-04-30 15:46:30     7768    2021-04-30      15:46:30  2021-04-30    15:51:12    56790                   57072
2021-04-30 16:16:06 2021-04-30 15:48:22     7768    2021-04-30      15:48:22  2021-04-30    16:16:06    56902                   58566
2021-05-19 16:00:47 2021-05-19 15:50:02     6570    2021-05-19      15:50:02  2021-05-19    16:00:47    57002                   57647
2021-05-19 16:02:20 2021-05-19 15:53:26     6570    2021-05-19      15:53:26  2021-05-19    16:02:20    57206                   57740


Final DataFrame: verient_crm

AudioEnd                AudioStart              PbxId   ACD     NID     OPEN_DT                 OPEN_ID     TICKET_NO       TIER_1
2021-04-30 16:07:24     2021-04-30 15:35:27     6576    
2021-04-30 15:51:12     2021-04-30 15:46:30     7768    
2021-04-30 16:16:06     2021-04-30 15:48:22     7768    7768    n141    2021-04-30 15:56:30     n141        INC000047853    Data
2021-05-19 16:00:47     2021-05-19 15:50:02     6570    6570    n116    2021-05-19 16:00:27     n116        INC000047848    Ring
2021-05-19 16:02:20     2021-05-19 15:53:26     6570    6570    n116    2021-05-19 15:55:02     n116        INC000047784    Voice   


Final Result after dropping unnecessary columns:
------------------------------------------------
verient_crm(AudioEnd,AudioStart,PbxId,ACD, NID,OPEN_DT, OPEN_ID, TICKET_NO, TIER_1)


我的代码和方法:


import pandas as pd
import numpy as np

matchedDict = {}
for record in emp_crm.iterrows():  
    for vr in verient.iterrows():   
        if(record == vr):   ## verient.pbxLoginID == emp_CRM.ACD
            if(vr.AudioStart_in_TimeStamp <= record.OPEN_DT_in_TimeStamp and record.OPEN_DT_in_TimeStamp <= vr.AudioEnd_in_TimeStamp):
                matchedDict[vr.index] = record  


# Append emp_crm(ACD, NID,OPEN_DT, OPEN_ID, TICKET_NO, TIER_1) with Verient for every matched ACD == PbxID from emp_crm to Verient. 

for item in matchedDict:
    record = item.value
    verient.iloc[item.key,col_index_for_ACD] = record['ACD']  ## Don't know how to fetch col_index_for_tier1
    verient.iloc[item.key,col_index_for_NID] = record['NID']
    verient.iloc[item.key,col_index_for_OPEN_DT] = record['OPEN_DT']
    verient.iloc[item.key,col_index_for_OPEN_ID] = record['OPEN_ID']
    verient.iloc[item.key,col_index_for_TICKET_NO] = record['TICKET_NO']
    verient.iloc[item.key,col_index_for_TIER_1] = record['TIER_1']

我收到以下错误,无法解决。

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-74-bf0a6bb075ee> in <module>
      4 for record in emp_crm.iterrows():
      5     for vr in verient.iterrows():
----> 6         if(record['ACD'] == vr['pbxLoginID']):   ## verient.pbxLoginID == emp_CRM.ACD or not
      7             if(vr.AudioStart_in_TimeStamp <= record.OPEN_DT_in_TimeStamp and record.OPEN_DT_in_TimeStamp <= vr.AudioEnd_in_TimeStamp):
      8                 matchedDict[vr.index] = record

TypeError: tuple indices must be integers or slices, not str

需要注意的地方:

  1. 任何记录都不应从 verient 中删除,只为 macthing 记录追加,并将其他记录留空。

标签: pythonpython-3.xpandas

解决方案


您可以合并数据帧,然后按如下方式过滤结果数据帧,而不是迭代数据帧(这是一个非常昂贵的操作) -

# Left merge
verient = pd.merge(verient, emp_crm, left_on = ['PbxId'], right_on = ['ACD'], how='left')

# Filter
verient= verient.loc[(verient.AudioStart_in_TimeStamp < verient.OPEN_DT_in_TimeStamp) & (verient.OPEN_DT_in_TimeStamp < verient.AudioEnd_in_TimeStamp)]

然后,您可以根据需要删除不必要的列。

编辑 -要保留数据框中的原始行verient,您可以将NULL值分配给emp_crm条件不匹配的列。而不是filter,这样做 -

verient.loc[~((verient.AudioStart_in_TimeStamp < verient.OPEN_DT_in_TimeStamp) & (verient.OPEN_DT_in_TimeStamp < verient.AudioEnd_in_TimeStamp)), ['ACD', 'NID', 'OPEN_DT', 'OPEN_ID', 'TICKET_NO', 'TIER_1']] = np.nan

推荐阅读