首页 > 解决方案 > 如何迭代两个数据帧以比较数据并进行处理?

问题描述

我有两个不同的数据框:A、B。Event 列具有我用来比较两个数据框的相似数据。我想给 Dataframe A 一个新列 dfA.newContext#。

为此,我需要使用事件列。我想遍历 Dataframe A 以找到 Event 的匹配项并将 dfB.context# 分配给 dfA.newContext#

我认为循环是最好的方法,因为我需要检查一些条件。

这可能要求有点多,但我真的卡住了..我想做这样的事情:

offset = 0
Iterate through dfA:
    extract event
    extract context#
        Iterate through dfB:
            if dfB.event == dfA.event:
                dfA.newContext# = dfB.context#
                offset = dfA.new_context# - dfA.context#
                if dfB.event == "Special":
                    dfA.newContext# = dfA.context# - offset
          

数据框 A

+-------------+---------+------+
|dfA.context# |dfA.event| Name |
+-------------+---------+------+
| 0           | Special | Bob  |
| 2           | Special | Joan |
| 4           |    Bird | Susie|
| 5           | Special | Alice|
| 6           | Special | Tom  |
| 7           | Special | Luis |
| 8           |  Parrot | Jill |
| 9           | Special | Reed |
| 10          | Special | Lucas|
| 11          |   Snake | Kat  |
| 12          | Special | Bill |
| 13          | Special | Leo  |
| 14          | Special | Peter|
| 15          | Special | Mark |
| 16          | Special | Joe  |
| 17          | Special | Lora |
| 18          | Special | Care |
| 19          |Elephant | David|
| 20          | Special | Ann  |
| 21          | Special | Larry|
| 22          |   Skunk | Tony |
+-------------+---------+------+

数据框 B

+-------------+---------+
|dfB.context# |dfB.event|
+-------------+---------+
| 0           | Special |
| 0           | Special |
| 0           | Special |
| 1           | Special |
| 1           | Special |
| 1           | Special |
| 1           | Special |
| 2           |    Bird |
| 2           |    Bird |
| 3           | Special |
| 6           |  Parrot |
| 6           |  Parrot |
| 6           |  Parrot |
| 6           |  Parrot |
| 7           | Special |
| 7           | Special |
| 9           |   Snake |
| 9           |   Snake |
| 9           |   Snake |
| 10          | Special |
| 17          |Elephant |
| 17          |Elephant |
| 17          |Elephant |
| 18          | Special |
| 18          | Special |
| 20          |  Skunk  |
| 20          |  Skunk  |
| 21          | Special |
| 26          | Antelope|
+-------------+---------+

所需的DF

+-------------+---------+------+-------------+
|dfA.context# |dfA.event| Name |dfA.newContext#|
+-------------+---------+------+-------------+
| 0           | Special | Bob  |           0 |
| 2           | Special | Joan |           1 |
| 4           |    Bird | Susie|           2 |
| 5           | Special | Alice|           3 |
| 6           | Special | Tom  |             |
| 7           | Special | Luis |             |
| 8           |  Parrot | Jill |           6 |
| 9           | Special | Reed |           7 |
| 10          | Special | Lucas|             |
| 11          |   Snake | Kat  |           9 |
| 12          | Special | Bill |          10 | 
| 13          | Special | Leo  |             |
| 14          | Special | Peter|             |
| 15          | Special | Mark |             |
| 16          | Special | Joe  |             |
| 17          | Special | Lora |             |
| 18          | Special | Care |             |
| 19          |Elephant | David|          17 |
| 20          | Special | Ann  |          18 |
| 21          | Special | Larry|             |
| 22          |   Skunk | Tony |          20 |
+-------------+---------+------+-------------+

如何一次遍历两个数据框并访问信息?

标签: pythonpandasdataframeloops

解决方案


95% 的时间你可以使用 pandas 矢量化方法并消除循环的需要。在这种情况下,您可以使用pd.merge一个简单、干净和高效的长循环替代方案。

编辑:(答案#1):实际上,您可以进行更高级的合并,left_on=dfA.index, right_on='context'并在合并后与其他清理操作在一行中执行此操作,但请参阅下面的更完整答案,它采用类似的方法:

df = (pd.merge(dfA, dfB['context'], how='left', left_on=dfA.index, right_on='context')
        .drop_duplicates()
        .dropna(subset=['Name'])
        .drop('context', axis=1)
        .rename({'context_x' : 'context', 'context_y' : 'newContext'}, axis=1).fillna(''))

答案#2: 您可以在操作两个数据帧以准备合并后将两个数据帧合并在一起:

  1. dfA- 使contextdfA等于index,但在更改之前,将其保存为系列s以供以后使用
  2. dfB- 删除重复项,重置索引,并将索引名称更改为newContext以准备合并。
  3. 合并eventcontext替换newContext值为contextnull 的值。
  4. context回它的原始数据df['context'] = s

s = dfA['context']
dfA['context'] = dfA.index.astype(str)
dfB = dfB.drop_duplicates().reset_index().rename({'index' :'newContext'}, axis=1).astype(str)
df = pd.merge(dfA, dfB, how='left', on=['event', 'context'])
df['newContext'] = df['newContext'].where(df['newContext'].isnull(), df['context']).fillna('')
df['context'] = s
df
Out[9]: 
    context     event   Name newContext
0         0   Special    Bob          0
1         2   Special   Joan          1
2         4      Bird  Susie          2
3         5   Special  Alice          3
4         6   Special    Tom           
5         7   Special   Luis           
6         8    Parrot   Jill          6
7         9   Special   Reed          7
8        10   Special  Lucas           
9        11     Snake    Kat          9
10       12   Special   Bill         10
11       13   Special    Leo           
12       14   Special  Peter           
13       15   Special   Mark           
14       16   Special    Joe           
15       17   Special   Lora           
16       18   Special   Care           
17       19  Elephant  David         17
18       20   Special    Ann         18
19       21   Special  Larry           
20       22     Skunk   Tony         20

推荐阅读