Avoiding For Loops / Iterating over two tables

Problem description

I have two large pandas DataFrames and currently use nested for loops to perform the calculation described below. This is very inefficient and takes a long time to run. Is there an approach that avoids iterables / for loops for the following calculation?

I have two dataframes:

source

[image: source table]

and destination:

[image: destination table]

I want to join the data from source and destination to get the following output:

[image: expected output table]

The rules that I apply:

  1. The total of Amt (38) in the source table remains the same in the output table.
  2. The source can contain columns (such as Business) that are not present in the destination.
  3. The logic starts at the beginning of the source table and proceeds to the end.
  4. Cells in green have a perfect match (i.e. Instrument and Entity) between the source and destination tables. The Depot column is the new column in the output.
  5. Cells in yellow have a lower Amt in the source; the source Amt is maintained in the output.
  6. Cells in orange have no match in the destination, so the Amt is shown without a Depot.
  7. Cells in blue have a lower match in the destination, so they are split up.

I can implement the logic above with a nested for loop. However, this is not efficient, and I was wondering whether there is a more Pythonic approach that achieves the same result faster.

Tags: python, pandas

Solution


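import pandas as pd

# Drop the destination columns that would clash with source; Entity and the remaining columns stay
destination = destination.drop(columns=['Instrument', 'Amt'])
# Left join keeps every source row and attaches the remaining destination columns by Entity
source = pd.merge(source, destination, on='Entity', how='left')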
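The merge above attaches destination columns by Entity, but on its own it does not split amounts across depots or leave an unmatched remainder, as the rules in the question require. One loop-free way to sketch that allocation is to turn the amounts into running intervals with `cumsum` and keep the overlapping parts. This is only a sketch under assumptions: the original tables were images and are not available, so the sample data is invented; the column names `Instrument`, `Entity`, `Amt` and `Depot` are taken from the question text; the function name `allocate` and the helper columns (`src_row`, `s_start`, `d_start`, ...) are hypothetical; and it assumes destination amounts are consumed in table order.

```python
import numpy as np
import pandas as pd


def allocate(source, destination, keys=("Instrument", "Entity")):
    """Split source Amt across matching destination rows without Python loops."""
    keys = list(keys)

    # Running interval [s_start, s_end) of requested amount within each key group
    src = source.reset_index(drop=True).copy()
    src["src_row"] = np.arange(len(src))
    src["s_end"] = src.groupby(keys)["Amt"].cumsum()
    src["s_start"] = src["s_end"] - src["Amt"]

    # Running interval [d_start, d_end) of available amount within each key group
    dst = destination[keys + ["Depot", "Amt"]].rename(columns={"Amt": "d_amt"})
    dst["d_end"] = dst.groupby(keys)["d_amt"].cumsum()
    dst["d_start"] = dst["d_end"] - dst["d_amt"]

    # Pair rows that share a key; the interval overlap is the amount allocated
    pairs = src.merge(dst, on=keys, how="inner")
    overlap = (np.minimum(pairs["s_end"], pairs["d_end"])
               - np.maximum(pairs["s_start"], pairs["d_start"]))
    matched = pairs.loc[overlap > 0].copy()
    matched["Amt"] = overlap[overlap > 0]

    # Whatever was not allocated stays in the output without a depot
    allocated = matched.groupby("src_row")["Amt"].sum()
    rest = src.copy()
    rest["Amt"] = rest["Amt"] - rest["src_row"].map(allocated).fillna(0)
    rest = rest[rest["Amt"] > 0].assign(Depot=None)

    # Restore source order; the depot-less remainder goes last within each source row
    out = pd.concat([matched, rest], ignore_index=True)
    out = out.sort_values(["src_row", "d_start"], na_position="last")
    return out[list(source.columns) + ["Depot"]].reset_index(drop=True)


# Invented sample data (not the tables from the question)
source = pd.DataFrame({
    "Instrument": ["A", "B", "C", "D"],
    "Entity": ["E1", "E1", "E2", "E2"],
    "Business": ["X", "X", "Y", "Y"],
    "Amt": [10, 5, 7, 16],
})
destination = pd.DataFrame({
    "Instrument": ["A", "B", "D", "D"],
    "Entity": ["E1", "E1", "E2", "E2"],
    "Amt": [10, 8, 6, 4],
    "Depot": ["D1", "D2", "D3", "D4"],
})

out = allocate(source, destination)
print(out)
```

With this sample data, row A is a perfect match, row B keeps its smaller source amount, row C has no match and gets no depot, and row D is split across two depots plus a depot-less remainder. The total of Amt is unchanged because every source unit ends up either in an overlap or in the remainder. Note that the inner merge pairs every source row with every destination row sharing the same key, so memory use grows if a key repeats many times on both sides.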

