Find invoices equal to zero with no discount in a Pandas DataFrame

Problem description

In Pandas, I have a DataFrame like this:

| Division |  Invoice |   Transactions | Amount |
|----------|----------|----------------|--------|
|   Europe | 10000000 | Product Charge |   1000 |
|   Europe | 10001000 | Product Charge |   1000 |
|   Europe | 10001000 |       Discount |   -500 |
|    Latam | 10002000 | Product Charge |      0 |
|    Latam | 10003000 | Product Charge |   1000 |
|    Latam | 10003000 |       Discount |  -1000 |
|   Europe | 10004000 | Product Charge |    500 |
|   Europe | 10004000 |       Discount |   -500 |
|   Europe | 10005000 | Product Charge |    500 |
|   Europe | 10005000 |       Discount |    495 |
|    Latam | 10006000 | Product Charge |      0 |
|    Latam | 10007000 | Product Charge |      0 |
|    Latam | 10007000 |  Loyalty bonus |    200 |

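I need to build a new DataFrame that counts, per division, the invoices whose amount is 0 and that have no discount, like this:

| Division |  Invoice | Total | Q_Invoice |
|----------|----------|-------|-----------|
|    Latam | 10002000 |     0 |         1 |
|    Latam | 10006000 |     0 |         1 |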

In SQL I can compute it as follows, but I have not been able to replicate it on a Pandas DataFrame:

SELECT Division, Invoice, SUM (Amount) Total, COUNT (DISTINCT Invoice) Q_Invoice
FROM df
GROUP BY Division,  Invoice
HAVING SUM (CASE WHEN Transactions =  'Discount' THEN 1 ELSE 0 END) = '0'
       AND SUM (CASE WHEN Transactions = 'Product Charge' THEN 1 ELSE 0 END) >= '1'
       AND SUM(Amount) = 0

I tried to reproduce the result above in a Jupyter notebook using pandasql, but it does not work. This is what I ran:

import pandasql as ps
import pandas as pd

df2 = ps.sqldf("""SELECT Division, Invoice, SUM (Amount) Total, COUNT (DISTINCT Invoice) Q_Invoice
                  FROM df
                  GROUP BY Division,  Invoice
                  HAVING SUM (CASE WHEN Transactions =  'Discount' THEN 1 ELSE 0 END) = '0'
                  AND SUM (CASE WHEN Transactions = 'Product Charge' THEN 1 ELSE 0 END) >= '1'
                  AND SUM(Amount) = 0 """)
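
One possible reason the query returns nothing: as far as I know, pandasql runs on SQLite by default, and SQLite does not convert between an integer expression such as `SUM(...)` and a quoted literal such as `'0'` when neither side is a table column. An integer is then never equal to a text value, so the `HAVING` clause rejects every group. The sketch below checks this with the standard-library `sqlite3` module instead of pandasql (that substitution is an assumption made to keep the example dependency-free); the table is named `df` only to match the query above.

```python
import sqlite3

# Example data copied from the table in the question.
rows = [
    ("Europe", 10000000, "Product Charge", 1000),
    ("Europe", 10001000, "Product Charge", 1000),
    ("Europe", 10001000, "Discount", -500),
    ("Latam", 10002000, "Product Charge", 0),
    ("Latam", 10003000, "Product Charge", 1000),
    ("Latam", 10003000, "Discount", -1000),
    ("Europe", 10004000, "Product Charge", 500),
    ("Europe", 10004000, "Discount", -500),
    ("Europe", 10005000, "Product Charge", 500),
    ("Europe", 10005000, "Discount", 495),
    ("Latam", 10006000, "Product Charge", 0),
    ("Latam", 10007000, "Product Charge", 0),
    ("Latam", 10007000, "Loyalty bonus", 200),
]

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE df (Division TEXT, Invoice INTEGER, Transactions TEXT, Amount INTEGER)"
)
con.executemany("INSERT INTO df VALUES (?, ?, ?, ?)", rows)

# Same query as in the question; only the two comparison literals vary.
query = """
SELECT Division, Invoice, SUM(Amount) AS Total, COUNT(DISTINCT Invoice) AS Q_Invoice
FROM df
GROUP BY Division, Invoice
HAVING SUM(CASE WHEN Transactions = 'Discount' THEN 1 ELSE 0 END) = {zero}
   AND SUM(CASE WHEN Transactions = 'Product Charge' THEN 1 ELSE 0 END) >= {one}
   AND SUM(Amount) = 0
ORDER BY Invoice
"""

# Integer literals: compared numerically against the integer sums.
with_integers = con.execute(query.format(zero="0", one="1")).fetchall()
# Quoted literals: text values, which SQLite never treats as equal to an integer here.
with_strings = con.execute(query.format(zero="'0'", one="'1'")).fetchall()

print(with_integers)
print(with_strings)
```

With the integer literals the two expected Latam invoices come back; with the quoted literals the result is empty. If that is what happens in the notebook too, dropping the quotes around `0` and `1` may be enough.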

I don't know how to continue; I'm new to pandas.

Tags: sql, pandas, dataframe, pandas-groupby, having

Solution


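I hope I've understood your question correctly. You can use .pivot_table on the DataFrame and then keep only the invoices that have no Discount rows, at least one Product Charge row, and a total Amount of zero, which are the same conditions as in your HAVING clause. Filtering on the Product Charge sum alone would also keep invoice 10007000, whose Product Charge is 0 but which carries a Loyalty bonus of 200:

x = df.pivot_table(
    index=["Division", "Invoice"],
    columns="Transactions",
    values="Amount",
    aggfunc=["sum", "count"],
    fill_value=0,
)
# Mirror the SQL HAVING clause: no Discount rows, at least one
# Product Charge row, and a total Amount of zero over all transactions.
mask = (
    x[("count", "Discount")].eq(0)
    & x[("count", "Product Charge")].ge(1)
    & x["sum"].sum(axis=1).eq(0)
)
x = x[mask].reset_index()
# Flatten the two-level columns, e.g. ("sum", "Product Charge") -> "sum_Product Charge".
x.columns = x.columns.map("_".join)
x = x.rename(
    columns={
        "Division_": "Division",
        "Invoice_": "Invoice",
        "sum_Product Charge": "Total",
        "count_Product Charge": "Q_Invoice",
    }
)[["Division", "Invoice", "Total", "Q_Invoice"]]
print(x)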

Prints:

  Division   Invoice  Total  Q_Invoice
0    Latam  10002000      0          1
1    Latam  10006000      0          1
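
The `HAVING` conditions from the question can also be written with a plain `groupby`, which may read closer to the SQL. This is only a minimal sketch using the example data from the question; the helper column names (`is_discount`, `is_charge`, `n_discount`, `n_charge`) are made up for illustration.

```python
import pandas as pd

# Example data copied from the table in the question.
df = pd.DataFrame(
    {
        "Division": ["Europe", "Europe", "Europe", "Latam", "Latam", "Latam", "Europe",
                     "Europe", "Europe", "Europe", "Latam", "Latam", "Latam"],
        "Invoice": [10000000, 10001000, 10001000, 10002000, 10003000, 10003000, 10004000,
                    10004000, 10005000, 10005000, 10006000, 10007000, 10007000],
        "Transactions": ["Product Charge", "Product Charge", "Discount", "Product Charge",
                         "Product Charge", "Discount", "Product Charge", "Discount",
                         "Product Charge", "Discount", "Product Charge", "Product Charge",
                         "Loyalty bonus"],
        "Amount": [1000, 1000, -500, 0, 1000, -1000, 500, -500, 500, 495, 0, 0, 200],
    }
)

# Boolean flags play the role of the CASE WHEN ... THEN 1 ELSE 0 END expressions.
flags = df.assign(
    is_discount=df["Transactions"].eq("Discount"),
    is_charge=df["Transactions"].eq("Product Charge"),
)

# GROUP BY Division, Invoice with one aggregate per HAVING condition.
summary = (
    flags.groupby(["Division", "Invoice"])
    .agg(
        Total=("Amount", "sum"),
        n_discount=("is_discount", "sum"),
        n_charge=("is_charge", "sum"),
    )
    .reset_index()
)

# HAVING: no Discount rows, at least one Product Charge row, total Amount of zero.
keep = summary["n_discount"].eq(0) & summary["n_charge"].ge(1) & summary["Total"].eq(0)
result = summary.loc[keep, ["Division", "Invoice", "Total"]].reset_index(drop=True)

# Each group is exactly one invoice, so COUNT(DISTINCT Invoice) is always 1.
result["Q_Invoice"] = 1

print(result)
```

Invoice 10003000 also sums to zero, but it is excluded because it has a Discount row, and invoice 10007000 is excluded because its Loyalty bonus makes the total non-zero.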
