How do I do a custom comparison in pytest?

Problem description

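For example, I want to assert that two PySpark DataFrames contain the same data, but == only checks whether they are the same object. Ideally, I'd also like to specify whether row order matters.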

I tried writing a function that raises an AssertionError, but that adds a lot of noise to the pytest output because it shows the traceback of that function.

My other idea was to mock the __eq__ method of DataFrames, but I'm not convinced that's the right approach.

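Edit:

I considered just using a function that returns True or False instead of an operator, but that doesn't seem to work with pytest_assertrepr_compare. I'm not familiar enough with how that hook works, so there may be a way to use it with a function instead of an operator.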
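For reference, here is a minimal sketch of what a pytest_assertrepr_compare hook in conftest.py could look like. The hook receives the operator as a string together with the left and right operands of a failing comparison, which would explain why it does not fire for a plain function call such as assert compare(a, b). Table is a hypothetical stand-in for a DataFrame-like class, and the message format is only an illustration.

```python
class Table:
    """Hypothetical stand-in for a DataFrame-like object."""

    def __init__(self, rows):
        self.rows = list(rows)

    def __eq__(self, other):
        return isinstance(other, Table) and self.rows == other.rows


def pytest_assertrepr_compare(config, op, left, right):
    # Only customize the report for `Table == Table`; returning None
    # leaves pytest's default explanation in place for everything else.
    if op == "==" and isinstance(left, Table) and isinstance(right, Table):
        only_left = [r for r in left.rows if r not in right.rows]
        only_right = [r for r in right.rows if r not in left.rows]
        return [
            "Table contents differ:",
            f"  only in left:  {only_left}",
            f"  only in right: {only_right}",
        ]
    return None
```

The hook only changes how a failed comparison is reported. Whether the comparison passes is still decided by the __eq__ of the operands.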

Tags: python, pytest

Solution


My current solution is to use a patch to override the DataFrame's __eq__ method. Here's an example with pandas, since it's faster to test with; the idea should apply to any object.

import pandas as pd
# On Python 2, install the mock backport and use this import instead:
# from mock import patch
from unittest.mock import patch


def custom_df_compare(self, other):
    # Put logic for comparing df's here
    # Returning True for demonstration
    return True


@patch("pandas.DataFrame.__eq__", custom_df_compare)
def test_df_equal():
    df1 = pd.DataFrame(
        {"id": [1, 2, 3], "name": ["a", "b", "c"]}, columns=["id", "name"]
    )
    df2 = pd.DataFrame(
        {"id": [2, 3, 4], "name": ["b", "c", "d"]}, columns=["id", "name"]
    )

    assert df1 == df2

I haven't tried it yet, but I'm planning to add the patch as a fixture with autouse=True so that it applies to all tests automatically.
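A rough sketch of how that fixture idea might look, under some assumptions: Table is a hypothetical stand-in for the DataFrame class, custom_compare is a placeholder comparison that ignores row order, and the generator below is what would be decorated with @pytest.fixture(autouse=True) in conftest.py. The decorator is left as a comment so the snippet runs without pytest installed.

```python
from unittest.mock import patch


class Table:
    """Hypothetical stand-in for a DataFrame-like class."""

    def __init__(self, rows):
        self.rows = list(rows)

    def __eq__(self, other):
        # Default behaviour: row order matters
        return isinstance(other, Table) and self.rows == other.rows


def custom_compare(self, other):
    # Placeholder comparison logic: same rows, order ignored
    return sorted(self.rows) == sorted(other.rows)


# In conftest.py this would be decorated with:
# @pytest.fixture(autouse=True)
def patched_table_eq():
    # Setup: swap in the custom __eq__; teardown: the original is
    # restored when the `with` block exits after the test finishes.
    with patch.object(Table, "__eq__", custom_compare):
        yield
```

Because the patch is applied with a context manager around the yield, the original __eq__ comes back after each test, so the override does not leak outside the test run.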

To handle the "order matters" indicator elegantly, I'm playing with an approach similar to pytest.approx, which returns an object of a wrapper class with its own __eq__. For example:

class SortedDF(object):
    "Indicates that the order of data matters when comparing to another df"

    def __init__(self, df):
        self.df = df

    def __eq__(self, other):
        # Put logic for comparing df's including order of data here
        # Returning True for demonstration purposes
        return True


def test_sorted_df():
    df1 = pd.DataFrame(
        {"id": [1, 2, 3], "name": ["a", "b", "c"]}, columns=["id", "name"]
    )
    df2 = pd.DataFrame(
        {"id": [2, 3, 4], "name": ["b", "c", "d"]}, columns=["id", "name"]
    )

    # Passes because SortedDF.__eq__ is used
    assert SortedDF(df1) == df2
    # Fails because df2's __eq__ method is used
    assert df2 == SortedDF(df2)

The minor issue I haven't been able to resolve is the failure of the second assert, assert df2 == SortedDF(df2). With pytest.approx the wrapper works on either side of ==, but here it only works when the wrapper is the left operand. I've tried reading up on the == operator but haven't been able to figure out how to fix the second case.
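One likely explanation lies in how Python dispatches ==. The left operand's __eq__ is tried first, and the right operand's __eq__ is only consulted if the left one returns NotImplemented (or if the right operand's type is a subclass of the left's). The sketch below illustrates this with plain classes. DeferringFrame, GreedyFrame and IgnoreOrder are hypothetical names, not pandas or pytest APIs, and whether DataFrame.__eq__ defers for a given wrapper is something to check against the pandas version in use.

```python
class DeferringFrame:
    """Hypothetical frame whose __eq__ defers on unknown operand types."""

    def __init__(self, rows):
        self.rows = list(rows)

    def __eq__(self, other):
        if not isinstance(other, DeferringFrame):
            # Signals "I can't compare"; Python then tries other.__eq__(self)
            return NotImplemented
        return self.rows == other.rows


class GreedyFrame:
    """Hypothetical frame whose __eq__ answers for every operand type."""

    def __init__(self, rows):
        self.rows = list(rows)

    def __eq__(self, other):
        # Never returns NotImplemented, so a wrapper on the right-hand
        # side is never given a chance to run its own __eq__.
        return isinstance(other, GreedyFrame) and self.rows == other.rows


class IgnoreOrder:
    """Wrapper that compares the wrapped frame's rows regardless of order."""

    def __init__(self, frame):
        self.frame = frame

    def __eq__(self, other):
        rows = getattr(other, "rows", None)
        if rows is None:
            return NotImplemented
        return sorted(self.frame.rows) == sorted(rows)
```

With DeferringFrame the wrapper works on either side of ==. With GreedyFrame it only works when the wrapper is on the left, which matches the failing second assert above.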

