Dask apply with correct meta throws an error

Problem description

I've started using dask, but I'm running into errors that make no sense to me.

I'm trying to run the following code:

import pandas as pd
from pandas import Timestamp
import dask.dataframe as dd

testpd = pd.DataFrame(
    {
        "SKU_ID": {0: 1, 1: 1, 2: 1, 3: 1, 4: 1, 5: 2, 6: 2},
        "STR_ID": {0: 1, 1: 1, 2: 1, 3: 1, 4: 1, 5: 64, 6: 64},
        "DATE": {
            0: Timestamp("2018-01-01 00:00:00"),
            1: Timestamp("2018-01-02 00:00:00"),
            2: Timestamp("2018-01-03 00:00:00"),
            3: Timestamp("2018-01-04 00:00:00"),
            4: Timestamp("2018-01-05 00:00:00"),
            5: Timestamp("2020-02-22 00:00:00"),
            6: Timestamp("2020-02-23 00:00:00"),
        },
        "ORD_UNITS": {0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0},
    }
)

testdd = dd.from_pandas(testpd, npartitions=2)

# Flag the rows of a group whose DATE equals the group's earliest DATE.
def func(x):
    return pd.Series(x["DATE"] == x["DATE"].min(), name="result")

With pandas, this works perfectly:

testpd.groupby(["SKU_ID", "STR_ID"]).apply(func)

But with dask I get an error:

testdd.groupby(["SKU_ID", "STR_ID"]).apply(
    func, meta=pd.Series([], dtype=bool, name="result")
).compute()

AttributeError: 'Series' object has no attribute 'columns'

Tags: python, pandas, dask

Solution
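One possible way to sidestep the failing call is to avoid groupby.apply altogether: broadcast each group's earliest DATE back onto its rows with transform("min") and compare. The sketch below shows this in plain pandas only, using the same data as the question written as lists. It is an assumption that a boolean column aligned to the original row index is acceptable; the pandas apply call in the question returns a Series indexed by (SKU_ID, STR_ID, original index) instead. Whether the same pattern works unchanged on testdd has not been checked here.

```python
import pandas as pd

# Same data as in the question, written with plain lists.
testpd = pd.DataFrame(
    {
        "SKU_ID": [1, 1, 1, 1, 1, 2, 2],
        "STR_ID": [1, 1, 1, 1, 1, 64, 64],
        "DATE": pd.to_datetime(
            [
                "2018-01-01",
                "2018-01-02",
                "2018-01-03",
                "2018-01-04",
                "2018-01-05",
                "2020-02-22",
                "2020-02-23",
            ]
        ),
        "ORD_UNITS": [0, 0, 0, 0, 0, 0, 0],
    }
)

# Earliest DATE of each (SKU_ID, STR_ID) group, repeated on every row of that group.
group_min = testpd.groupby(["SKU_ID", "STR_ID"])["DATE"].transform("min")

# True where a row carries its group's earliest DATE.
result = (testpd["DATE"] == group_min).rename("result")

print(result.tolist())
# → [True, False, False, False, False, True, False]
```

Because the comparison is row-wise once the group minimum is available, no per-group function has to return a Series, which is the step the error message points at.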

