IsDtypeValidation issue with dtype string in the pandas_schema module

Problem description

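I am trying to confirm that a column in a DataFrame has dtype 'string'. To do this, I am using the pandas_schema package.

The code snippet below correctly reports when a column is not of dtype 'string'. However, when the dtype is actually correct, the code fails with the error TypeError: Cannot interpret 'StringDtype' as a data type

Is there something wrong with my code?

I know I could simply check whether series.convert_dtypes().dtype returns string, but I would prefer to add this check to pandas_schema.schema.Schema.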

import numpy as np
import pandas as pd
from pandas_schema.validation import IsDtypeValidation

series = pd.Series(["a", "b", "c"])

# Works as expected:
#   Returns a validation warning as the series is of dtype 'object' and not 'string'.
print(f"dtype = {series.dtypes}")  # Returns: dtype = object
idv = IsDtypeValidation(dtype=np.dtype(str))  # np.str was an alias for the builtin str and was removed in NumPy 1.24
validation_warnings = idv.get_errors(series=series)
print(validation_warnings[0])  # Returns: The column  has a dtype of object which is not a subclass of the required type <U0

# The series contains only string values, so convert_dtypes() is used here to obtain dtype 'string'.
# Does not work as expected:
#   Returns an error and traceback with 'TypeError: Cannot interpret 'StringDtype' as a data type'.
#   Expected output should be no error or validation warning.
series = series.convert_dtypes()
print(f"dtype = {series.dtypes}")  # Returns: dtype = string
idv = IsDtypeValidation(dtype=np.dtype(str))  # np.str was an alias for the builtin str and was removed in NumPy 1.24
validation_warnings = idv.get_errors(series=series)  # Error occurs in this line.
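
The same TypeError can be reproduced without pandas_schema. This suggests, as an assumption about the library's internals rather than something confirmed here, that the validation hands the series dtype to NumPy. pandas' 'string' is an extension dtype, not a NumPy dtype, so NumPy cannot interpret it. The sketch below uses a hypothetical helper, numpy_can_interpret, that is not part of either library:

```python
import numpy as np
import pandas as pd


def numpy_can_interpret(dtype) -> bool:
    """Hypothetical helper: report whether np.dtype() accepts the given dtype object."""
    try:
        np.dtype(dtype)
    except TypeError:
        # NumPy raises TypeError for dtype objects it does not understand.
        return False
    return True


# Explicit dtypes are used so the example does not depend on pandas' inference defaults.
object_series = pd.Series(["a", "b", "c"], dtype=object)
string_series = pd.Series(["a", "b", "c"], dtype="string")

# 'object' is a NumPy dtype; 'string' is a pandas extension dtype.
print(numpy_can_interpret(object_series.dtype))
print(numpy_can_interpret(string_series.dtype))
```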

Tags: python, pandas, numpy, dtype

Solution
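
One possible workaround, offered as a sketch under assumptions rather than as the pandas_schema way of doing it: test for the pandas extension dtype directly with isinstance instead of going through a NumPy dtype. The helper name string_dtype_errors and its message text are hypothetical. It returns a list so that it mirrors the list returned by get_errors in the question. How to plug such a check into pandas_schema.schema.Schema depends on the installed version and is not shown here.

```python
import pandas as pd


def string_dtype_errors(series: pd.Series, column_name: str = "") -> list:
    """Hypothetical helper: return an empty list if the series has the 'string' dtype."""
    if isinstance(series.dtype, pd.StringDtype):
        return []
    return [
        f"The column {column_name} has a dtype of {series.dtype} "
        "which is not the required 'string' dtype"
    ]


# Explicit object dtype, matching the starting point described in the question.
object_series = pd.Series(["a", "b", "c"], dtype=object)

# One message for the object-dtype series, none after convert_dtypes().
print(string_dtype_errors(object_series, "letters"))
print(string_dtype_errors(object_series.convert_dtypes(), "letters"))
```

The isinstance check never asks NumPy to interpret the dtype, so it avoids the TypeError from the question.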

