python - How do I create a Great Expectations checkpoint for a Pandas DataFrame?
Problem description
My datasource configuration looks like this:
datasource_config = {
    "name": "example_datasource",
    "class_name": "Datasource",
    "module_name": "great_expectations.datasource",
    "execution_engine": {
        "module_name": "great_expectations.execution_engine",
        "class_name": "PandasExecutionEngine",
    },
    "data_connectors": {
        "default_runtime_data_connector_name": {
            "class_name": "RuntimeDataConnector",
            "module_name": "great_expectations.datasource.data_connector",
            "batch_identifiers": ["default_identifier_name"],
        },
    },
}
context.add_datasource(**datasource_config)
My Pandas DataFrame and batch request were created successfully with the following commands:
...
df = read_csv_pandas(
    file_path="../done/my_file.txt",
    sep="|",
    header=0,
    quoting=csv.QUOTE_ALL,
)
batch_request = RuntimeBatchRequest(
    datasource_name="example_datasource",
    data_connector_name="default_runtime_data_connector_name",
    data_asset_name="MyDataAsset",
    runtime_parameters={"batch_data": df},
    batch_identifiers={"default_identifier_name": "default_identifier"},
)
My expectation suite:
expectation_suite_name = "My_validations"
suite = context.create_expectation_suite(expectation_suite_name, overwrite_existing=True)
Then I create the validator:
validator = context.get_validator(
    batch_request=batch_request,
    expectation_suite_name=expectation_suite_name,
)
validator.head(2)
The last command successfully prints 2 rows of my DataFrame.
Then I add expectations to my suite:
validator.expect_table_columns_to_match_ordered_list(['last_name', 'first_name', 'sex'])
validator.expect_column_values_to_be_in_set("sex", ["male", "female", "other", "unknown"])
validator.save_expectation_suite(discard_failed_expectations=False)
Then I generate the Data Docs:
suite_identifier = ExpectationSuiteIdentifier(expectation_suite_name=expectation_suite_name)
context.build_data_docs(resource_identifiers=[suite_identifier])
context.open_data_docs(resource_identifier=suite_identifier)
My checkpoint looks like this:
name: my_checkpoint_2
config_version: 1
class_name: SimpleCheckpoint
validations:
  - batch_request:
      datasource_name: example_datasource
      data_connector_name: default_runtime_data_connector_name
      data_asset_name: MyDataAsset
      runtime_parameters:
        batch_data: {df}
      batch_identifiers:
        default_identifier_name: default_identifier
    expectation_suite_name: My_validations
But this command
context.run_checkpoint(checkpoint_name="my_checkpoint_2")
produces the error:
ValueError: RuntimeDataBatchSpec must provide a Pandas DataFrame or PandasBatchData object.
Solution
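The checkpoint YAML is static text, so `batch_data: {df}` cannot refer to the Python variable `df`; the checkpoint never receives a real DataFrame, which is consistent with the ValueError above. One approach described in the Great Expectations documentation for the V3 (Batch Request) API is to remove `runtime_parameters` and `batch_identifiers` from the checkpoint YAML and supply them when the checkpoint is run. The following is a minimal sketch, assuming a Great Expectations version in which `context.run_checkpoint` accepts a `batch_request` override; the helper names `build_runtime_batch_request` and `run_checkpoint_with_dataframe` are hypothetical and not part of the library.

```python
from typing import Any, Dict


def build_runtime_batch_request(
    df: Any, identifier: str = "default_identifier"
) -> Dict[str, Any]:
    """Return only the runtime part of the batch request.

    Assumption: datasource_name, data_connector_name and data_asset_name
    remain in the checkpoint YAML, as in the question.
    """
    return {
        # The in-memory DataFrame is passed as a Python object, not via YAML.
        "runtime_parameters": {"batch_data": df},
        # The key must match the "batch_identifiers" list of the data connector.
        "batch_identifiers": {"default_identifier_name": identifier},
    }


def run_checkpoint_with_dataframe(
    context: Any, df: Any, checkpoint_name: str = "my_checkpoint_2"
) -> Any:
    """Run the named checkpoint against `df` (hypothetical helper).

    `context` is the data context object used throughout the question.
    Assumption: run_checkpoint accepts a batch_request keyword override.
    """
    return context.run_checkpoint(
        checkpoint_name=checkpoint_name,
        batch_request=build_runtime_batch_request(df),
    )
```

With the `context` and `df` from the question, the call would be `run_checkpoint_with_dataframe(context, df)`. Keeping the DataFrame out of the YAML also means the same checkpoint definition can be reused for any DataFrame loaded later.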