apache-spark - How to get Great_Expectations to work with Spark Dataframes in Apache Spark ValueError: Unrecognized spark type: string
问题描述
I have a Apache Spark dataframe which as a 'string' type field. However, Great_Expectations doesn't recognize the field type. I have imported the modules that I think are necessary, but not sure why Great_Expectations doesn't recognize the field
import great_expectations as ge
import great_expectations.dataset.sparkdf_dataset
from great_expectations.dataset.sparkdf_dataset import SparkDFDataset
from pyspark.sql.types import StructType, StructField, IntegerType, StringType, BooleanType
The following code reads in the csv as a dataframe
test = spark.read.csv('abfss://root@adlspretbiukadlsdev.dfs.core.windows.net/RAW/LANDING/customers.csv', inferSchema=True, header=True)
The following shows the schema:
test.printSchema()
Command executed in 2 sec 64 ms by carlton on 1:53:28 PM, 6/17/21
root
|-- first_name: string (nullable = true)
I think the following line of code creates Great_Expectation dataframe from the above Spark Dataframe
test2 = ge.dataset.SparkDFDataset(test)
I then code in the following expectation:
test2.expect_column_values_to_be_of_type(column='first_name', type_='string')
However, I get the following error:
ValueError: Unrecognized spark type: string
Traceback (most recent call last):
File "/home/trusted-service-user/cluster-env/env/lib/python3.6/site-packages/great_expectations/data_asset/util.py", line 80, in f
return self.mthd(obj, *args, **kwargs)
Not sure why Great_Expectations cannot recognize the Spark Type?
解决方案
You need to use names like, StringType
, LongType
, etc. - the same names as specified in the documentation. It should be like this:
test2.expect_column_values_to_be_of_type("first_name", "StringType")
see screenshot:
推荐阅读
- docker - 使用 docker 的多台机器的 Redis 集群
- twilio-flex - 如何解决“警告:无效的 DOM 属性 `fill-rule`。您的意思是 `fillRule`?” 在 Twilio Flex 上
- vim - vim 命令在以模式开头的文件中所有行的末尾添加分号
- processing - 平移后按定义的量旋转每个圆 - 处理
- powershell - 错误:制作:*** 没有规则来制作目标“makefile”。停止。使用 Powershell 和 Gnuwin32
- swift - Swift 可选语法
- r - 将指数修改的高斯拟合到 2D 数据
- php - PHP 致命错误:未捕获的错误:在 PHP 7.2.13 上调用未定义函数 idn_to_ascii()
- java - 将时间转换为 UTC 时间则相反
- c - 无法理解这是如何工作的