pyspark - Some array data type structs are not flattened when using Glue Relationalize

Problem description

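I have flattened a file in JSON format and that works fine. However, some array objects inside a struct are not flattened, and in the Data Catalog they show up as bigint. Please correct me if I am wrong.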

It only picks up office.items and office.selected. The desired output is to select the nested data types and turn them into columns, such as items.element.officeName, items.element.address, items.element.address.country, and so on.

Here is the code:

import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.transforms import Relationalize
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ['JOB_NAME'])

glue_temp_storage = "s3://script/"
glue_relationalize_output_s3_path = "s3://script/"
dfc_root_table_name = "root"

sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

datasource0 = glueContext.create_dynamic_frame.from_catalog(database="stitchdb", table_name="listingsandreviews", transformation_ctx="datasource0")

# Pass the variables themselves, not string literals containing their names
relationalized_json = Relationalize.apply(frame=datasource0, staging_path=glue_temp_storage, name=dfc_root_table_name, transformation_ctx="relationalized_json")

# Only the root table is selected from the returned collection
dfc = relationalized_json.select(dfc_root_table_name)

datasink2 = glueContext.write_dynamic_frame.from_options(frame=dfc, connection_type="s3", connection_options={"path": glue_relationalize_output_s3_path}, format="csv", transformation_ctx="datasink2")
job.commit()

Tags: pyspark, aws-glue
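
A likely explanation for the bigint columns: Relationalize returns a collection of tables rather than one flat table. Nested structs are flattened into the root table, but each array is pivoted into its own child table (typically named after the root and the array path, for example something like root_office.items), and the array column in the root table is replaced by a numeric key that points at the matching rows of the child table. The code above selects only the root table, so only those keys are written out. The sketch below shows one way to write every table in the collection. The helper name write_all_tables, its write_frame callback, and the one-folder-per-table layout are assumptions made for illustration; they are not part of the Glue API. It relies only on the keys() and select() methods of the collection that Relationalize returns.

```python
def write_all_tables(frames, write_frame, base_path):
    """Write every table in a Relationalize result, not just the root.

    frames      -- object with keys() and select(name), such as the
                   DynamicFrameCollection returned by Relationalize.apply
    write_frame -- hypothetical callback taking (frame, target_path)
    base_path   -- output prefix; each table gets its own sub-folder
    """
    written = []
    for table_name in sorted(frames.keys()):
        # One folder per table so root and child tables do not mix
        target = base_path.rstrip("/") + "/" + table_name + "/"
        write_frame(frames.select(table_name), target)
        written.append(target)
    return written


# Possible use inside the Glue job above (assumes the same variable names):
#
# write_all_tables(
#     relationalized_json,
#     lambda frame, path: glueContext.write_dynamic_frame.from_options(
#         frame=frame,
#         connection_type="s3",
#         connection_options={"path": path},
#         format="csv",
#     ),
#     glue_relationalize_output_s3_path,
# )
```

Fields such as officeName and address.country would then be expected in the child table for office.items, and can be joined back to the root table on the key column if a single wide table is needed.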
