Reading Parquet from HDFS and schema issue

Question

When I read a Parquet file from HDFS, the schema comes back with mixed-case column names. Is there a way to convert all of them to lowercase?

df=spark.read.parquet(hdfs_location)

df.printSchema()
root
|-- RecordType: string (nullable = true)
|-- InvestmtAccnt: string (nullable = true)
|-- InvestmentAccntId: string (nullable = true)
|-- FinanceSummaryID: string (nullable = true)
|-- BusinDate: string (nullable = true)

What I need is the following:


root
|-- recordtype: string (nullable = true)
|-- investmtaccnt: string (nullable = true)
|-- investmentaccntid: string (nullable = true)
|-- financesummaryid: string (nullable = true)
|-- busindate: string (nullable = true)

Tags: pyspark, parquet

Answer


First, read the Parquet file:

df=spark.read.parquet(hdfs_location)

Then use the .toDF function to create a new DataFrame in which every column name is lowercase:

df=df.toDF(*[c.lower() for c in df.columns])
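
One thing to watch for with this approach: if two columns differ only by case (say, a hypothetical "ID" and "id"), lower-casing produces duplicate names. The sketch below wraps the same toDF call with a check for that. The helper names lowercase_names and lowercase_columns are made up for illustration and are not part of PySpark; the code only assumes the object passed in exposes .columns and .toDF(*names), as a PySpark DataFrame does.

```python
from collections import Counter


def lowercase_names(columns):
    """Return lower-cased column names, refusing to create duplicates."""
    lowered = [c.lower() for c in columns]
    # Names that would appear more than once after lower-casing.
    clashes = sorted(name for name, n in Counter(lowered).items() if n > 1)
    if clashes:
        raise ValueError(
            f"lower-casing would produce duplicate columns: {clashes}"
        )
    return lowered


def lowercase_columns(df):
    """Rename every column of a DataFrame to lower case.

    Assumes `df` has `.columns` and `.toDF(*names)`, as in the answer above.
    """
    return df.toDF(*lowercase_names(df.columns))
```

With the variables from the question, usage would look like df = lowercase_columns(spark.read.parquet(hdfs_location)), followed by df.printSchema() to confirm the new names.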
