python - How to determine which columns I need, since ApplyMapping isn't case sensitive?
Problem description
I'm updating a PySpark script for a new database model and have run into problems referencing and updating columns. PySpark apparently returns all column names in uppercase. ApplyMapping is not case sensitive, but a left join with another table is, so I end up with two columns that have the same name, one in uppercase and one in lowercase. I then want to pick columns with the SelectFields function.
I've tried using the exact, case-sensitive column names in the mapping, but it always returns the same column.
I've printed the schema, and the only difference between the two columns is the case.
testDF = testDF.join(test2DF, "COLUMN", how='left')
test3DF = test3DF.withColumn("column", test3DF['ATTRB'].substr(4, 4))
fullDF = testDF.join(test3DF, testDF['ID'] == test3DF['ID'])
.....
applymappingTest = ApplyMapping.apply(frame = fullDF, mappings = [
('ID', 'string', 'identifier', 'string'),
('column', 'string', 'myColumn', 'string'),
('COLUMN', 'string', 'someother', 'string')
], transformation_ctx = "applymappingTest")
......
selectfieldsTest= SelectFields.apply(frame = applymappingTest, paths = [
"identifier",
"myColumn",
], transformation_ctx = "selectfieldsTest")
Expected result:
myColumn is the column with the name in lowercase.
Actual result:
myColumn is the column with the name in uppercase.
Solution
You can set the caseSensitive option in applyMapping.
def applyMapping( mappings : Seq[Product4[String, String, String, String]], caseSensitive : Boolean = true, transformationContext : String = "", callSite : CallSite = CallSite("Not provided", ""), stageThreshold : Long = 0, totalThreshold : Long = 0 ) : DynamicFrame
By the way, if I remember correctly, applyMapping is case sensitive by default, whereas Spark SQL is case insensitive by default. To make Spark SQL case sensitive, you can use:
spark_session.sql('set spark.sql.caseSensitive=true')
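If changing the case-sensitivity settings is not an option, another approach (not part of the original answer) is to rename the clashing columns before calling ApplyMapping, so that no two names differ only by case. The following is a minimal sketch in plain Python; the helper name disambiguate_columns and the numeric-suffix naming scheme are assumptions made for illustration, and the sample column list is only modelled on the question.

```python
from collections import Counter


def disambiguate_columns(columns):
    """Return column names that stay unique under case-insensitive comparison.

    The first occurrence of a name keeps its spelling. Later names that differ
    only by case (or not at all) get a numeric suffix. This naming scheme is
    a hypothetical choice, not something Glue or Spark prescribes.
    """
    seen = Counter()
    taken = {name.lower() for name in columns}
    renamed = []
    for name in columns:
        key = name.lower()
        seen[key] += 1
        if seen[key] == 1:
            renamed.append(name)
            continue
        # Pick a suffix that does not clash with any other column name.
        n = seen[key]
        candidate = f"{name}_{n}"
        while candidate.lower() in taken:
            n += 1
            candidate = f"{name}_{n}"
        taken.add(candidate.lower())
        renamed.append(candidate)
    return renamed


# Column list modelled on the joined frame from the question (assumed order).
columns = ["ID", "COLUMN", "ID", "ATTRB", "column"]
new_names = disambiguate_columns(columns)
print(new_names)

# With a real Spark DataFrame, the rename can be applied by position:
#   fullDF = fullDF.toDF(*disambiguate_columns(fullDF.columns))
```

Because toDF renames by position, it does not have to resolve the ambiguous names. After the rename, the mapping can refer to the lowercase column by its new, unambiguous name (column_2 in this sample) instead of relying on case.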