scala - Adding a new column of Map datatype to a Spark DataFrame in Scala
Problem description
I am able to create a new DataFrame in which one column has the Map datatype.
val inputDF2 = Seq(
(1, "Visa", 1, Map[String, Int]()),
(2, "MC", 2, Map[String, Int]())).toDF("id", "card_type", "number_of_cards", "card_type_details")
scala> inputDF2.show(false)
+---+---------+---------------+-----------------+
|id |card_type|number_of_cards|card_type_details|
+---+---------+---------------+-----------------+
|1  |Visa     |1              |[]               |
|2  |MC       |2              |[]               |
+---+---------+---------------+-----------------+
Now I want to create a new column of the same type as card_type_details. I am trying to add this new column with Spark's withColumn method.
inputDF2.withColumn("tmp", lit(null) cast "map<String, Int>").show(false)
+---+---------+---------------+-----------------+----+
|id |card_type|number_of_cards|card_type_details|tmp |
+---+---------+---------------+-----------------+----+
|1  |Visa     |1              |[]               |null|
|2  |MC       |2              |[]               |null|
+---+---------+---------------+-----------------+----+
When I check the schema, the two columns have the same type, but their values differ.
scala> inputDF2.withColumn("tmp", lit(null) cast "map<String, Int>").printSchema
root
 |-- id: integer (nullable = false)
 |-- card_type: string (nullable = true)
 |-- number_of_cards: integer (nullable = false)
 |-- card_type_details: map (nullable = true)
 |    |-- key: string
 |    |-- value: integer (valueContainsNull = false)
 |-- tmp: map (nullable = true)
 |    |-- key: string
 |    |-- value: integer (valueContainsNull = true)
I am not sure whether I am adding the new column correctly. The problem appears when I call .isEmpty on the tmp column: I get a NullPointerException.
scala> def checkValue = udf((card_type_details: Map[String, Int]) => {
| var output_map = Map[String, Int]()
| if (card_type_details.isEmpty) { output_map += 0.toString -> 1 }
| else {output_map = card_type_details }
| output_map
| })
checkValue: org.apache.spark.sql.expressions.UserDefinedFunction
scala> inputDF2.withColumn("value", checkValue(col("card_type_details"))).show(false)
+---+---------+---------------+-----------------+--------+
|id |card_type|number_of_cards|card_type_details|value   |
+---+---------+---------------+-----------------+--------+
|1  |Visa     |1              |[]               |[0 -> 1]|
|2  |MC       |2              |[]               |[0 -> 1]|
+---+---------+---------------+-----------------+--------+
scala> inputDF2.withColumn("tmp", lit(null) cast "map<String, Int>")
.withColumn("value", checkValue(col("tmp"))).show(false)
org.apache.spark.SparkException: Failed to execute user defined function($anonfun$checkValue$1: (map<string,int>) => map<string,int>)
Caused by: java.lang.NullPointerException
at $anonfun$checkValue$1.apply(<console>:28)
at $anonfun$checkValue$1.apply(<console>:26)
at org.apache.spark.sql.catalyst.expressions.ScalaUDF$$anonfun$2.apply(ScalaUDF.scala:108)
at org.apache.spark.sql.catalyst.expressions.ScalaUDF$$anonfun$2.apply(ScalaUDF.scala:107)
at org.apache.spark.sql.catalyst.expressions.ScalaUDF.eval(ScalaUDF.scala:1063)
How can I add a new column that holds the same values as the card_type_details column?
Solution
To add a tmp column with the same values as card_type_details, you can simply do:
inputDF2.withColumn("tmp", col("card_type_details"))
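Casting lit(null) to a map type produces a null map rather than an empty one, which is why calling .isEmpty inside the UDF throws. If your goal is to add a column holding an empty map and so avoid the NullPointerException, the solution is:
inputDF2.withColumn("tmp", typedLit(Map.empty[String, Int]))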
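For completeness, here is a minimal self-contained sketch of the empty-map approach, assuming Spark 2.2 or later (where typedLit is available) and a local SparkSession. The object name EmptyMapColumnSketch and the helper orDefault are hypothetical names introduced only for illustration. It also shows two possible ways to cope with a column that already contains null maps: replacing nulls with coalesce, or wrapping the UDF argument in Option.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{coalesce, col, lit, typedLit, udf}

object EmptyMapColumnSketch {
  // Hypothetical null-tolerant variant of the question's checkValue logic:
  // a null or empty map yields Map("0" -> 1); any other map is returned as-is.
  def orDefault(details: Map[String, Int]): Map[String, Int] =
    Option(details).filter(_.nonEmpty).getOrElse(Map("0" -> 1))

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is good enough for this illustration.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("empty-map-column")
      .getOrCreate()
    import spark.implicits._

    val inputDF2 = Seq(
      (1, "Visa", 1, Map[String, Int]()),
      (2, "MC", 2, Map[String, Int]())
    ).toDF("id", "card_type", "number_of_cards", "card_type_details")

    // typedLit keeps the Scala type, so tmp is map<string,int> and holds an
    // empty map in every row instead of null.
    val withEmpty = inputDF2.withColumn("tmp", typedLit(Map.empty[String, Int]))

    // The question's approach: tmp has the right type, but every value is null.
    val withNull = inputDF2.withColumn("tmp", lit(null).cast("map<string,int>"))

    // Option 1: replace null maps with an empty map before calling any UDF.
    val repaired = withNull.withColumn(
      "tmp", coalesce(col("tmp"), typedLit(Map.empty[String, Int])))

    // Option 2: make the UDF itself tolerate null input.
    val checkValue = udf(orDefault _)

    withEmpty.withColumn("value", checkValue(col("tmp"))).show(false)
    repaired.withColumn("value", checkValue(col("tmp"))).show(false)
    withNull.withColumn("value", checkValue(col("tmp"))).show(false)

    spark.stop()
  }
}
```

Which option fits depends on whether a null map and an empty map should mean the same thing in your data. coalesce erases the distinction in the column itself, while the Option wrapper leaves the column untouched and only changes how the UDF treats it.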