java - Error "java.lang.String is not a valid external type for schema of double" in the code below
Problem description
My code is as follows:
import java.text.SimpleDateFormat
import java.util.Calendar
import scala.util.control.Breaks._
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types._
// The question does not show which Logging trait is in scope.

object DataTypeValidation extends Logging {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SparkProjectforDataTypeValidation")
      .master("local")
      .getOrCreate()
    spark.sparkContext.setLogLevel("ERROR")
    try {
      breakable {
        val format = new SimpleDateFormat("d-M-y hh:mm:ss.SSSSS")
        println("*********Data Type Validation Started*************** " + format.format(Calendar.getInstance().getTime()))
        val data = Seq(Row(873131558, "ABC22"), Row(29000000, 99.00), Row(27000000, 2.34))
        val schema = StructType(Array(
          StructField("oldcl", IntegerType, nullable = true),
          StructField("newcl", DoubleType, nullable = true))
        )
        val ONE = 1
        var erroredRecordRow = new scala.collection.mutable.ListBuffer[Row]()
        val newSchema = schema.fields.map({
          case StructField(name, _: IntegerType, nullorNotnull, _) => StructField(name, StringType, nullorNotnull)
          case StructField(name, _: DoubleType, nullorNotnull, _) => StructField(name, StringType, nullorNotnull)
          case fields => fields
        }).dropRight(ONE)
        val newStructType = StructType { newSchema }
        val df = spark.createDataFrame(spark.sparkContext.parallelize(data), schema)
        df.show()
        print(df.schema)
      }
    } catch {
      case exception: Exception =>
        println("exception caught in Data Type Mismatch In Schema Validation: " + exception.toString())
        exception.printStackTrace()
    }
    spark.stop()
  }
}
exception caught in Data Type Mismatch In Schema Validation: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver): java.lang.RuntimeException: Error while encoding: java.lang.RuntimeException: java.lang.String is not a valid external type for schema of double
if (assertnotnull(input[0, org.apache.spark.sql.Row, true]).isNullAt) null else validateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true]), 0, oldcl), IntegerType) AS oldcl#0
if (assertnotnull(input[0, org.apache.spark.sql.Row, true]).isNullAt) null else validateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true]), 1, newcl), DoubleType) AS newcl#1
at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.toRow(ExpressionEncoder.scala:292)
Solution
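@AnkitTomar, this error occurs because the string value ABC22 is mapped to the DoubleType column newcl. createDataFrame does not check the Row values against the schema when the DataFrame is defined; the external (JVM) type of each value is validated when the rows are encoded during a job, which is why the exception only surfaces as a task failure once df.show() runs.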
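To find the offending rows on the driver before Spark runs a job, the Row values can be checked against the schema by hand. The sketch below is a hypothetical helper (`RowTypeCheck`, `matches` and `splitValid` are illustrative names, not Spark API) and only covers the three types used in this question: `Int` (boxed as `java.lang.Integer`) for IntegerType, `Double` (boxed as `java.lang.Double`) for DoubleType, and `String` for StringType. Nullability and all other Spark types are not handled.

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

// Hypothetical helper: checks Row values against a schema on the driver.
// Only IntegerType, DoubleType and StringType are handled; any other type counts as a mismatch.
object RowTypeCheck {
  // true if `value` has the JVM type the Row encoder accepts for `dt`
  // (nulls always pass here; the field's nullable flag is not checked)
  def matches(value: Any, dt: DataType): Boolean = (value, dt) match {
    case (null, _)               => true
    case (_: Int, IntegerType)   => true // boxed java.lang.Integer
    case (_: Double, DoubleType) => true // boxed java.lang.Double
    case (_: String, StringType) => true
    case _                       => false
  }

  // Splits rows into (all values match the schema, at least one mismatch).
  // Assumes every row has at least schema.length fields.
  def splitValid(rows: Seq[Row], schema: StructType): (Seq[Row], Seq[Row]) =
    rows.partition { row =>
      schema.fields.indices.forall(i => matches(row.get(i), schema.fields(i).dataType))
    }
}
```

With the question's `data` and `schema`, `val (valid, errored) = RowTypeCheck.splitValid(data, schema)` puts the `Row(873131558, "ABC22")` row into `errored`, which could then be recorded separately (for example in a buffer like `erroredRecordRow`) while only `valid` is passed to createDataFrame.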
Please change the following lines:
val data = Seq(Row(873131558, "ABC22"), Row(29000000, 99.00), Row(27000000, 2.34))
val schema = StructType(Array(
StructField("oldcl", IntegerType, nullable = true),
StructField("newcl", DoubleType, nullable = true))
)
to:
val data = Seq(Row(873131558, "ABC22"), Row(29000000, "99.00"), Row(27000000, "2.34"))
val schema = StructType(Array(
StructField("oldcl", IntegerType, nullable = true),
StructField("newcl", StringType, nullable = true))
)
With that change you get the expected result:
val df = spark.createDataFrame(spark.sparkContext.parallelize(data), schema)
df.show()
/*
+---------+-----+
| oldcl|newcl|
+---------+-----+
|873131558|ABC22|
| 29000000|99.00|
| 27000000| 2.34|
+---------+-----+
*/
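If newcl should stay a DoubleType column instead, another option is to convert the values before building the Row objects. The sketch below assumes the raw values arrive as strings; the sample `raw` data and the names `KeepDoubleColumn` and `toBoxedDouble` are hypothetical. Anything that does not parse as a double becomes null, which a nullable DoubleType column accepts.

```scala
import scala.util.Try
import org.apache.spark.sql.Row

object KeepDoubleColumn {
  // Hypothetical raw input: the second column arrives as text
  val raw: Seq[(Int, String)] = Seq((873131558, "ABC22"), (29000000, "99.00"), (27000000, "2.34"))

  // Parses to a boxed java.lang.Double; text that fails to parse becomes null
  def toBoxedDouble(s: String): java.lang.Double =
    Try(s.trim.toDouble).map(d => java.lang.Double.valueOf(d)).getOrElse(null)

  // Every newcl value is now a Double or null, matching a nullable DoubleType field
  val rows: Seq[Row] = raw.map { case (id, s) => Row(id, toBoxedDouble(s)) }
}
```

These rows can be passed to `spark.createDataFrame(spark.sparkContext.parallelize(KeepDoubleColumn.rows), schema)` with the original IntegerType/DoubleType schema; the ABC22 row then has a null newcl instead of failing the job. This silently drops the invalid value, so if invalid records have to be reported, track them separately before converting.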
Note: I could not find any use of newSchema in your code. If you are following some other approach, please leave a comment.
val ONE = 1
var erroredRecordRow = new scala.collection.mutable.ListBuffer[Row]()
val newSchema = schema.fields.map({
case StructField(name, _: IntegerType, nullorNotnull, _) => StructField(name, StringType, nullorNotnull)
case StructField(name, _: DoubleType, nullorNotnull, _) => StructField(name, StringType, nullorNotnull)
case fields => fields
}).dropRight(ONE)
val newStructType = StructType { newSchema }
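If the idea behind newSchema was to load the columns as strings first and then find the values that do not convert to the declared type (an assumption about the question's intent), one way to do that in Spark is to cast the string column and keep the rows whose value was non-null before the cast but null after it. `CastValidation` and `invalidRows` are hypothetical names. The null-on-failure behaviour of `cast` applies when ANSI mode is off (`spark.sql.ansi.enabled=false`); depending on the Spark version ANSI mode may be on by default, in which case an invalid cast raises an error, so the sketch sets the option explicitly.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types._

// Hypothetical sketch: find values in a string column that cannot be cast to a target type.
object CastValidation {
  // Rows where `column` is non-null but casting it to `target` yields null, i.e. the cast failed.
  // Relies on non-ANSI cast semantics (invalid input becomes null instead of raising an error).
  def invalidRows(df: DataFrame, column: String, target: DataType): DataFrame =
    df.filter(col(column).isNotNull && col(column).cast(target).isNull)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("CastValidationSketch")
      .master("local")
      .config("spark.sql.ansi.enabled", "false") // keep null-on-failure cast behaviour
      .getOrCreate()
    // newcl is read as a string, similar in spirit to newSchema
    val data = Seq(Row(873131558, "ABC22"), Row(29000000, "99.00"), Row(27000000, "2.34"))
    val schema = StructType(Array(
      StructField("oldcl", IntegerType, nullable = true),
      StructField("newcl", StringType, nullable = true)))
    val df = spark.createDataFrame(spark.sparkContext.parallelize(data), schema)
    invalidRows(df, "newcl", DoubleType).show() // only the ABC22 row fails the cast
    spark.stop()
  }
}
```

The rows returned by `invalidRows` are the type errors; the remaining rows can be cast to DoubleType and processed normally.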