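scala - Spark: adding a new field inside an array of arrays
Problem description
I am new to Spark and have a df with the following schema (only part of it is shown).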
root
|-- value: binary (nullable = false)
|-- event: struct (nullable = false)
| |-- eventtime: struct (nullable = true)
| | |-- seconds: long (nullable = true)
| | |-- ms: float (nullable = true)
| |-- fault: struct (nullable = true)
| | |-- collections: struct (nullable = true)
| | | |-- snapshots: array (nullable = false) --> ** FIRST LEVEL ARRAY (or array of arrays) **
| | | | |-- element: struct (containsNull = false)
| | | | | |-- ringbuffer: struct (nullable = true)
| | | | | | |-- columns: array (nullable = false) --> ** SECOND LEVEL ARRAY **
| | | | | | | |-- element: struct (containsNull = false)
| | | | | | | | |-- doubles: struct (nullable = true)
| | | | | | | | | |-- values: array (nullable = false)
| | | | | | | | | | |-- element: float (containsNull = false)
....................................
..........................
With the following code I can add a new field, comp_id, under fault, at the same level as collections.
df.withColumn("event", col("event").withField("fault.comp_id", lit(1234)))
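How can I add a new field inside an array of arrays? For example, how do I add a new test_field under columns? I tried to step into each array by specifying the first index, 0: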
df.withColumn("event",col("event").withField("fault.collections.snapshots.0.ringbuffer.columns.0.test_field", lit("test_value")))
but got this error:
org.apache.spark.sql.catalyst.parser.ParseException: mismatched input '.0' expecting {<EOF>, '.', '-'}(line 1, pos 25)
== SQL ==
fault.snapshots.snapshots.0.ringbuffer.columns.0.test_field
-------------------------^^^
at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:255)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:124)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parseMultipartIdentifier(ParseDriver.scala:61)
at org.apache.spark.sql.catalyst.expressions.UpdateFields$.nameParts(complexTypeCreator.scala:693)
at org.apache.spark.sql.catalyst.expressions.UpdateFields$.apply(complexTypeCreator.scala:701)
at org.apache.spark.sql.Column.withField(Column.scala:927)
at org.apache.spark.sql.Dataset.transform(Dataset.scala:2751)
at org.apache.spark.sql.Dataset.transform(Dataset.scala:2751)
So the desired schema would look like this:
root
|-- value: binary (nullable = false)
|-- event: struct (nullable = false)
| |-- eventtime: struct (nullable = true)
| | |-- seconds: long (nullable = true)
| | |-- ms: float (nullable = true)
| |-- fault: struct (nullable = true)
| | |-- collections: struct (nullable = true)
| | | |-- snapshots: array (nullable = false) --> ** FIRST LEVEL ARRAY (or array of arrays) **
| | | | |-- element: struct (containsNull = false)
| | | | | |-- ringbuffer: struct (nullable = true)
| | | | | | |-- columns: array (nullable = false) --> ** SECOND LEVEL ARRAY **
| | | | | | | |-- element: struct (containsNull = false)
| | | | | | | | |-- doubles: struct (nullable = true)
| | | | | | | | | |-- values: array (nullable = false)
| | | | | | | | | | |-- element: float (containsNull = false)
| | | | | | | | |-- test_field: string (nullable = true)
Solution
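The stack trace shows that withField passes its field name through parseMultipartIdentifier, so a path segment such as .0 is rejected: withField can walk through nested structs but not into array elements. One possible approach, assuming Spark 3.1 or later (where Column.withField is available), is to rebuild each array with the transform higher-order function and call withField on every struct element. The sketch below is not a confirmed answer from the original thread; the case classes, the object name AddNestedField, and the helper addTestField are hypothetical and only mirror the relevant part of the schema.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit, transform}

// Hypothetical case classes mirroring only the relevant part of the schema.
case class Doubles(values: Seq[Float])
case class ColumnEntry(doubles: Doubles)
case class RingBuffer(columns: Seq[ColumnEntry])
case class Snapshot(ringbuffer: RingBuffer)
case class Collections(snapshots: Seq[Snapshot])
case class Fault(collections: Collections)
case class Event(fault: Fault)
case class Record(event: Event)

object AddNestedField {
  // Adds test_field to every element of columns inside every element of snapshots.
  def addTestField(df: DataFrame): DataFrame =
    df.withColumn(
      "event",
      // Replace the first-level array with a transformed copy of itself.
      col("event").withField(
        "fault.collections.snapshots",
        transform(
          col("event.fault.collections.snapshots"),
          snapshot =>
            // Inside each snapshot struct, replace the second-level array.
            snapshot.withField(
              "ringbuffer.columns",
              transform(
                snapshot.getField("ringbuffer").getField("columns"),
                // Each element is a struct, so withField can add the new field.
                column => column.withField("test_field", lit("test_value"))
              )
            )
        )
      )
    )

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("add-nested-field")
      .getOrCreate()
    import spark.implicits._

    // One-row sample DataFrame with the nested shape described in the question.
    val sample = Record(Event(Fault(Collections(Seq(
      Snapshot(RingBuffer(Seq(ColumnEntry(Doubles(Seq(1.0f))))))
    )))))
    val df = Seq(sample).toDF()

    addTestField(df).printSchema()
    spark.stop()
  }
}
```

Unlike the index-based attempt, this adds test_field to every element of both arrays, not only to the elements at index 0. Adding the field to a single element would need a different expression, for example the two-argument form of transform that also receives the element index.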