apache-spark - Spark: creating a nested schema
Question
With Spark,
import spark.implicits._
val data = Seq(
(1, ("value11", "value12")),
(2, ("value21", "value22")),
(3, ("value31", "value32"))
)
val df = data.toDF("id", "v1")
df.printSchema()
results in:
root
 |-- id: integer (nullable = false)
 |-- v1: struct (nullable = true)
 |    |-- _1: string (nullable = true)
 |    |-- _2: string (nullable = true)
Now, if I want to create the schema myself, how should I go about it?
val schema = StructType(Array(
StructField("id", IntegerType),
StructField("nested", ???)
))
Thanks.
Answer
Following the example in the StructType documentation: https://spark.apache.org/docs/2.4.0/api/java/org/apache/spark/sql/types/StructType.html
import org.apache.spark.sql._
import org.apache.spark.sql.types._
val innerStruct =
StructType(
StructField("f1", IntegerType, true) ::
StructField("f2", LongType, false) ::
StructField("f3", BooleanType, false) :: Nil)
val struct = StructType(
StructField("a", innerStruct, true) :: Nil)
// Create a Row with the schema defined by struct
val row = Row(Row(1, 2, true))
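To make the relationship between the nested schema and the nested Row a bit more concrete, here is a minimal sketch that inspects the struct from the documentation example and reads a value back out of the row. The object name InspectNestedSchema is made up for illustration. The sketch also writes 2L instead of 2 for the f2 value; as far as I know, Spark expects a Long for a LongType field once the row is actually bound to the schema (for example through createDataFrame), so treat that change as an assumption on my part rather than something the documentation example states.

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

// Hypothetical wrapper object, used only for this illustration.
object InspectNestedSchema {
  // Same inner struct as in the documentation example above.
  val innerStruct: StructType = StructType(
    StructField("f1", IntegerType, nullable = true) ::
    StructField("f2", LongType, nullable = false) ::
    StructField("f3", BooleanType, nullable = false) :: Nil)

  // The outer struct has a single field "a" whose type is the inner struct.
  val struct: StructType = StructType(
    StructField("a", innerStruct, nullable = true) :: Nil)

  // Assumption: 2L (a Long) is used so that the value matches LongType.
  val row: Row = Row(Row(1, 2L, true))

  def main(args: Array[String]): Unit = {
    // Look up the top-level field by name and recover its nested StructType.
    val nested = struct("a").dataType.asInstanceOf[StructType]
    println(nested.fieldNames.mkString(", "))

    // treeString renders the schema in the same layout as printSchema().
    println(struct.treeString)

    // Nested rows are read positionally: field 0 is the struct, field 1 inside it is f2.
    val inner = row.getStruct(0)
    println(inner.getLong(1))
  }
}
```

None of this needs a running SparkSession, since StructType and Row are plain objects.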
In your case, it would be:
import org.apache.spark.sql._
import org.apache.spark.sql.types._
val schema = StructType(Array(
StructField("id", IntegerType),
StructField("nested", StructType(Array(
StructField("value1", StringType),
StructField("value2", StringType)
)))
))
Output:
StructType(
StructField(id,IntegerType,true),
StructField(nested,StructType(
StructField(value1,StringType,true),
StructField(value2,StringType,true)
),true)
)
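If it helps, here is a rough sketch of how the schema could then be applied to the data from the question. It assumes a local SparkSession and uses createDataFrame with an RDD of Row objects; the object name NestedSchemaExample, the app name, and the local[*] master are placeholders of mine, not part of the original question. Note that the tuples from the question are rewritten as nested Row values, because an explicit schema is applied to Rows rather than to Scala tuples.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types._

// Hypothetical wrapper object, used only for this illustration.
object NestedSchemaExample {
  // The hand-written schema: an integer id plus a struct of two strings.
  val schema: StructType = StructType(Array(
    StructField("id", IntegerType),
    StructField("nested", StructType(Array(
      StructField("value1", StringType),
      StructField("value2", StringType)
    )))
  ))

  // The data from the question, with each inner tuple expressed as a nested Row.
  val rows: Seq[Row] = Seq(
    Row(1, Row("value11", "value12")),
    Row(2, Row("value21", "value22")),
    Row(3, Row("value31", "value32"))
  )

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is acceptable for trying this out.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("nested-schema-example")
      .getOrCreate()

    // Bind the rows to the explicit schema.
    val df = spark.createDataFrame(spark.sparkContext.parallelize(rows), schema)

    df.printSchema()
    // Nested fields can be selected with dot notation.
    df.select("id", "nested.value1").show()

    spark.stop()
  }
}
```

In spark-shell the session already exists as spark, so only the schema, the rows, and the createDataFrame call would be needed there.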