scala - Need to flatten a DataFrame based on one column in Scala
Problem description
I have a DataFrame with the following schema:
root
|-- name: string (nullable = true)
|-- roll: string (nullable = true)
|-- subjectID: string (nullable = true)
The values in the DataFrame are as follows:
+-------------------+---------+--------------------+
| name| roll| SubjectID|
+-------------------+---------+--------------------+
| sam|ta1i3dfk4| xy|av|mm|
| royc|rfhqdbnb3| a|
| alcaly|ta1i3dfk4| xx|zz|
+-------------------+---------+--------------------+
I need to derive a DataFrame by flattening SubjectID so that each subject ID gets its own row, as shown below. Note that SubjectID is also a string.
+-------------------+---------+--------------------+
| name| roll| SubjectID|
+-------------------+---------+--------------------+
| sam|ta1i3dfk4| xy|
| sam|ta1i3dfk4| av|
| sam|ta1i3dfk4| mm|
| royc|rfhqdbnb3| a|
| alcaly|ta1i3dfk4| xx|
| alcaly|ta1i3dfk4| zz|
+-------------------+---------+--------------------+
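Any suggestions?
Solution
You can use the explode function to flatten the column: split the string on the | delimiter into an array, then explode that array into one row per element. Example: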
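import org.apache.spark.sql.functions.{col, explode, split}
import spark.implicits._ // assumes a SparkSession named `spark`, as in spark-shell

val inputDF = Seq(
  ("sam", "ta1i3dfk4", "xy|av|mm"),
  ("royc", "rfhqdbnb3", "a"),
  ("alcaly", "rfhqdbnb3", "xx|zz")
).toDF("name", "roll", "subjectIDs")

// split `subjectIDs` on the literal "|" (escaped, because split takes a regex),
// then explode the resulting array into one row per element
val resultDF = inputDF
  .withColumn("subjectIDs", split(col("subjectIDs"), "\\|"))
  .withColumn("subjectIDs", explode(col("subjectIDs")))

resultDF.show()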
+------+---------+----------+
| name| roll|subjectIDs|
+------+---------+----------+
| sam|ta1i3dfk4| xy|
| sam|ta1i3dfk4| av|
| sam|ta1i3dfk4| mm|
| royc|rfhqdbnb3| a|
|alcaly|rfhqdbnb3| xx|
|alcaly|rfhqdbnb3| zz|
+------+---------+----------+
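To see what the split-then-explode step does to each row, here is a minimal sketch of the same idea on an ordinary Scala collection, with no Spark involved. The names FlattenSketch, Student, and flatten are hypothetical and exist only for this illustration; they are not part of the answer above or of any library. It also shows why the pipe is escaped: String.split, like Spark's split function, treats its pattern as a regular expression.

```scala
object FlattenSketch {
  // Hypothetical row type mirroring the three string columns of the DataFrame.
  final case class Student(name: String, roll: String, subjectID: String)

  // Emit one output row per pipe-separated subject ID, keeping name and roll.
  def flatten(rows: Seq[Student]): Seq[Student] =
    rows.flatMap { row =>
      // "\\|" matches a literal pipe; an unescaped "|" is regex alternation
      // and would not split on the delimiter as intended.
      row.subjectID.split("\\|").toSeq.map(id => row.copy(subjectID = id))
    }

  def main(args: Array[String]): Unit = {
    val input = Seq(
      Student("sam", "ta1i3dfk4", "xy|av|mm"),
      Student("royc", "rfhqdbnb3", "a"),
      Student("alcaly", "rfhqdbnb3", "xx|zz")
    )
    // Prints one line per flattened row.
    flatten(input).foreach(println)
  }
}
```

Here flatMap plays the role that explode plays in the DataFrame version: each input row fans out into as many rows as there are elements in its split array, and a row with a single ID passes through unchanged.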