scala - Dynamic dataframe with n columns and m rows
Problem description
I am reading data from JSON (dynamic schema) and loading it into a DataFrame.
Example DataFrame:
scala> import spark.implicits._
import spark.implicits._
scala> val DF = Seq(
(1, "ABC"),
(2, "DEF"),
(3, "GHIJ")
).toDF("id", "word")
DF: org.apache.spark.sql.DataFrame = [id: int, word: string]
scala> DF.show
+---+----+
| id|word|
+---+----+
|  1| ABC|
|  2| DEF|
|  3|GHIJ|
+---+----+
Requirement: The number of columns and their names can be anything. I want to read the rows in a loop and fetch each column one by one, because each value needs to be processed in subsequent flows. I need both the column name and the value. I'm using Scala.
Python:
for i, j in df.iterrows():
    print(i, j)
I need the same functionality in Scala, and the column name and value should be fetched separately.
Kindly help.
Solution
df.iterrows is not from pyspark, but from pandas. In Spark, you can use foreach:
import org.apache.spark.sql.Row
// Pattern-match each Row against the known column types
DF.foreach { _ match { case Row(id: Int, word: String) => println((id, word)) } }
Result (row order is not guaranteed, because foreach runs in parallel across partitions):
(2,DEF)
(3,GHIJ)
(1,ABC)
If you don't know the number of columns, you cannot use unapply on Row. In that case, just do:
DF.foreach(row => println(row))
Result:
[1,ABC]
[2,DEF]
[3,GHIJ]
Then work with row using its methods, such as getAs.
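To get each column name together with its value without knowing the schema in advance, one option is to pair row.schema.fieldNames with row.toSeq. The following is a minimal sketch, not the only way to do it: the object name RowColumns, the helper nameValuePairs, and the local SparkSession settings are made up for illustration, and it assumes the rows come from a DataFrame (so row.schema is set) and that the data is small enough to collect to the driver.

```scala
import org.apache.spark.sql.{Row, SparkSession}

object RowColumns {
  // Pair every column name with its value for a single row.
  // Assumes the row carries a schema, which holds for rows obtained from a DataFrame.
  def nameValuePairs(row: Row): Seq[(String, Any)] =
    row.schema.fieldNames.toSeq.zip(row.toSeq)

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; in spark-shell, `spark` already exists.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("row-columns")
      .getOrCreate()
    import spark.implicits._

    val df = Seq((1, "ABC"), (2, "DEF"), (3, "GHIJ")).toDF("id", "word")

    // collect() brings all rows to the driver, so this suits small data only.
    df.collect().foreach { row =>
      nameValuePairs(row).foreach { case (name, value) =>
        println(s"$name = $value")
      }
    }

    spark.stop()
  }
}
```

For larger data, the same helper can be called inside DF.foreach instead of after collect(); the println output then goes to the executor logs rather than the driver console. If only some columns are needed, row.getAs[T]("columnName") fetches a single value by name.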