Scala: DataFrame column holding a list of values; I want to turn each value into a new, named column

Problem description

I have a DataFrame with the columns given below.

House_No = INT
family_details = ["name" , age , "surname" , weight]
Ownership = Boolean

I want to add new columns to the DataFrame for name, age, surname, and weight.

House_No
family_details
Ownership
name
age
surname
weight

Tags: scala, functional-programming, apache-spark-sql, user-defined-functions

Solution


The following solution should help:

     // Run in spark-shell, where sc and spark.implicits._ (needed for toDF and $) are already in scope.
     val data = Array((2, Array("abc", "23", "xyz", "70"), true), (3, Array("lmn", "45", "pqr", "50"), false))

     val rdd = sc.parallelize(data)

     val df = rdd.toDF("house_no", "family_details", "ownership")

     // family_details is already an array, so each element can be read by index; no split is needed.
     val res = df
       .withColumn("name", $"family_details"(0))
       .withColumn("age", $"family_details"(1))
       .withColumn("surname", $"family_details"(2))
       .withColumn("weight", $"family_details"(3))
       .drop("family_details") // omit this call to keep the original array column
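
If the list of target names may change, the same idea can be written once and driven by a sequence of names. The following is a minimal sketch, not a definitive implementation: the object name FamilyDetailsColumns, the helper widen, and the local SparkSession setup are assumptions made for illustration and do not come from the question. It also assumes the array elements are strings (Spark arrays hold a single element type), so age and weight are cast to integers afterwards.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object FamilyDetailsColumns {
  // Hypothetical helper: adds one column per array position, named after `names`.
  // The array column itself is kept; positions past the end of the array yield null.
  def widen(df: DataFrame, arrayCol: String, names: Seq[String]): DataFrame =
    names.zipWithIndex.foldLeft(df) { case (acc, (name, i)) =>
      acc.withColumn(name, col(arrayCol).getItem(i))
    }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; in spark-shell `spark` already exists.
    val spark = SparkSession.builder()
      .appName("family-details-columns")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Same sample rows as in the answer above.
    val df = Seq(
      (2, Seq("abc", "23", "xyz", "70"), true),
      (3, Seq("lmn", "45", "pqr", "50"), false)
    ).toDF("house_no", "family_details", "ownership")

    // Target column names, in the same order as the array elements.
    val widened = widen(df, "family_details", Seq("name", "age", "surname", "weight"))

    // The extracted values are strings, so numeric fields need an explicit cast.
    val typed = widened
      .withColumn("age", col("age").cast("int"))
      .withColumn("weight", col("weight").cast("int"))

    typed.printSchema()
    typed.show(false)
    spark.stop()
  }
}
```

This version keeps family_details next to the new columns, which matches the column list in the question; append .drop("family_details") to the result if the array is no longer needed. No user-defined function is required, since getItem is a built-in Column method.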
