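scala - How to create multiple DataFrames using the same case class
Problem description
How can I create multiple DataFrames using the same case class? Suppose I want to create several DataFrames, one with 5 columns and another with 3 columns; how would I do that with a single case class?
Solution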
You can't directly create two DataFrames with different numbers of columns from a single case class. Assume you have the case class FlightData below. A DataFrame created from this case class will contain 3 columns. You can still create a second DataFrame from it, but for that one you select only some of the case class's columns. If you have two different files and each file has a different structure, you need to create two separate case classes.
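import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Sample rows matching the three-column schema below.
val someData = Seq(
  Row("United States", "Romania", 15),
  Row("United States", "Croatia", 1),
  Row("United States", "Ireland", 344),
  Row("Egypt", "United States", 15)
)
// Explicit schema: two string columns and one integer column.
val flightDataSchema = List(
  StructField("DEST_COUNTRY_NAME", StringType, true),
  StructField("ORIGIN_COUNTRY_NAME", StringType, true),
  StructField("count", IntegerType, true)
)
case class FlightData(DEST_COUNTRY_NAME: String, ORIGIN_COUNTRY_NAME: String, count: Int)
// Assumes an existing SparkSession named `spark` (as in spark-shell).
import spark.implicits._

// Typed Dataset with all 3 columns of FlightData.
val dataDS = spark.createDataFrame(
  spark.sparkContext.parallelize(someData),
  StructType(flightDataSchema)
).as[FlightData]

// Second DataFrame: same case class, but only one column is selected.
val dataDS_2 = spark.createDataFrame(
  spark.sparkContext.parallelize(someData),
  StructType(flightDataSchema)
).as[FlightData].select($"DEST_COUNTRY_NAME")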
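The last point of the answer, one case class per structure, could look roughly like the following. This is only a minimal sketch, assuming a local SparkSession: the `Destination` case class, the `TwoCaseClasses` object and its `build` helper are hypothetical names introduced here for illustration and are not part of the original answer.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// One case class per structure. `Destination` is a hypothetical name.
case class FlightData(DEST_COUNTRY_NAME: String, ORIGIN_COUNTRY_NAME: String, count: Int)
case class Destination(DEST_COUNTRY_NAME: String)

object TwoCaseClasses {
  // Builds a 3-column Dataset and a 1-column Dataset, each bound to its own case class.
  def build(spark: SparkSession): (Dataset[FlightData], Dataset[Destination]) = {
    import spark.implicits._

    // Three-column Dataset created directly from the full case class.
    val flights: Dataset[FlightData] = Seq(
      FlightData("United States", "Romania", 15),
      FlightData("United States", "Croatia", 1),
      FlightData("United States", "Ireland", 344),
      FlightData("Egypt", "United States", 15)
    ).toDS()

    // One-column Dataset: project the column, then bind it to the smaller case class.
    val destinations: Dataset[Destination] =
      flights.select($"DEST_COUNTRY_NAME").as[Destination]

    (flights, destinations)
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; in spark-shell `spark` already exists.
    val spark = SparkSession.builder()
      .appName("two-case-classes")
      .master("local[*]")
      .getOrCreate()

    val (flights, destinations) = build(spark)
    flights.printSchema()
    destinations.printSchema()

    spark.stop()
  }
}
```

Binding the projection to its own case class keeps the second Dataset typed, whereas a bare `select` returns an untyped DataFrame.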