Converting a list of lists to a DataFrame in Spark Scala

Question

I have a list of lists like this:

List(
  List(2,(String,String,String......),1,(String,String,String......),1,(String,String,String......)),
  List(3,(String,String,String......),1,(String,String,String......),1,(String,String,String......)),
  List(3,(String,String,String......),2,(String,String,String......),1,(String,String,String......)),
  List(3,(String,String,String......),2,(String,String,String......),2,(String,String,String......)),
  List(3,(String,String,String......),1,(String,String,String......),2,(String,String,String......))
)

The output format I expect is:

+-----+------------------+-----+------------------+-----+------------------+
|   _1|                _2|   _3|                _4|   _5|                _6|
+-----+------------------+-----+------------------+-----+------------------+
|2    |(String,String...)|1    |(String,String...)|1    |(String,String...)|
|3    |(String,String...)|1    |(String,String...)|1    |(String,String...)|
|3    |(String,String...)|2    |(String,String...)|1    |(String,String...)|
|3    |(String,String...)|2    |(String,String...)|2    |(String,String...)|
|3    |(String,String...)|1    |(String,String...)|2    |(String,String...)|
+-----+------------------+-----+------------------+-----+------------------+

How can I do this conversion in Spark Scala? I would sincerely appreciate any help.

Tags: scala, apache-spark

Answer

For testing, I created the same data as described in the question:

val nestedList = List(
  List(2,("String","String","String","String","String","String"),1,("String","String","String","String","String","String"),1,("String","String","String","String","String","String")),
  List(3,("String","String","String","String","String","String"),1,("String","String","String","String","String","String"),1,("String","String","String","String","String","String")),
  List(3,("String","String","String","String","String","String"),2,("String","String","String","String","String","String"),1,("String","String","String","String","String","String")),
  List(3,("String","String","String","String","String","String"),2,("String","String","String","String","String","String"),2,("String","String","String","String","String","String")),
  List(3,("String","String","String","String","String","String"),1,("String","String","String","String","String","String"),2,("String","String","String","String","String","String"))
)

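You can now convert each inner list to a tuple (adjust the number of tuple elements and the type casts to match your data) and call toDF to get the desired output. Outside spark-shell, toDF needs import spark.implicits._ in scope: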

nestedList.map(x => (x(0).asInstanceOf[Int], x(1).toString, x(2).asInstanceOf[Int], x(3).toString, x(4).asInstanceOf[Int], x(5).toString)).toDF().show()

This should give you:

+---+--------------------+---+--------------------+---+--------------------+
| _1|                  _2| _3|                  _4| _5|                  _6|
+---+--------------------+---+--------------------+---+--------------------+
|  2|(String,String,St...|  1|(String,String,St...|  1|(String,String,St...|
|  3|(String,String,St...|  1|(String,String,St...|  1|(String,String,St...|
|  3|(String,String,St...|  2|(String,String,St...|  1|(String,String,St...|
|  3|(String,String,St...|  2|(String,String,St...|  2|(String,String,St...|
|  3|(String,String,St...|  1|(String,String,St...|  2|(String,String,St...|
+---+--------------------+---+--------------------+---+--------------------+
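
The casts in the one-liner above fail at runtime if any inner list has a different length or element type than expected. As a minimal sketch of a more defensive variant, the conversion to tuples can be written with pattern matching so that malformed rows are dropped instead of throwing. The names toRow, sample and typedRows are hypothetical and used only for illustration, and the tuples are shortened to two strings:

```scala
// Hypothetical helper: turns one untyped row into a typed 6-tuple,
// or None when the row does not have the expected shape.
def toRow(row: List[Any]): Option[(Int, String, Int, String, Int, String)] =
  row match {
    case List(a: Int, b, c: Int, d, e: Int, f) =>
      Some((a, b.toString, c, d.toString, e, f.toString))
    case _ => None
  }

// Shortened sample data; the second row is deliberately malformed.
val sample: List[List[Any]] = List(
  List(2, ("String", "String"), 1, ("String", "String"), 1, ("String", "String")),
  List(3, "unexpected shape")
)

// flatMap keeps only the rows that matched the expected shape.
val typedRows = sample.flatMap(toRow)
```

With a SparkSession's implicits in scope, typedRows.toDF() can then be called exactly as in the answer; passing column names to toDF replaces the default _1 to _6 headers.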

I hope this answer helps.

