StringBuilder to RDD

Problem description

I have a StringBuilder (sb) in Scala IDE that contains the data below:

CellId,Date,Time,MeasType,MeasResult

251498240,2016-12-02,20:45:00,RRC.ConnEstabAtt.emergency,0

251498240,2016-12-02,20:45:00,RRC.ConnEstabAtt.highPriorityAccess,0

251498240,2016-12-02,20:45:00,RRC.ConnEstabAtt.mt-Access,4

Now I want to convert this string into an RDD using Scala. Please help me.

I am using the code below, but it does not work. Thanks in advance.

 val headerFile = sc.parallelize(sb)
 headerFile.collect()

Tags: scala

Solution


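StringBuilder is used to build a string from a mutable sequence of characters, so everything you append to the builder ends up concatenated into a single string.

To use it with the SparkContext, you need to split that string back into a list of strings.

Assuming each appended string ends with a newline, you can split the builder's contents on the newline character and convert the result to an RDD: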

val headerFile = sc.parallelize(sb.toString.split("\n"))
headerFile.collect()
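To make the splitting step concrete, here is a minimal sketch in plain Scala (no Spark needed) of what `sb.toString.split("\n")` produces. The way `sb` is filled here is an assumption, since the question does not show how the builder was populated; the blank-line filter is an optional safeguard, not part of the original answer.

```scala
// Assumed way of filling the builder: one append per line, each ending in "\n"
val sb = new StringBuilder
sb.append("CellId,Date,Time,MeasType,MeasResult\n")
sb.append("251498240,2016-12-02,20:45:00,RRC.ConnEstabAtt.emergency,0\n")
sb.append("251498240,2016-12-02,20:45:00,RRC.ConnEstabAtt.highPriorityAccess,0\n")
sb.append("251498240,2016-12-02,20:45:00,RRC.ConnEstabAtt.mt-Access,4\n")

// String.split drops trailing empty strings, so the final "\n"
// does not produce an extra empty element at the end
val lines: Array[String] = sb.toString.split("\n")

// Optional safeguard: drop blank lines that may sit between records
val nonEmpty: Array[String] = lines.map(_.trim).filter(_.nonEmpty)

println(nonEmpty.length) // prints 4 (one header line plus three records)
```

Each element of `nonEmpty` is one line of text, which is the shape `sc.parallelize` needs to build an RDD of strings.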

To inspect the data, you have to print it or save it to a file.
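Both options can be sketched as follows. The local `SparkSession`, the application name, and the temporary output directory are assumptions made for illustration; in spark-shell, `spark` and `sc` already exist and you would choose your own output path.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

// Local session for illustration only (assumption); spark-shell provides spark and sc
val spark = SparkSession.builder().master("local[*]").appName("sb-to-rdd").getOrCreate()
val sc = spark.sparkContext

val sb = new StringBuilder
sb.append("CellId,Date,Time,MeasType,MeasResult\n")
sb.append("251498240,2016-12-02,20:45:00,RRC.ConnEstabAtt.emergency,0\n")

val headerFile = sc.parallelize(sb.toString.split("\n").toSeq)

// Option 1: bring the lines back to the driver and print them
headerFile.collect().foreach(println)

// Option 2: write the lines out as text files.
// saveAsTextFile fails if the target directory already exists,
// so this sketch writes to a fresh subdirectory of a temp directory (hypothetical location)
val outDir = Files.createTempDirectory("sb_rdd").resolve("out").toString
headerFile.saveAsTextFile(outDir)
```

Note that `collect()` pulls the whole RDD into driver memory, which is fine for a handful of lines like these but not for large data sets.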

If you want to convert the data to a DataFrame before saving it, you can do something like the following:

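import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

val data = sb.toString.split("\n")
// The header line supplies the column names; every column is a nullable string
val schema = StructType(data.head.split(",").map(StructField(_, StringType, true)))
// The remaining lines become Rows, one field per comma-separated value
val rdd = sc.parallelize(data.tail.map(line => Row.fromSeq(line.split(","))))
spark.createDataFrame(rdd, schema).show(false)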

This should give you:

+---------+----------+--------+-----------------------------------+----------+
|CellId   |Date      |Time    |MeasType                           |MeasResult|
+---------+----------+--------+-----------------------------------+----------+
|251498240|2016-12-02|20:45:00|RRC.ConnEstabAtt.emergency         |0         |
|251498240|2016-12-02|20:45:00|RRC.ConnEstabAtt.highPriorityAccess|0         |
|251498240|2016-12-02|20:45:00|RRC.ConnEstabAtt.mt-Access         |4         |
+---------+----------+--------+-----------------------------------+----------+
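As an alternative to building the schema by hand, the sketch below lets Spark's CSV reader parse the lines. It is not the approach from the answer above, and it assumes Spark 2.2 or later, where `DataFrameReader.csv` accepts a `Dataset[String]`. The local session and the application name are likewise assumptions for illustration.

```scala
import org.apache.spark.sql.SparkSession

// Local session for illustration only (assumption); spark-shell already provides spark
val spark = SparkSession.builder().master("local[*]").appName("sb-to-df").getOrCreate()
import spark.implicits._

val sb = new StringBuilder
sb.append("CellId,Date,Time,MeasType,MeasResult\n")
sb.append("251498240,2016-12-02,20:45:00,RRC.ConnEstabAtt.emergency,0\n")
sb.append("251498240,2016-12-02,20:45:00,RRC.ConnEstabAtt.highPriorityAccess,0\n")
sb.append("251498240,2016-12-02,20:45:00,RRC.ConnEstabAtt.mt-Access,4\n")

// Turn the lines into a Dataset[String] so the CSV reader can consume them
val lines = spark.createDataset(sb.toString.split("\n").toSeq)

// header=true takes the column names from the first line;
// without inferSchema every column is read as a string
val df = spark.read.option("header", "true").csv(lines)
df.show(false)
```

The trade-off is that the reader handles quoting and the header line for you, while the manual `StructType` version keeps every step explicit.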
