How to extract only part of a string from a DataFrame string column in Spark (Scala)

Problem description

Inside a DataFrame, I have a column containing the following data:

('Rated 3.0', "RATED\n \nWent there for a quick bite with friends.\nThe ambience had more of corporate feel. I would say it was unique.\nTried nachos, pasta churros and lasagne.\n\nNachos were pathetic.( Seriously don't order)\nPasta was okayish.\nLasagne was good.\nNutella churros were the best.\nOverall an okayish experience!\nPeace ??"), ('Rated 4.0', "RATED\n  First of all, a big thanks to the staff of this Cafe. Very polite and courteous.\n\nI was there 15mins before their closing time. Without any discomfort or hesitation, the staff welcomed me with a warm smile and said they're still open, though they were preparing to close the cafe for the day.\n\nQuickly ordered the Thai green curry, which is served with rice. They got it for me within 10mins, hot and freshly made.\n\nIt was tasty with the taste of coconut milk. Not very spicy, it was mild spicy.\n\nI saw they had yummy looking dessert menu, should go there to try them out!\n\nA good spacious place to hang out for coffee, pastas, pizza or Thai food.")

I need to extract the "Rated 3.0" part from each record. The column is of type StringType. How can I remove the extra text and keep only that part?

Tags: scala, dataframe, apache-spark, apache-spark-sql, dataset

Solution

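Here is my solution, assuming the data consists of two records. For simplicity, the records below start with "Rated X.X," rather than the question's tuple form "('Rated 3.0', ...)".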

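// Imports for split, col, the $"..." column syntax and toDF
// (assumes a SparkSession named spark, as provided by spark-shell)
import org.apache.spark.sql.functions.{col, split}
import spark.implicits._

// Create the list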

val mytestList=List(("""Rated 3.0, RATED Went there for a quick bite with friends.The ambience had more of corporate feel. I would say it was unique.Tried nachos, pasta churros and lasagne.Nachos were pathetic.( Seriously don't order)Pasta was okayish.Lasagne was good.Nutella churros were the best.Overall an okayish experience!Peace ??"""), 
("""Rated 4.0, RATED  First of all, a big thanks to the staff of this Cafe. Very polite and courteous.I was there 15mins before their closing time. Without any discomfort or hesitation, the staff welcomed me with a warm smile and said they're still open, though they were preparing to close the cafe for the day.Quickly ordered the Thai green curry, which is served with rice. They got it for me within 10mins, hot and freshly made.It was tasty with the taste of coconut milk. Not very spicy, it was mild spicy.I saw they had yummy looking dessert menu, should go there to try them out!A good spacious place to hang out for coffee, pastas, pizza or Thai food."""))

// Load the list into an RDD

val rdd = spark.sparkContext.parallelize(mytestList)

// Convert to a DataFrame and name the column (toDF comes from spark.implicits._)

val DF1 = rdd.toDF("Rating")

// Solution 1: split on the comma and keep the first item

DF1.withColumn("tmp", split($"Rating", ",")).select($"tmp".getItem(0).as("col1")).show()
+---------+
|     col1|
+---------+
|Rated 3.0|
|Rated 4.0|
+---------+
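Since only the text before the first comma is needed, Spark's built-in substring_index function is a possible alternative to splitting into an array and taking item 0. This is a minimal sketch, assuming a local SparkSession and shortened versions of the two records above; the object and method names (FirstFieldBeforeComma, firstField) are illustrative only. trim removes any whitespace around the extracted value.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, substring_index, trim}

object FirstFieldBeforeComma {
  // Keep the text before the first comma in "Rating", then strip surrounding whitespace
  def firstField(df: DataFrame): DataFrame =
    df.select(trim(substring_index(col("Rating"), ",", 1)).as("col1"))

  def main(args: Array[String]): Unit = {
    // Local session for the example; spark-shell already provides `spark`
    val spark = SparkSession.builder().master("local[*]").appName("first-field").getOrCreate()
    import spark.implicits._

    // Shortened versions of the two records used in the answer
    val df = Seq(
      "Rated 3.0, RATED Went there for a quick bite with friends.",
      "Rated 4.0, RATED  First of all, a big thanks to the staff of this Cafe."
    ).toDF("Rating")

    firstField(df).show()
    spark.stop()
  }
}
```

Like the split approach, this only works while the rating is the first comma-separated field.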

// Solution 2: put the first item in a new column and drop the original column

DF1.withColumn("tmp", split(col("Rating"), ",").getItem(0)).drop("Rating").show()

+---------+
|      tmp|
+---------+
|Rated 3.0|
|Rated 4.0|
+---------+
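Both solutions rely on the simplified records, where the rating is directly followed by a comma. On the question's original tuple-like strings, splitting on "," would return "('Rated 3.0'", including the parenthesis and quote. A regular expression avoids that. The sketch below swaps in regexp_extract instead of split; it assumes the rating always looks like "Rated" followed by a decimal number, and the names ExtractRating, RatingPattern and extractRating are hypothetical. Spark's regexp_extract returns an empty string for rows where the pattern does not match.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, regexp_extract}

object ExtractRating {
  // Hypothetical pattern: "Rated", a space, then a decimal number such as 3.0.
  // Matching is case-sensitive, so the "RATED" marker in the review text is not picked up.
  val RatingPattern: String = """(Rated \d+\.\d+)"""

  // Returns capture group 1 of the first match per row ("" when the row has no match)
  def extractRating(df: DataFrame): DataFrame =
    df.select(regexp_extract(col("Rating"), RatingPattern, 1).as("rating"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("extract-rating").getOrCreate()
    import spark.implicits._

    // Rows in the question's tuple-like format, with the review text shortened
    val df = Seq(
      """('Rated 3.0', "RATED\n \nWent there for a quick bite with friends.")""",
      """('Rated 4.0', "RATED\n  First of all, a big thanks to the staff of this Cafe.")"""
    ).toDF("Rating")

    extractRating(df).show()
    spark.stop()
  }
}
```

Note that regexp_extract returns only the first match in each row, so if a single cell holds several tuples, only the first rating is kept.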
