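scala - Error when adding a new column to a DF using a UDF
Problem description
I'm trying to get the first letter of every value in the "word" column, but I get an error: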
import org.apache.spark.sql._
import org.apache.spark.sql.types._
import org.apache.spark.sql.functions._
import org.apache.spark.sql.expressions._
import spark.implicits._
// Define case classes for input data
case class Docword(docId: Int, vocabId: Int, count: Int)
case class VocabWord(vocabId: Int, word: String)
// Read the input data
val docwords = spark.read.
  schema(Encoders.product[Docword].schema).
  option("delimiter", " ").
  csv("hdfs:///user/ashhall1616/bdc_data/t3/docword.txt").
  as[Docword]
val vocab = spark.read.
  schema(Encoders.product[VocabWord].schema).
  option("delimiter", " ").
  csv("hdfs:///user/ashhall1616/bdc_data/t3/vocab.txt").
  as[VocabWord]
def firstletter(x: String): String = {
  x.substring(0, 1)
}
val firstletterUdf =spark.udf.regster[String,String]("firstletter", firstletter(_))
val joinfile = docwords.join(vocab, "vocabId").select($"word", $"docId", $"count").withColumn("firstletter", firstletterUdf($"word"))
joinfile.write.mode("overwrite").partitionBy("firstletter").parquet("file:///home/user204943816622/t3_docword_index_part.parquet")
joinfile.show(10)
Error:
val firstletterUdf =spark.udf.regster[String,String]("firstletter", firstletter(_))
<console>:100: error: value regster is not a member of org.apache.spark.sql.UDFRegistration
val firstletterUdf =spark.udf.regster[String,String]("firstletter", firstletter(_))
^
scala> val joinfile = docwords.join(vocab, "vocabId").select($"word", $"docId", $"count").withColumn("firstletter", firstletterUdf($"word"))
<console>:106: error: not found: value firstletterUdf
val joinfile = docwords.join(vocab, "vocabId").select($"word", $"docId", $"count").withColumn("firstletter", firstletterUdf($"word"))
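Expected output:
+-----+-----+-----+-----------+
| word|docId|count|firstLetter|
+-----+-----+-----+-----------+
|plane|    1| 1000|          p|
+-----+-----+-----+-----------+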
Solution
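The first error is a typo: `UDFRegistration` has a method named `register`, not `regster`. Because that line failed to compile, `firstletterUdf` was never defined, which is why the following `withColumn` line reports `not found: value firstletterUdf`. Below is a minimal, self-contained sketch of the corrected call, assuming Spark 2.x or later (where `register` on a Scala function returns a `UserDefinedFunction` that can be applied to a `Column`). The object name `FirstLetterExample`, the helper `addFirstLetter`, and the single hypothetical `("plane", 1, 1000)` row standing in for the joined docword/vocab data are illustrative only. It also shows an alternative that skips the UDF and uses Spark's built-in `substring` function, which is 1-based.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.substring

object FirstLetterExample {
  // Same helper as in the question. Note: throws on a null or empty string.
  def firstletter(x: String): String = x.substring(0, 1)

  // Hypothetical helper: adds a "firstletter" column two ways, via the
  // registered UDF and via the built-in substring function.
  // Returns (udfResult, builtinResult).
  def addFirstLetter(spark: SparkSession, joined: DataFrame): (DataFrame, DataFrame) = {
    import spark.implicits._

    // The fix: the method is `register`, not `regster`. For a Scala function it
    // returns a UserDefinedFunction, so it can be applied to a Column directly.
    val firstletterUdf = spark.udf.register[String, String]("firstletter", firstletter(_))
    val viaUdf = joined.withColumn("firstletter", firstletterUdf($"word"))

    // Alternative without a UDF: Spark SQL's substring(col, pos, len) is 1-based.
    val viaBuiltin = joined.withColumn("firstletter", substring($"word", 1, 1))

    (viaUdf, viaBuiltin)
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration; in spark-shell `spark` already exists.
    val spark = SparkSession.builder().master("local[*]").appName("firstletter-udf").getOrCreate()
    import spark.implicits._
    try {
      // Hypothetical row standing in for docwords.join(vocab, "vocabId").select(...)
      val joined = Seq(("plane", 1, 1000)).toDF("word", "docId", "count")
      val (viaUdf, viaBuiltin) = addFirstLetter(spark, joined)
      viaUdf.show()
      viaBuiltin.show()
    } finally {
      spark.stop()
    }
  }
}
```

In spark-shell, the only change the question's code needs is `spark.udf.register[String,String]("firstletter", firstletter(_))`. Keep in mind that `firstletter` fails at runtime if a `word` is null or empty, while the built-in `substring` returns null for a null input instead of throwing, so it may be the safer choice if `vocab.txt` can contain missing words.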