Remove extra "" from a JSON value using Scala

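Problem description

I have been trying to clean up my JSON objects using Scala, but I cannot remove the extra double quotes from a JSON value, for example "LAST_NM":"SMITH "LIBBY" MARY".

The extra quotation marks (inverted commas) in my string cause problems.

Here is the code I use to clean the JSON file: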

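val readjson = sparkSession.sparkContext.textFile("dev.json")

// Collapse the doubled quotes around keys, values, and braces, then replace tabs with spaces.
val json = readjson.map(element => element
  .replace("\"\":\"\"", "\":\"")
  .replace("\"\",\"\"", "\",\"")
  .replace("\"\":", "\":")
  .replace(",\"\"", ",\"")
  .replace("\"{\"\"", "{\"")
  .replace("\"\"}\"", "\"}")
  .replaceAll("\\u0009", " "))

// saveAsTextFile returns Unit, so it is called separately from the assignment.
json.saveAsTextFile("JSON")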

Here is the JSON string I want to clean (whitespace added for readability):

{
  "SEQ_NO":597216,
  "PROV_DEMOG_SK":597216,
  "PROV_ID":"QMP000003371283",
  "FRST_NM":"",
  "LAST_NM":"SMITH "LIBBY" MARY",
  "FUL_NM":"",
  "GENDR_CD":"",
  "PROV_NPI":"",
  "PROV_STAT":"Incomplete",
  "PROV_TY":"03",
  "DT_OF_BRTH":"",
  "PROFPROFL_DESGTN":"",
  "ETL_LAST_UPDT_DT_TM":"2020-04-28 11:43:31.000000",
  "PROV_CLSFTN_CD":"A",
  "SRC_DATA_KEY":50,
  "OPRN_CD":"I",
  "REC_SET":"F"
}

What should I add to my code to remove the extra double quotes from the LAST_NM value of my JSON string?

Tags: json, scala, apache-spark, apache-spark-sql

Solution


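Check the code below. It removes every double quote that directly follows or directly precedes a space, so it assumes the compact one-record-per-line form shown in the output, where the structural quotes around keys and values are never adjacent to a space: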

df.map(_.replaceAll(" \""," ").replaceAll("\" "," ")).show(false)

+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|value                                                                                                                                                                                                                                                                                                                                                     |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|{"SEQ_NO":597216,"PROV_DEMOG_SK":597216,"PROV_ID":"QMP000003371283","FRST_NM":"","LAST_NM":"SMITH LIBBY MARY","FUL_NM":"","GENDR_CD":"","PROV_NPI":"","PROV_STAT":"Incomplete","PROV_TY":"03","DT_OF_BRTH":"","PROFPROFL_DESGTN":"","ETL_LAST_UPDT_DT_TM":"2020-04-28 11:43:31.000000","PROV_CLSFTN_CD":"A","SRC_DATA_KEY":50,"OPRN_CD":"I","REC_SET":"F"}|
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
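The replaceAll call above only catches stray quotes that touch a space. As a rough alternative sketch, the stray quotes can instead be identified by what they are not: a quote is treated as structural when it directly follows one of { [ , : or directly precedes one of : , } ], and every other quote is assumed to be stray. The names QuoteCleaner, stripInnerQuotes, and escapeInnerQuotes are hypothetical and not part of the original answer. The heuristic assumes compact records that start with {, and it will misfire on values that themselves contain sequences such as ", or ":.

```scala
import scala.util.matching.Regex

object QuoteCleaner {
  // Heuristic (an assumption, not a JSON parser): a quote is structural if it
  // directly follows one of { [ , :  or directly precedes one of : , } ].
  // Any quote matching neither condition is treated as stray.
  private val strayQuote: Regex = """(?<![{\[,:])"(?![:,}\]])""".r

  // Drops stray quotes, e.g. "SMITH "LIBBY" MARY" becomes "SMITH LIBBY MARY".
  def stripInnerQuotes(line: String): String =
    strayQuote.replaceAllIn(line, "")

  // Keeps stray quotes but backslash-escapes them so the value stays intact
  // and the record becomes valid JSON.
  def escapeInnerQuotes(line: String): String =
    strayQuote.replaceAllIn(line, Regex.quoteReplacement("\\\""))
}
```

With the RDD from the question, this could be applied as readjson.map(QuoteCleaner.stripInnerQuotes) before saving. Escaping rather than stripping may be preferable when the quoted nickname needs to survive in the cleaned data.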

