json - Removing extra "" from JSON values using Scala
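Question
I have been trying to clean up my JSON objects using Scala, but I cannot remove the extra "" from a JSON value, for example "LAST_NM":"SMITH "LIBBY" MARY".
The extra quotation marks inside the string are causing problems.
Here is the code I am using to clean the JSON file: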
val readjson = sparkSession.sparkContext.textFile("dev.json")
val json=readjson.map(element=>element.replace("\"\":\"\"","\":\"")
.replace("\"\",\"\"","\",\"")
.replace("\"\":","\":")
.replace(",\"\"",",\"")
.replace("\"{\"\"","{\"")
.replace("\"\"}\"","\"}")
.replaceAll("\\u0009"," "))
.saveAsTextFile("JSON")
Here is the JSON string I want to clean (whitespace added for readability):
{
"SEQ_NO":597216,
"PROV_DEMOG_SK":597216,
"PROV_ID":"QMP000003371283",
"FRST_NM":"",
"LAST_NM":"SMITH "LIBBY" MARY",
"FUL_NM":"",
"GENDR_CD":"",
"PROV_NPI":"",
"PROV_STAT":"Incomplete",
"PROV_TY":"03",
"DT_OF_BRTH":"",
"PROFPROFL_DESGTN":"",
"ETL_LAST_UPDT_DT_TM":"2020-04-28 11:43:31.000000",
"PROV_CLSFTN_CD":"A",
"SRC_DATA_KEY":50,
"OPRN_CD":"I",
"REC_SET":"F"
}
What should I add to my code to remove the extra "" from the LAST_NM value of my JSON string?
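Answer
Check the code below. It removes every double quote that sits directly next to a space, which in this record only matches the stray quotes inside LAST_NM. Here df is presumably a Dataset[String] holding one compact JSON record per row.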
df.map(_.replaceAll(" \""," ").replaceAll("\" "," ")).show(false)
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|value |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|{"SEQ_NO":597216,"PROV_DEMOG_SK":597216,"PROV_ID":"QMP000003371283","FRST_NM":"","LAST_NM":"SMITH LIBBY MARY","FUL_NM":"","GENDR_CD":"","PROV_NPI":"","PROV_STAT":"Incomplete","PROV_TY":"03","DT_OF_BRTH":"","PROFPROFL_DESGTN":"","ETL_LAST_UPDT_DT_TM":"2020-04-28 11:43:31.000000","PROV_CLSFTN_CD":"A","SRC_DATA_KEY":50,"OPRN_CD":"I","REC_SET":"F"}|
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
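The space-based replacement above depends on the stray quotes being next to a space. As an alternative, here is a minimal sketch in plain Scala (no Spark needed to try it) that instead treats a quote as structural only when it directly follows `{`, `,` or `:` or directly precedes `:`, `,` or `}`, and drops every other quote. The object name `StripInnerQuotes` and the helper `stripInnerQuotes` are hypothetical names introduced here, not part of the original answer. It assumes compact, one-record-per-line JSON with no whitespace around the structural characters; it is a heuristic and will still fail on a value that itself contains `",` or `":`.

```scala
object StripInnerQuotes {
  // Heuristic: a quote is structural when it directly follows '{', ',' or ':'
  // or directly precedes ':', ',' or '}'. Any other quote is assumed to be stray.
  private val strayQuote = """(?<![{,:])"(?![:,}])""".r

  // Hypothetical helper (not from the original answer):
  // removes stray quotes from a single compact JSON line.
  def stripInnerQuotes(line: String): String =
    strayQuote.replaceAllIn(line, "")

  def main(args: Array[String]): Unit = {
    // Shortened version of the record from the question.
    val dirty =
      """{"PROV_ID":"QMP000003371283","FRST_NM":"","LAST_NM":"SMITH "LIBBY" MARY","SRC_DATA_KEY":50,"REC_SET":"F"}"""
    println(stripInnerQuotes(dirty))
  }
}
```

With the RDD from the question, the same function could be applied per line, for example readjson.map(StripInnerQuotes.stripInnerQuotes), before saving the result. Empty values such as "FRST_NM":"" are left alone because both of their quotes touch a structural character.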