apache-spark - How to wrap a spark-dataframe into a root element?
Problem description
I have a simple JSON array that I can read into a spark-dataframe. Could you help me wrap these columns into a custom root tag? More precisely, I want the exact opposite of the explode option: collapse all rows of the dataframe into a single custom root column.
Initial JSON data:
[{"tpeKeyId":"301461865","acctImplMgrId":null,"acctMgrId":null,"agreCancDt":null,"agreEffDt":null,"pltfrmNm":"EMPLOYEE NAVIGATOR","premPyRmtInd":null,"recCrtTs":"2016-11-08 13:01:44.290418","recCrtUsrId":"testedname","recUpdtTs":"2018-10-16 12:16:21.579446","recUpdtUsrId":"testname","spclInstrFormCd":null,"sysCd":null,"tpeNm":"EMPLOYEE NAVIGATOR","univPrdcrId":"9393939393"},{"tpeKeyId":"901972280","acctImplMgrId":null,"acctMgrId":null,"agreCancDt":null,"agreEffDt":null,"pltfrmNm":"datalion","premPyRmtInd":null,"recCrtTs":"2018-12-10 01:36:14.925833","recCrtUsrId":"exactlydata","recUpdtTs":"2018-12-10 01:36:14.925833","recUpdtUsrId":"datalion ","spclInstrFormCd":null,"sysCd":null,"tpeNm":"lialion","univPrdcrId":"89899898989"}]
First Dataframe:
+-------------+---------+----------+---------+------------------+------------+--------------------------+-----------+--------------------------+----------------+---------------+-----+---------+------------------+-----------+
|acctImplMgrId|acctMgrId|agreCancDt|agreEffDt|pltfrmNm |premPyRmtInd|recCrtTs |recCrtUsrId|recUpdtTs |recUpdtUsrId |spclInstrFormCd|sysCd|tpeKeyId |tpeNm |univPrdcrId|
+-------------+---------+----------+---------+------------------+------------+--------------------------+-----------+--------------------------+----------------+---------------+-----+---------+------------------+-----------+
|null |null |null |null |EMPLOYEE NAVIGATOR|null |2016-11-08 13:01:44.290418|testedname |2018-10-16 12:16:21.579446|testname |null |null |301461865|EMPLOYEE NAVIGATOR|9393939393 |
|null |null |null |null |datalion |null |2018-12-10 01:36:14.925833|exactlydata|2018-12-10 01:36:14.925833|datalion |null |null |901972280|lialion |89899898989|
+-------------+---------+----------+---------+------------------+------------+--------------------------+-----------+--------------------------+----------------+---------------+-----+---------+------------------+-----------+
After manually concatenating the root tag:
val addingRootTag = "{ \"roottag\" :" + fileContents + "}"
val rootTagDf = spark.read.json(Seq(addingRootTag).toDS())
rootTagDf.show(false)
Second Dataframe:
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|roottag |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|[[,,,, EMPLOYEE NAVIGATOR,, 2016-11-08 13:01:44.290418, testedname, 2018-10-16 12:16:21.579446, testname,,, 301461865, EMPLOYEE NAVIGATOR, 9393939393], [,,,, datalion,, 2018-12-10 01:36:14.925833, exactlydata, 2018-12-10 01:36:14.925833, datalion ,,, 901972280, lialion, 89899898989]]|
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
The question is: does the spark-framework API provide any method to avoid this manual concatenation of roottag, and instead wrap the first dataframe to produce the second dataframe shown above? EXACTLY THE OPPOSITE OF THE EXPLODE OPTION.
Solution
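One built-in route, sketched below under the assumption of Spark 2.x+: aggregate every row into a single array-of-structs column with `collect_list(struct(...))`, which is roughly the inverse of `explode`. The column name `roottag` and the input file name `input.json` are taken from the question; the rest is a minimal sketch, not a definitive implementation.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, collect_list, struct}

object WrapRoot {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("wrap-root")
      .master("local[*]")
      .getOrCreate()

    // First dataframe: read the plain JSON array, one row per element.
    // (multiLine is needed if the array spans multiple lines.)
    val df = spark.read.option("multiLine", value = true).json("input.json")

    // collect_list(struct(...)) folds all rows into ONE row holding an
    // array of structs -- the opposite of explode, which unfolds an array
    // column into one row per element.
    val rootTagDf = df.select(
      collect_list(struct(df.columns.map(col): _*)).alias("roottag")
    )
    rootTagDf.show(false)

    // Serializing this dataframe back to JSON yields the wrapped shape
    // {"roottag":[{...},{...}]} without any manual string concatenation.
    rootTagDf.toJSON.show(false)

    spark.stop()
  }
}
```

Note that `collect_list` pulls the whole dataframe into a single row, so this is only sensible for small datasets, the same situation in which the manual string concatenation in the question would work anyway.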