R - data frame with a character column to JSON

Question

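I have a data frame in which one column is stored as character but actually contains JSON. I noticed that there are many JSON-related questions on Stack Overflow, but I could not find one that covers this scenario.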

df <- read.table(text="
              id date       paid_at    binded_at  
1            107 2016-12-16 2017-06-02 2017-06-07
2            107 2017-11-27 2017-06-02 2017-06-07
3            107 2017-11-28 2017-06-02 2017-06-07
4            109 2016-11-28 2017-01-01 2017-06-07
5            109 2017-11-29 2017-01-01 2017-06-07
6            110 2017-12-04 2018-01-01 2017-06-07", header=TRUE)

Because the column is very long, I show it here separately:

df$verification
#> {"data": {"verify_client_by_params_response": {"@xmlns": "Bank of America", "verify_check": "AJDSA34&"}}}
class(df$verification)
#> list

What I want to do is parse this character column as JSON and then create a separate column for each field, like this:

df <- read.table(text="
              id date       paid_at    binded_at    @xmlns          verify_check
1            107 2016-12-16 2017-06-02 2017-06-07   Bank of America  AJDSA34&"
    , header=TRUE)

dput() output of the full data frame, including the example column:

structure(
  list(
    id = c(107L, 107L, 107L, 109L, 109L, 110L),
    date = c("2016-12-16", "2017-11-27", "2017-11-28", "2016-11-28", "2017-11-29", "2017-12-04"),
    paid_at = c("2017-06-02", "2017-06-02", "2017-06-02", "2017-01-01", "2017-01-01", "2018-01-01"),
    binded_at = c("2017-06-07", "2017-06-07", "2017-06-07", "2017-06-07", "2017-06-07", "2017-06-07"),
    verification = c(
      "{\"data\": {\"verify_client_by_params_response\": {\"@xmlns\": \"Bank of America\", \"verify_check\": \"AJDSA34&\"}}}",
      "{\"data\": {\"verify_client_by_params_response\": {\"@xmlns\": \"Bank of America\", \"verify_check\": \"AJDSA34&\"}}}",
      "{\"data\": {\"verify_client_by_params_response\": {\"@xmlns\": \"Bank of America\", \"verify_check\": \"AJDSA34&\"}}}",
      "{\"data\": {\"verify_client_by_params_response\": {\"@xmlns\": \"Bank of America\", \"verify_check\": \"AJDSA34&\"}}}",
      "{\"data\": {\"verify_client_by_params_response\": {\"@xmlns\": \"Bank of America\", \"verify_check\": \"AJDSA34&\"}}}",
      "{\"data\": {\"verify_client_by_params_response\": {\"@xmlns\": \"Bank of America\", \"verify_check\": \"AJDSA34&\"}}}"
    )
  ),
  row.names = c("1", "2", "3", "4", "5", "6"),
  class = "data.frame"
)

Tags: rjson

Answer


When all the JSON strings contain the same fields, a simple version of the answer is the following:

df <- read.table(text="
              id date       paid_at    binded_at  
1            107 2016-12-16 2017-06-02 2017-06-07
2            107 2017-11-27 2017-06-02 2017-06-07
3            107 2017-11-28 2017-06-02 2017-06-07
4            109 2016-11-28 2017-01-01 2017-06-07
5            109 2017-11-29 2017-01-01 2017-06-07
6            110 2017-12-04 2018-01-01 2017-06-07", header=TRUE)

df$verification  <- '{"data": {"verify_client_by_params_response": {"@xmlns": "Bank of America", "verify_check": "AJDSA34&"}}}'

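# Parse each JSON string and flatten the first element under "data" into a named character vector
dt2 <- sapply(df$verification, function(x) unlist(jsonlite::fromJSON(x)$data[[1]]))
# sapply() returns one column per row of df, so transpose to one row per record
dt2 <- t(dt2)
# Drop the row names, which sapply() set to the raw JSON strings
rownames(dt2) <- NULL
dt2

# Replace the verification column (column 5) with the parsed fields
cbind(df[,-5], dt2)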

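If the JSON strings differ (they contain different fields), this answer may help: https://stackoverflow.com/a/52647197/10441348 (that question is about parsing XML rather than JSON, but the idea is almost the same).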
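
For the case where the fields differ, one possible approach is sketched below. It is not taken from the linked answer; it only assumes that every string has the same outer shape as in the question (a single object under "data"). The input vector `json`, the helper `parse_one`, and the extra field "status" are made up for illustration. Each string is parsed on its own, the union of all field names is collected, and fields missing from a record become NA.

```r
library(jsonlite)

# Hypothetical input: two JSON strings whose inner objects have different fields
json <- c(
  '{"data": {"verify_client_by_params_response": {"@xmlns": "Bank of America", "verify_check": "AJDSA34&"}}}',
  '{"data": {"verify_client_by_params_response": {"@xmlns": "Bank of America", "status": "ok"}}}'
)

# Parse one string and flatten the first element under "data"
# into a named character vector
parse_one <- function(x) unlist(fromJSON(x)$data[[1]])

parsed <- lapply(json, parse_one)

# Union of all field names, in order of first appearance
fields <- unique(unlist(lapply(parsed, names)))

# Index each parsed vector by the full field list; absent fields yield NA
rows <- lapply(parsed, function(p) unname(p[fields]))

# Stack the records into a data frame with one column per field
out <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
names(out) <- fields
out
```

The result can then be attached to the original data frame with cbind(), as in the simple version above, provided `json` is the verification column so that the row counts match. Note that unlist() coerces all values to character, so numeric fields would need converting afterwards.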

