How to parse columns/values from data with multiple delimiters in R

Question

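I received an odd file from O365 that appears to be delimited by : and , with the values in quotes. I would like to split it into separate columns and values.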

Here is an example:

CreationDate UserID  AuditData 
2020-05-04   User1   {"Id":"4ccd2","RecordType":20,"CreationTime":"2020-05-04T10:24:44"} 
2020-04-14   User2   {"Id":"4def5","RecordType":18,"CreationTime":"2020-04-14T10:24:44"} 
2020-03-29   User3   {"Id":"4zxc2","RecordType":07,"CreationTime":"2020-03-29T10:24:44"}

Goal: split the AuditData column into: 1) Id and its value, 2) RecordType and its value, 3) CreationTime and its value

and so on.

I have tried a few things with separate(), but nothing has worked so far. Thanks!

Tags: r, dplyr, csv

Solution


Here is a tidyverse solution using separate().

# Your data
df<-read.csv(text = 'CreationDate UserID AuditData
2020-05-04 User1 {"Id":"4ccd2","RecordType":20,"CreationTime":"2020-05-04T10:24:44"}
2020-04-14 User2 {"Id":"4def5","RecordType":18,"CreationTime":"2020-04-14T10:24:44"}
2020-03-29 User3 {"Id":"4zxc2","RecordType":07,"CreationTime":"2020-03-29T10:24:44"}',
         sep = " ")

library(tidyverse)
df %>%
   # Remove the curly braces with gsub
   mutate(AuditData = gsub("[{}]", "", AuditData)) %>%
   # Split on colons and commas (note that this also splits the time values)
   separate(col = AuditData,
            # Define the new column names
            into = c("Id","Idvalue","RecordType","RecordTypevalue","CreationTime","temp","time1","time2"),
            # Use : or , as separators
            sep = "[:,]") %>%
   # Use paste to reconstruct the time values
   mutate(CreationTimevalue = paste(temp, time1, time2, sep = ":")) %>%
   # Drop the helper columns temp, time1 and time2
   select(-c(temp, time1, time2))

# CreationDate UserID Id Idvalue RecordType RecordTypevalue CreationTime   CreationTimevalue
# 1   2020-05-04  User1 Id   4ccd2 RecordType              20 CreationTime 2020-05-04T10:24:44
# 2   2020-04-14  User2 Id   4def5 RecordType              18 CreationTime 2020-04-14T10:24:44
# 3   2020-03-29  User3 Id   4zxc2 RecordType              07 CreationTime 2020-03-29T10:24:44
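
As an alternative to splitting on every colon and then pasting the time back together, the values can be pulled out key by key with a regular expression. The following is only a minimal base R sketch, not part of the original answer. It assumes that AuditData still contains its double quotes (the data frame is built directly here for that reason), that the keys are plain words, and that no value contains a comma, a closing brace, or an escaped quote. The helper name extract_value is hypothetical.

```r
# Sample rows from the question, built directly so the quotes in AuditData survive.
df <- data.frame(
  CreationDate = c("2020-05-04", "2020-04-14", "2020-03-29"),
  UserID = c("User1", "User2", "User3"),
  AuditData = c(
    '{"Id":"4ccd2","RecordType":20,"CreationTime":"2020-05-04T10:24:44"}',
    '{"Id":"4def5","RecordType":18,"CreationTime":"2020-04-14T10:24:44"}',
    '{"Id":"4zxc2","RecordType":07,"CreationTime":"2020-03-29T10:24:44"}'
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the value stored under `key` in each string.
# The pattern looks for "key": followed by an optionally quoted value and
# captures everything up to the next quote, comma, or closing brace.
# Assumption: `key` contains no regex metacharacters.
extract_value <- function(x, key) {
  pattern <- paste0('.*"', key, '":"?([^",}]*)"?.*')
  # Rows where the key is missing get NA instead of the unchanged input string.
  ifelse(grepl(pattern, x), sub(pattern, "\\1", x), NA_character_)
}

# Add one column per key, then drop the raw AuditData column.
keys <- c("Id", "RecordType", "CreationTime")
for (k in keys) {
  df[[k]] <- extract_value(df$AuditData, k)
}
df$AuditData <- NULL

print(df)
```

Because each value is captured as a whole, the colons inside CreationTime are left alone and no temporary columns are needed. All extracted values are character strings, so RecordType keeps its leading zero ("07"); convert it with as.integer() if a number is wanted. For more keys, extend the keys vector.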
