R code to extract field values from a JSON log file


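I have a file containing 50,000 records from a log collection. For each record I need to extract the values that follow "State": and "Code":. I tried regular expressions but could not get them to work. Instead, I tried this command to see whether I could get just one of the values, but it simply times out.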

# this never completes
sub(".*?Code(.*?);.*", "\\1", logfile)

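I have no experience with this kind of work, so I would appreciate any help! This is exactly the format of the log file (presumably JSON). My goal is to return the following values (it is fine if the State and Code labels cannot be included):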


Here is the exact syntax of the log file, with 2 records:

    2020-05-12 00:07:00.9681200, z123-asddfas,"
    ========== mode for SKU ==========
    ========== Records found ==========
    No records found
    ========== DRecords found ==========
    No drecords found
    2020-05-12 00:08:46.5076411,qwer98-asdha,"
    ========== mode for SKU ==========
    ========== records found ==========
        "State":  "Red",
        "Code":  null
    ========== DRecords found ==========
    No drecords found
    2020-05-12 00:10:02.6607640,qweaso-34324-asda,"
    ========== mode for SKU ==========
    ========== records found ==========
        "State":  "Blue",
        "Code":  "no receipt"

library(tidyverse)

# logIn is assumed to be a character vector with one log line per element
tibble(lines = logIn) %>% 
  # Keep only the lines containing 'state' or 'code' (case-insensitive)
  filter(str_detect(lines, "(?ix) ( state | code )")) %>% 
  # Strip all whitespace, double quotes and commas, leaving the ':'
  mutate(lines = str_replace_all(lines, '["\\s,]', '')) %>% 
  # Split each line at the ':' into two new columns
  separate(lines, c("ATTR", "VALUE"), sep = ":")


# A tibble: 4 x 2
  ATTR  VALUE    
  <chr> <chr>    
1 State Red      
2 Code  null     
3 State Blue     
4 Code  noreceipt
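
For readers without the tidyverse installed, roughly the same long-format extraction can be sketched in base R. The sample vector below is copied from the log excerpt in the question and stands in for `logIn`, which is assumed to be a character vector with one log line per element (for example, the result of `readLines()`). The names `pattern`, `fields` and `long` are only illustrative. Unlike the pipeline above, this version keeps the space inside "no receipt".

```r
# Sample lines copied from the question; they stand in for the real logIn.
logIn <- c(
  '2020-05-12 00:08:46.5076411,qwer98-asdha,"',
  '========== mode for SKU ==========',
  '========== records found ==========',
  '    "State":  "Red",',
  '    "Code":  null',
  '========== DRecords found ==========',
  'No drecords found',
  '2020-05-12 00:10:02.6607640,qweaso-34324-asda,"',
  '========== mode for SKU ==========',
  '========== records found ==========',
  '    "State":  "Blue",',
  '    "Code":  "no receipt"'
)

# Group 1 captures the key, group 2 the raw value (without a trailing comma).
pattern <- '^\\s*"(State|Code)"\\s*:\\s*(.*?),?\\s*$'

# Keep only the lines that look like  "State": ...  or  "Code": ...
fields <- logIn[grepl(pattern, logIn, perl = TRUE)]

attr_name <- sub(pattern, "\\1", fields, perl = TRUE)
value     <- sub(pattern, "\\2", fields, perl = TRUE)

# Strip the surrounding double quotes, if any.
value <- gsub('^"|"$', "", value)

long <- data.frame(ATTR = attr_name, VALUE = value, stringsAsFactors = FALSE)
print(long)
```

Anchoring the pattern at the start of the line and on the literal key names avoids matching unrelated lines that merely contain the word "code" somewhere.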
##################### As requested
tibble(lines = logIn) %>% 
  # Keep only the lines containing 'state' or 'code' (case-insensitive)
  filter(str_detect(lines, "(?ix) ( state | code )")) %>% 
  # Add a row number; it is used below to pair up the lines
  rowid_to_column("ID") %>% 
  # Strip all whitespace, double quotes and commas, leaving the ':'
  mutate(lines = str_replace_all(lines, '["\\s,]', ''),
         # Give each State line and the Code line after it the same ID
         ID = floor((ID + 1) / 2)) %>% 
  # Split each line at the ':' into two new columns
  separate(lines, c("ATTR", "VALUE"), sep = ":") %>% 
  # spread() takes the data from long form to wide form
  spread(key = ATTR, value = VALUE) %>% 
  select(ID, State, Code)

# A tibble: 2 x 3
     ID State Code     
  <dbl> <chr> <chr>    
1     1 Red   null     
2     2 Blue  noreceipt
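
One caveat: `floor((ID + 1) / 2)` assumes that State and Code lines strictly alternate. If a record could be missing its Code line, a sketch like the following, which starts a new record at every State line, may be more robust. `extract_state_code` is a hypothetical helper written for this illustration, not part of any package. It also turns the unquoted JSON `null` into `NA` and keeps spaces inside values.

```r
# Hypothetical helper: one row per record, with State and Code as columns.
extract_state_code <- function(lines) {
  # Group 1 captures the key, group 2 the raw value (without a trailing comma).
  pattern <- '^\\s*"(State|Code)"\\s*:\\s*(.*?),?\\s*$'
  hits <- grepl(pattern, lines, perl = TRUE)
  key  <- sub(pattern, "\\1", lines[hits], perl = TRUE)
  val  <- sub(pattern, "\\2", lines[hits], perl = TRUE)

  # An unquoted null becomes NA; otherwise strip the surrounding quotes.
  val <- ifelse(val == "null", NA_character_, gsub('^"|"$', "", val))

  # Every State line opens a new record; a Code line joins the current one.
  id  <- cumsum(key == "State")
  ids <- unique(id)

  is_state <- key == "State"
  is_code  <- key == "Code"
  data.frame(
    ID    = ids,
    State = val[is_state][match(ids, id[is_state])],
    Code  = val[is_code][match(ids, id[is_code])],
    stringsAsFactors = FALSE
  )
}

# Usage with lines copied from the question's log excerpt.
sample_lines <- c(
  '2020-05-12 00:08:46.5076411,qwer98-asdha,"',
  '========== records found ==========',
  '    "State":  "Red",',
  '    "Code":  null',
  '2020-05-12 00:10:02.6607640,qweaso-34324-asda,"',
  '========== records found ==========',
  '    "State":  "Blue",',
  '    "Code":  "no receipt"'
)
result <- extract_state_code(sample_lines)
print(result)
```

For the real file, `lines` would presumably come from something like `readLines()` on the log path.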
