首页 > 解决方案 > 阅读器中的 read_tsv 未正确解析表

问题描述

我正在尝试读取制表符分隔的表格,该表格不断产生一些解析失败。我认为是由于在文本中使用了非反斜杠引号。请参阅下面的示例:

concept_id  concept_name    domain_id   vocabulary_id   concept_class_id    standard_concept    concept_code    valid_start_date    valid_end_date  invalid_reason
2618087 Services delivered under an outpatient speech language pathology plan of care   Observation HCPCS   HCPCS Modifier  S   GN  19990101    20991231
2618083 "opt out" physician or practitioner emergency or urgent service Observation HCPCS   HCPCS Modifier  S   GJ  19981001    20991231
2618082 Diagnostic mammogram converted from screening mammogram on same day Observation HCPCS   HCPCS Modifier  S   GH  19981001    20991231

请注意第二列中的“选择退出”,问题似乎源于此。以下代码存在解析失败:

df <- read_delim(
  file = "~/_data/test.csv",
  col_types = cols(
    col_integer(), col_character(), col_character(),
    col_character(), col_character(), col_character(),
    col_character(), col_date(format = "%Y%m%d"), col_date(format = "%Y%m%d"),
    col_character()),
  delim = "\t"
  )

Warning: 4 parsing failures.
row          col                     expected    actual               file
  1 NA           10 columns                   9 columns '~/_data/test.csv'
  2 concept_name delimiter or quote                     '~/_data/test.csv'
  2 concept_name closing quote at end of file           '~/_data/test.csv'
  2 NA           10 columns                   2 columns '~/_data/test.csv'

我似乎无法指定解决方案。

标签: rreadr

解决方案


这解决了这个问题。我需要将quote参数修改为quote = ""

df <- read_delim(
  file = "~/_data/test.csv",
  col_types = cols(
    col_integer(), col_character(), col_character(),
    col_character(), col_character(), col_character(),
    col_character(), col_date(format = "%Y%m%d"), col_date(format = "%Y%m%d"),
    col_character()),
  quote = "",
  delim = "\t"
  )

推荐阅读