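r - readr::read_csv() - parsing failure with nested quotes
Problem description
I have a CSV in which some columns contain a quoted field with another quoted string inside it: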
"blah blah "nested quote""
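This produces parsing failures. I'm not sure whether this is a bug or whether there is an argument that handles it.
Reprex (file available here, or contents pasted below):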
readr::read_csv("~/temp/shittyquotes.csv")
#> Parsed with column specification:
#> cols(
#> .default = col_double(),
#> INSTNM = col_character(),
#> ADDR = col_character(),
#> CITY = col_character(),
#> STABBR = col_character(),
#> ZIP = col_character(),
#> CHFNM = col_character(),
#> CHFTITLE = col_character(),
#> EIN = col_character(),
#> OPEID = col_character(),
#> WEBADDR = col_character(),
#> ADMINURL = col_character(),
#> FAIDURL = col_character(),
#> APPLURL = col_character(),
#> ACT = col_character(),
#> IALIAS = col_character(),
#> INSTCAT = col_character(),
#> CCBASIC = col_character(),
#> CCIPUG = col_character(),
#> CCSIZSET = col_character(),
#> CARNEGIE = col_character()
#> # ... with 2 more columns
#> )
#> See spec(...) for full column specifications.
#> Warning: 3 parsing failures.
#> row col expected actual file
#> 2 IALIAS delimiter or quote C '~/temp/shittyquotes.csv'
#> 2 IALIAS delimiter or quote D '~/temp/shittyquotes.csv'
#> 2 NA 59 columns 100 columns '~/temp/shittyquotes.csv'
#> # A tibble: 2 x 59
#> UNITID INSTNM ADDR CITY STABBR ZIP FIPS OBEREG CHFNM CHFTITLE
#> <dbl> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <chr> <chr>
#> 1 441238 City … 1500… Duar… CA 9101… 6 8 Dr. … Director
#> 2 441247 Commu… 3800… Mode… CA 9535… 6 8 Vict… Preside…
#> # ... with 49 more variables: GENTELE <dbl>, EIN <chr>, OPEID <chr>,
#> # OPEFLAG <dbl>, WEBADDR <chr>, ADMINURL <chr>, FAIDURL <chr>,
#> # APPLURL <chr>, SECTOR <dbl>, ICLEVEL <dbl>, CONTROL <dbl>,
#> # HLOFFER <dbl>, UGOFFER <dbl>, GROFFER <dbl>, FPOFFER <dbl>,
#> # HDEGOFFR <dbl>, DEGGRANT <dbl>, HBCU <dbl>, HOSPITAL <dbl>,
#> # MEDICAL <dbl>, TRIBAL <dbl>, LOCALE <dbl>, OPENPUBL <dbl>, ACT <chr>,
#> # NEWID <dbl>, DEATHYR <dbl>, CLOSEDAT <dbl>, CYACTIVE <dbl>,
#> # POSTSEC <dbl>, PSEFLAG <dbl>, PSET4FLG <dbl>, RPTMTH <dbl>,
#> # IALIAS <chr>, INSTCAT <chr>, CCBASIC <chr>, CCIPUG <chr>,
#> # CCIPGRAD <dbl>, CCUGPROF <dbl>, CCENRPRF <dbl>, CCSIZSET <chr>,
#> # CARNEGIE <chr>, TENURSYS <dbl>, LANDGRNT <dbl>, INSTSIZE <chr>,
#> # CBSA <dbl>, CBSATYPE <chr>, CSA <dbl>, NECTA <dbl>, DFRCGID <dbl>
Created on 2018-12-04 by the reprex package (v0.2.1)
And here are the CSV contents:
UNITID,INSTNM,ADDR,CITY,STABBR,ZIP,FIPS,OBEREG,CHFNM,CHFTITLE,GENTELE,EIN,OPEID,OPEFLAG,WEBADDR,ADMINURL,FAIDURL,APPLURL,SECTOR,ICLEVEL,CONTROL,HLOFFER,UGOFFER,GROFFER,FPOFFER,HDEGOFFR,DEGGRANT,HBCU,HOSPITAL,MEDICAL,TRIBAL,LOCALE,OPENPUBL,ACT,NEWID,DEATHYR,CLOSEDAT,CYACTIVE,POSTSEC,PSEFLAG,PSET4FLG,RPTMTH,IALIAS,INSTCAT,CCBASIC,CCIPUG,CCIPGRAD,CCUGPROF,CCENRPRF,CCSIZSET,CARNEGIE,TENURSYS,LANDGRNT,INSTSIZE,CBSA,CBSATYPE,CSA,NECTA,DFRCGID
441238,"City of Hope Graduate School of Biological Science","1500 E Duarte Rd","Duarte","CA","91010-3000", 6, 8,"Dr. Arthur Riggs","Director","6263018293","953432210","03592400",1,"gradschool.coh.org"," "," "," ",2,1,2,9,2,1,2,10,1,2,-2,2,2,21,1,"A ",-2,-2,"-2",1,1,1,1,1," ",1,25,-2,-2,-2,7,-2,-3,1,2,1,31100,1,348,-2,198
441247,"Community Business College","3800 McHenry Ave Suite M","Modesto","CA","95356-1569", 6, 8,"Victor L. Vandenberghe","President","2095293648","484-8230","03615300",7,"www.communitybusinesscollege.edu","www.communitybusinesscollege.edu","www.cbc123.com","www.123.com",9,3,3,1,1,2,2,0,2,2,-2,2,2,12,1,"A ",-2,-2,"-2",1,1,1,1,2,"formerly "Community Business School"",6,-3,-3,-3,-3,-3,-3,-3,2,2,1,33700,1,-2,-2,71
441256,"Design's School of Cosmetology","715 24th St Ste E","Paso Robles","CA","93446", 6, 8,"Sharon Skinner","Administrator","8052378575","80002030","03646300",1,"designsschool.com"," "," "," ",9,3,3,2,1,2,2,0,2,2,-2,2,2,13,1,"A ",-2,-2,"-2",1,1,1,1,2," ",6,-3,-3,-3,-3,-3,-3,-3,2,2,1,42020,1,-2,-2,46
Solution
Jim Hester provided this answer:
You need to use the escape_double = FALSE argument to read_delim(). It is not part of read_csv() because Excel-style CSVs escape inner quotes by doubling them.