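r - Trying to use str_split with defined split patterns: works on a sample (df) but not on the full document (df and d.table)
Problem description
I am trying to split an address into addr1 (street address) and addr2 (unit); it only needs to be roughly accurate. I created a "split pattern" value to identify unit designators and tried to apply it to the address string. When I do so, I get the following error: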
split_patterns <- c(" APT "," STE "," UNIT "," # ")
split_patterns <- paste(split_patterns,collapse="|")
addsimple[c("addr1","addr2")] <- apply(str_split(addsimple$address,split_patterns,simplify=TRUE,n=2),2,str_trim)
Error in `[.data.table`(x, i, which = TRUE) :
When i is a data.table (or character vector), the columns to join by must be specified using 'on=' argument (see ?data.table), by keying x (i.e. sorted, and, marked as sorted, see ?setkey), or by sharing column names between x and i (i.e., a natural join). Keyed joins might have further speed benefits on very large data due to x being sorted in RAM.
I tried to build a sample data file from a subset so I could ask for advice here, and the sample works:
address <- c("100 W 26TH ST", "100 W 26TH ST APT 7H", "11 PENN PLZ FL 6", "1170 BROADWAY", "1186 BROADWAY",
"1186 BROADWAY # 1003", "1200 BROADWAY", "1200 BROADWAY APT 3G", "125 W 31ST ST", "125 W 31ST ST APT 39F",
"126 W 34TH ST", "1261 BROADWAY" , "130 W 29TH ST RM 500", "134 W 32ND ST", "151 W 26TH ST FL 3",
"154 W 27TH ST", "154 W 27TH ST RM 4W", "155 W 29TH ST", "165 W 26TH ST", "20 W 27TH ST")
df_address <- as.data.frame(address)
split_patterns <- c(" APT "," STE "," UNIT "," # ")
split_patterns <- paste(split_patterns,collapse="|")
df_address[c("addr1","addr2")] <- apply(str_split(df_address$address,split_patterns,simplify=TRUE,n=2),2,str_trim)
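I am not allowed to post the whole original data frame. But I think what causes trouble with the original data frame and not with the sample is that the class of the original is "data.table" "data.frame".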
> class(addsimple)
[1] "data.table" "data.frame"
>
> class(df_address)
[1] "data.frame"
Is there any cost to recasting it as a data.frame?
Solution
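The error message suggests that, on a data.table, a character vector in the first position of `[` is treated as a row selector (a join against the key) rather than as column names, which would explain why `addsimple[c("addr1","addr2")] <- ...` works on a plain data.frame but fails here. A minimal sketch of one way around it, assuming the stringr and data.table packages are installed, is to add the columns by reference with `:=`. The names `dt` and `parts` are illustrative stand-ins for the real `addsimple` data, which is not shown in the question.

```r
library(data.table)
library(stringr)

# Small stand-in for the real table (hypothetical subset of the sample addresses)
dt <- data.table(address = c("100 W 26TH ST",
                             "100 W 26TH ST APT 7H",
                             "1186 BROADWAY # 1003"))

split_patterns <- paste(c(" APT ", " STE ", " UNIT ", " # "), collapse = "|")

# str_split_fixed() always returns a character matrix with n columns,
# padding with "" when the pattern is not found
parts <- str_split_fixed(dt$address, split_patterns, n = 2)

# `:=` adds both columns by reference, using data.table's own syntax
dt[, c("addr1", "addr2") := list(str_trim(parts[, 1]), str_trim(parts[, 2]))]

print(dt)

# Alternative, if a plain data.frame is preferred:
# setDF(dt) converts in place without copying the data,
# whereas as.data.frame(dt) generally returns a copy.
```

On the cost question: `setDF()` changes the class by reference, so the conversion itself should be cheap even on a large table, while staying with `:=` avoids the conversion entirely. Note that the split drops the designator itself, so addr2 holds "7H" rather than "APT 7H".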