Copy a substring to the string below, conditional on the contents of both strings

Question

My data looks like this:

A         toberevised
 8:                                        <NA>
 9:                                        <NA>
10:                           Number of returns
11:                     Number of joint returns
12:       Number with paid preparer's signature
13:                        Number of exemptions
14:             Adjusted gross income (AGI) [3]
**15:       Salaries and wages in AGI: [4] Number
16:                                      Amount
17:                   Taxable interest:  Number
18:                                      Amount
19:                 Ordinary dividends:  Number
20:                                      Amount**
21:                                        <NA>
22:                                        <NA>
23:                           Number of returns
24:                     Number of joint returns
25:       Number with paid preparer's signature
26:                        Number of exemptions

DF <- structure(list(toberevised = c("[Money amounts are in thousands of dollars]", 
NA, NA, NA, "Item", NA, NA, NA, NA, "Number of returns", "Number of joint returns", 
"Number with paid preparer's signature", "Number of exemptions", 
"Adjusted gross income (AGI) [3]", "Salaries and wages in AGI: [4] Number", 
"Amount", "Taxable interest:  Number", "Amount", "Ordinary dividends:  Number", 
"Amount")), row.names = c(NA, -20L), class = c("data.table", 
"data.frame"))

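I would like to write code that copies the part before the ":" in rows 15, 17 and 19 in front of "Amount" in the row directly below each of them, so that the result is: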

 A        toberevised
 8:                                        <NA>
 9:                                        <NA>
10:                           Number of returns
11:                     Number of joint returns
12:       Number with paid preparer's signature
13:                        Number of exemptions
14:             Adjusted gross income (AGI) [3]
**15:       Salaries and wages in AGI: [4] Number
16:           Salaries and wages in AGI: Amount
17:                   Taxable interest:  Number
18:                    Taxable interest: Amount
19:                 Ordinary dividends:  Number
20:                Ordinary dividends:   Amount**
21:                                        <NA>
22:                                        <NA>
23:                           Number of returns
24:                     Number of joint returns
25:       Number with paid preparer's signature
26:                        Number of exemptions

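I have tried some very clumsy solutions, such as copying the cells that contain a ":" to a new column, filling that column downwards, and then removing "Number" from it. After that I could concatenate the two columns, and then I would still have to clean up all the debris.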

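library(data.table)
library(zoo)  # provides na.locf()
# Attempt: copy ":" rows to a helper column, fill down, strip "Number", paste.
DF <- setDT(DF)[grepl(":", DF$toberevised), type := toberevised]
DF$type <- na.locf(DF$type, na.rm = FALSE)
DF$type <- gsub("[[:punct:]]*Number[[:punct:]]*", "", DF$type)
DF$fullname <- paste(DF$type, DF$toberevised)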

Apart from the fact that it does not work, it is also rather cumbersome.

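Is there a better way to do this? I was thinking of checking whether a cell contains ": Number" and the cell below it is "Amount", and if so pasting the substring before the ":" in front of the "Amount" string below. But I have no idea how to write something like that.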

Tags: r, string, if-statement, concatenation, paste

Answer


You can do this:

#Get the index of row where current row has "Amount" and previous had "Number"
library(data.table)
inds <- which(DF$toberevised == 'Amount' & shift(grepl('Number', DF$toberevised)))

#Paste those rows with revised value from previous row.
DF$toberevised[inds] <- paste0(sub(':.*', '', DF$toberevised[inds - 1]), 
                                   ': Amount')
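
To make the logic easier to check in isolation, here is a minimal base-R sketch of the same idea. It is not part of the original answer: the helper name prefix_amount and the shortened sample vector are assumptions made for illustration, and the lag is built with c(FALSE, ...) instead of data.table's shift().

```r
# Hypothetical helper (not from the original answer): prefix each "Amount"
# element with the label of the "... Number" element directly above it.
prefix_amount <- function(x) {
  n <- length(x)
  # TRUE where the previous element contains "Number" (base-R lag)
  prev_is_number <- c(FALSE, grepl("Number", x[-n], fixed = TRUE))
  # %in% treats NA as "no match", so NA elements are skipped
  inds <- which(x %in% "Amount" & prev_is_number)
  # Keep everything before the first ":" of the previous element
  x[inds] <- paste0(sub(":.*", "", x[inds - 1]), ": Amount")
  x
}

# Shortened sample in the shape of the question's column (assumed for the demo)
x <- c("Item", NA, "Adjusted gross income (AGI) [3]",
       "Salaries and wages in AGI: [4] Number", "Amount",
       "Taxable interest:  Number", "Amount",
       "Ordinary dividends:  Number", "Amount")
res <- prefix_amount(x)
print(res)
```

With the question's data this could be applied as DF$toberevised <- prefix_amount(DF$toberevised). Note that sub(':.*', '', ...) also drops footnote markers such as "[4]", since they come after the colon.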
