How to delete rows containing certain strings in this CSV file

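Problem description

The file I am reading is very large, and certain strings appear many times throughout it. I just need to go through the file and delete every row that contains one of these specific strings or NA.

I have tried using the grep function to get rid of the strings, but it only removed the first occurrence and left the other identical strings in place.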

RAO <- readr::read_csv(file = "RateAddOnsExcel.csv")

RAO$...4 <- NULL
RAO$...5 <- NULL
RAO$Quarter. <- NULL
names(RAO)[1:13] = c("ProviderName","AIMNumber", "ChainName", 
"RateEffectiveDate", "ComponentTotal", 
                 "VentAddOn", "QualityAddOn", 
"SpecialCareUnitAddOn", "AssessmentAddOn", 
                 "SelectedExpenditureAddOn", "RateReduction", 
"CaseMixRate", "CaseMixAssessment")
RAO$AIMNumber <- NULL
RAO$ChainName <- NULL
names(RAO)[1:13] = c("ProviderName","AIMNumber", "ChainName", 
"RateEffectiveDate", "ComponentTotal", 
                 "VentAddOn", "QualityAddOn", 
"SpecialCareUnitAddOn", "AssessmentAddOn", 
                 "SelectedExpenditureAddOn", "RateReduction", 
"CaseMixRate", "CaseMixAssessment")

RAO <- RAO[-which(apply(RAO, 1, function(x)all(is.na(x)))),]

View(RAO)
remove.list <- paste(c("Myers", "Provider", "NA", "JJ"), collapse = 
'|') 
RAO %>% filter(!grepl(remove.list, RAO$ProviderName)) 
RAO %>% filter(!str_detect(RAO$ProviderName, remove.list))

I want to get rid of the rows that contain the specific strings I listed there.

Tags: r, rstudio

Solution


library(dplyr)

# simulate some data
set.seed(12345)
RAO <- data.frame(A = sample(c("Myers", "Provider", "NA", "JJ", "Stack", "Overflow"), 50, replace = TRUE),
                  B = rnorm(50))
head(RAO)
#          A          B
# 1    Stack  1.8050975
# 2 Overflow -0.4816474
# 3    Stack  0.6203798
# 4 Overflow  0.6121235
# 5       NA -0.1623110
# 6    Myers  0.8118732

# Remove rows where column A contains Myers, Provider, or the literal string "NA"
RAO %>% filter(!grepl("Myers|Provider|NA", A))
#           A           B
# 1     Stack  1.80509752
# 2  Overflow -0.48164736
# 3     Stack  0.62037980
# 4  Overflow  0.61212349
# 5        JJ  2.04919034
# 6     Stack  1.63244564
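
Two details are easy to miss here. `filter()` returns a new data frame and leaves `RAO` itself unchanged, so the result has to be assigned for the rows to actually disappear. Also, `readr::read_csv` reads the text `NA` as a missing value by default, and `grepl()` returns `FALSE` for missing values, so those rows survive a pattern-only filter. A minimal sketch, assuming a small made-up data frame (the `ProviderName` values and the name `RAO_clean` are invented for illustration):

```r
library(dplyr)

# Hypothetical stand-in for the CSV data; NA is a real missing value here
RAO <- data.frame(
  ProviderName = c("Myers Home", "Stack", NA, "Provider", "JJ Care", "Overflow"),
  ComponentTotal = c(1, 2, 3, 4, 5, 6),
  stringsAsFactors = FALSE
)

# "NA" is left out of the pattern: as a regex it would also match
# any name that merely contains the letters "NA"
remove.list <- paste(c("Myers", "Provider", "JJ"), collapse = "|")

# Assign the result; drop missing names explicitly with is.na()
RAO_clean <- RAO %>%
  filter(!is.na(ProviderName), !grepl(remove.list, ProviderName))

print(RAO_clean$ProviderName)
```

Whether to overwrite `RAO` or keep a separate cleaned copy, as done here, is a matter of preference.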

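Alternatively, if the values in column A contain several words and you want to remove only the rows whose value starts with one of these 3 words, you can add the "^" anchor to the regular expression passed to grepl: grepl("^Myers|^Provider|^NA", A)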
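
The difference between the unanchored and the anchored pattern can be seen on a plain character vector. This is only a sketch in base R: the values in `A` are invented for illustration, and `^(Myers|Provider|NA)` is a grouped form that should behave the same as `^Myers|^Provider|^NA`.

```r
# Hypothetical values, invented for illustration
A <- c("Myers Home", "Old Myers Place", "Provider List", "NAPA Center", "Stack")

# Unanchored: TRUE wherever one of the words appears anywhere in the value
matches_anywhere <- grepl("Myers|Provider|NA", A)

# Anchored: TRUE only when the value starts with one of the words
starts_with_word <- grepl("^(Myers|Provider|NA)", A)

# Values kept after dropping those that start with one of the words
kept <- A[!starts_with_word]
print(kept)
```

Note that even the anchored `NA` alternative matches any value that begins with the letters "NA", such as "NAPA Center" in this sketch.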

