Subsetting rows in a data frame based on rows in another data frame in R

Problem description

[Image: just a snapshot of the data we are working with]

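What I want to do is identify the blocks (BLOCKID) in which class 5 makes up more than 90%, and then remove all of those blocks from the data set. I started by subsetting the data with subset(NLCD2008, Class==5 & Percent > .90), which gave me a data frame with a column containing the blocks that should be removed, as shown below: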

    > dput(ids)
structure(list(BLOCKID = c(100, 131, 179, 200, 222, 236, 238, 
241, 244, 254, 257, 258, 265, 266, 27, 278, 57, 63, 69, 75, 81
), Class = c("5", "5", "5", "5", "5", "5", "5", "5", "5", "5", 
"5", "5", "5", "5", "5", "5", "5", "5", "5", "5", "5"), CA = c(22983987.0806, 
24692082.1724, 23533460.3724, 23401233.5635, 24116398.1926, 23766711.1699, 
24795140.5362, 24876914.4067, 24898552.2795, 24985030.0734, 25012822.6465, 
24993341.0278, 25041230.4987, 25049166.7966, 22372955.0846, 24737206.1697, 
24104160.9584, 24922870.2331, 24943920.0281, 24162534.823, 23096329.0313
), TLA = c(25018769.0617, 25057087.1604, 25149935.9177, 25176830.9298, 
25207224.138, 24802986.7321, 24852905.0566, 24883383.5601, 24898641.1381, 
24985030.0734, 25012822.6465, 25049866.3254, 25090169.5911, 25072609.4832, 
24830593.7725, 25144460.7117, 24935516.21, 24930068.7064, 24947519.2647, 
24961803.5077, 24974601.3436), MSI = c(1.69665962298056, 1.31048429936865, 
1.33110171648693, 1.36242160001161, 1.27666751812728, 1.22789953816493, 
1.26867391259833, 1.25128851571841, 1.18533526393745, 1.18792224187668, 
1.18520978795299, 1.39406482047182, 1.24884906769663, 1.24939571303602, 
1.31731564029142, 1.59900472213938, 1.38890295951441, 1.20315890311899, 
1.18325402703837, 1.27998393051198, 1.47485350719615), Percent = c(0.918669780432366, 
0.985433063880751, 0.935726454707888, 0.929474945784445, 0.956725661682217, 
0.958219726785611, 0.997675743730222, 0.99974002115169, 0.999996431186766, 
1, 1, 0.997743489052367, 0.998049471438513, 0.999065008107126, 
0.901023764859709, 0.983803409161585, 0.96665979382185, 0.999711253370988, 
0.999855727675293, 0.967980331050461, 0.92479270093409)), row.names = c(NA, 
-21L), class = c("tbl_df", "tbl", "data.frame"))

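What I want to do from here is take the 21 unique block IDs in this subset and remove them from the original data. The subset identifies blocks 27, 57, 63, ... as unsuitable, and I want to be able to take that list and remove those blocks from the original data.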

Tags: r

Solution


You can try this:

NLCD2008[ !(with(NLCD2008, Class==5 & Percent > .90)), ]

Using subset():

# remove all blocks that contain greater than 90% of class 5 from NLCD2008 dataset.
subset(NLCD2008, !(Class==5 & Percent > .90))
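One difference between the bracket form and the subset() form may matter if Percent contains missing values: logical indexing with [ returns an all-NA row wherever the condition is NA, while subset() silently drops such rows. The sketch below illustrates this on a small made-up data frame (toy_na and its values are hypothetical, not taken from NLCD2008).

```r
# Hypothetical toy data with one missing Percent value
toy_na <- data.frame(
  BLOCKID = c(1, 2, 3),
  Class   = c("5", "5", "3"),
  Percent = c(0.95, NA, 0.10)
)

# Condition for rows to keep; it is NA for block 2 because Percent is NA
keep <- !(toy_na$Class == "5" & toy_na$Percent > 0.90)

# Bracket indexing turns the NA condition into a row filled with NA
bracket <- toy_na[keep, ]

# subset() treats an NA condition as FALSE and drops the row
sub <- subset(toy_na, keep)

# which() gives bracket indexing the same NA-dropping behaviour
bracket_safe <- toy_na[which(keep), ]

print(bracket)
print(sub)
```

If rows with a missing Percent should be kept instead, the condition would need an explicit is.na() check; which behaviour is right depends on the data.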

# get filtered block ids   
ids <- subset(NLCD2008, Class == 5 & Percent > 0.9)
# remove the block ids from original data.
NLCD2008[!(NLCD2008$BLOCKID %in% unique(ids$BLOCKID)), ]
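Note that the row-wise filters and the %in% filter are not equivalent when a block has rows for several classes: the row-wise condition removes only the class 5 rows above 90%, whereas matching on BLOCKID removes every row of a flagged block, which seems to be what the question asks for. A minimal sketch on made-up data (the toy data frame and its values are assumptions for illustration only):

```r
# Hypothetical toy data: blocks 1 and 3 are more than 90% class 5
toy <- data.frame(
  BLOCKID = c(1, 1, 2, 2, 3),
  Class   = c("5", "3", "5", "3", "5"),
  Percent = c(0.95, 0.05, 0.40, 0.60, 0.99)
)

# Row-wise filter: drops only the class 5 rows above 90%,
# so the class 3 row of block 1 survives
row_wise <- subset(toy, !(Class == "5" & Percent > 0.90))

# Block-wise filter: collect the flagged block IDs first,
# then drop every row that belongs to one of them
bad_ids    <- unique(toy$BLOCKID[toy$Class == "5" & toy$Percent > 0.90])
block_wise <- toy[!(toy$BLOCKID %in% bad_ids), ]

print(row_wise$BLOCKID)    # 1 2 2
print(block_wise$BLOCKID)  # 2 2
```

In the sample shown in the question, Class is stored as character, which is why the sketch compares against "5".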
