r - Removing particular rows in a dataframe with pre-defined conditions
Problem description
I have a data frame with these columns:
shipment_id  created_at  picked_at   packed_at   shipped_at
CSDJKH231BN  2019-02-03  2019-02-03
CSDJKH231BN  2019-02-03  2019-02-03  2019-02-04  2019-02-05
CSDJKH2KFJ3  2019-02-01  2019-02-04  2019-02-07
The data is loaded into rServer from a Google Drive spreadsheet that is constantly being updated.
library(RCurl)  # provides getURL()
u1 <- "https://docs.google.com/spreadsheets/d/e/link"  # "link" is a placeholder for the rest of the URL
tc1 <- getURL(u1, ssl.verifypeer = FALSE)
x <- read.csv(textConnection(tc1))
Suppose the first update has shipment_id CSDJKH231BN filled in only up to picked_at, and the second update from Google Drive has CSDJKH231BN filled in up to shipped_at. How do I keep only the row that goes up to shipped_at, while also keeping shipment_ids like CSDJKH2KFJ3 that are still being processed and have not shipped yet?
Basically I just want to delete the duplicate entries, but this code is not working for me:
df <- df[!duplicated(df), ]
Any help would be appreciated.
Solution
I think you just need to specify that you're looking for duplicates in shipment_id. However, that will just keep the first version, which would have nothing in the shipped_at column. So you might need to sort the data frame by the shipped_at and packed_at columns first (in reverse, so that null values are at the bottom). Does this work?
df <- df[order(df[,'shipped_at'],df[,'packed_at'], decreasing=TRUE),]
df <- df[!duplicated(df$shipment_id), ]
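To see why the whole-row check drops nothing and what the sort-then-deduplicate step does, here is a minimal sketch on made-up data shaped like the rows in the question. It assumes blank cells were read in as NA; if read.csv() gives you empty strings instead, passing na.strings = c("", "NA") to read.csv() is one way to get NA. The name latest is only illustrative.

```r
# Hypothetical sample data mirroring the rows in the question.
# Assumption: blank cells are NA rather than empty strings.
df <- data.frame(
  shipment_id = c("CSDJKH231BN", "CSDJKH231BN", "CSDJKH2KFJ3"),
  created_at  = c("2019-02-03", "2019-02-03", "2019-02-01"),
  picked_at   = c("2019-02-03", "2019-02-03", "2019-02-04"),
  packed_at   = c(NA, "2019-02-04", "2019-02-07"),
  shipped_at  = c(NA, "2019-02-05", NA),
  stringsAsFactors = FALSE
)

# Whole-row comparison: the two CSDJKH231BN rows differ in packed_at and
# shipped_at, so duplicated(df) does not flag either of them.
n_whole_row <- nrow(df[!duplicated(df), ])

# Sort so the most advanced record of each shipment comes first.
# order() puts NA last by default (na.last = TRUE), even with decreasing = TRUE.
df <- df[order(df$shipped_at, df$packed_at, decreasing = TRUE), ]

# Keep only the first row seen for each shipment_id.
latest <- df[!duplicated(df$shipment_id), ]
print(latest)
```

Because the dates are ISO-formatted strings (YYYY-MM-DD), sorting them as text gives the same order as sorting them as dates, so no conversion with as.Date() is needed for this step.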