R data wrangling: function is not applied to the whole dataset

Question

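I am preparing a dataset for further analysis. To do that, I shift the timestamps back by 60 days with - days(60) and write the result to a new column. Normally this is not hard, but here it is not done for every row: a few rows are not calculated.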

df$acquisition_time=as.POSIXct(df$acquisition_time, format = "%Y-%m-%d %H:%M:%OS")
df$acqui_timeshift <- df$acquisition_time - days(60)

Here is a screenshot showing the problem: [screenshot not included]

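The "NA" values in the corner continue further down. The dataset has 1.3 million rows, so maybe the computer is not powerful enough? R gives no Error: or Warning: message.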

It would be great if someone could help me with this strange problem!

Many thanks!

Best, Christian

    data<- structure(list(ani_id_year = c("982_year2019", "982_year2019", 
"985_yearNA", "996_yearNA"), month = c("02", "02", "05", "05"
), year = c(2020, 2020, 2018, 2018), year_ts = c(2019, 2019, 
NA, NA), acquisition_time = structure(c(1581310879, 1581782462, 
1527120030, 1527120052), class = c("POSIXct", "POSIXt"), tzone = ""), 
    day = c("02-10", "02-15", "05-24", "05-24"), x = c(382992.722829081, 
    384653.805434133, 387585.792076463, 388305.553482353), y = c(5419798.49669287, 
    5420068.44700148, 5411757.45423474, 5401584.90172328), groupid = c(3L, 
    3L, 3L, 2L), name = c(982L, 982L, 985L, 996L), name_echte = c("Tana", 
    "Tana", "Zita", "Berta"), nr = c(1351995L, 1352125L, 
    1370437L, 1278038L)), row.names = c(1256187L, 1256317L, 1281322L, 
1343545L), class = "data.frame")

Sorry, I don't really know how to add the data here. I used dput() to get the code.

Tags: r, dataset, lubridate, data-wrangling

Answer


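I cannot reproduce the error with the data provided. However, I can confirm that your way of computing acqui_timeshift works. (If you filter the input data down to only those observations where acqui_timeshift comes out as NA and provide that filtered data as a reproducible example, I can try again.)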
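A minimal sketch of how such a filtered example could be produced, assuming the data frame is called df as in the question and already contains the acqui_timeshift column. The toy data and the name na_rows are purely illustrative and not taken from the question.

```r
# Toy stand-in for the real data (values are made up for illustration)
df <- data.frame(
  acquisition_time = as.POSIXct(
    c("2020-02-10 05:01:19", "2019-05-30 02:00:38"),
    tz = "UTC"
  )
)
df$acqui_timeshift <- df$acquisition_time - as.difftime(60, units = "days")
df$acqui_timeshift[2] <- NA  # simulate a row where the shift came out as NA

# Keep only rows where the input timestamp is present but the shifted one is NA
na_rows <- df[!is.na(df$acquisition_time) & is.na(df$acqui_timeshift), , drop = FALSE]

# Print a copy-pasteable version of the first few problem rows
dput(head(na_rows))
```

The dput() output can then be pasted into the question in place of the full 1.3 million rows.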

Using the data currently provided:

# Method 1 - Using lubridate days() ----------------------------------------

library(lubridate)  # provides days()
df <- data          # the question's code calls the data frame df

df$acquisition_time <- as.POSIXct(df$acquisition_time, format = "%Y-%m-%d %H:%M:%OS")
df$acqui_timeshift <- df$acquisition_time - days(60)

#           ani_id_year   acquisition_time    acqui_timeshift
# 1256187   982_year2019  2020-02-10 05:01:19 2019-12-12 05:01:19
# 1256317   982_year2019  2020-02-15 16:01:02 2019-12-17 16:01:02
# 1281322   985_yearNA    2018-05-24 00:00:30 2018-03-25 00:00:30
# 1343545   996_yearNA    2018-05-24 00:00:52 2018-03-25 00:00:52

# Method 2 - Using base R difftime() ----------------------------------------

df$acquisition_time <- as.POSIXct(df$acquisition_time, format = "%Y-%m-%d %H:%M:%OS")
df$acqui_timeshift <- df$acquisition_time - as.difftime(60, units = "days")

#           ani_id_year  acquisition_time    acqui_timeshift
# 1256187   982_year2019 2020-02-10 05:01:19 2019-12-12 05:01:19
# 1256317   982_year2019 2020-02-15 16:01:02 2019-12-17 16:01:02
# 1281322   985_yearNA   2018-05-24 00:00:30 2018-03-25 00:00:30
# 1343545   996_yearNA   2018-05-24 00:00:52 2018-03-25 00:00:52

Picking out the acquisition_time value from your screenshot that produces NA:

# Method 1 - Using lubridate days() ----------------------------------------

as.POSIXct("2019-05-30 02:00:38", tz = "UTC", format = "%Y-%m-%d %H:%M:%OS") - days(60)
# "2019-03-31 02:00:38 UTC"

# Method 2 - Using base R difftime() ----------------------------------------

as.POSIXct("2019-05-30 02:00:38", tz = "UTC", format = "%Y-%m-%d %H:%M:%OS") - as.difftime(60, units = "days")
# "2019-03-31 02:00:38 UTC"
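One thing that might be worth checking, purely as an assumption because the question does not state the time zone: the examples above use tz = "UTC", which has no daylight saving time. If the data were recorded in a zone with DST, such as "Europe/Berlin" (chosen here only for illustration), the shifted clock time 2019-03-31 02:00:38 would fall in the hour skipped by the spring clock change, and lubridate's period arithmetic can return NA for a clock time that does not exist. A sketch of that check, not a confirmed diagnosis:

```r
library(lubridate)

# Assumption: a local time zone with DST; "Europe/Berlin" is illustrative only
t_local <- as.POSIXct("2019-05-30 02:00:38", tz = "Europe/Berlin")

# Period arithmetic works on clock time; the target clock time
# 2019-03-31 02:00:38 does not exist in this zone (clocks jump 02:00 -> 03:00)
shift_period <- t_local - days(60)

# Duration and difftime arithmetic subtract exactly 60 * 86400 seconds instead
shift_duration <- t_local - ddays(60)
shift_difftime <- t_local - as.difftime(60, units = "days")

# Compare which of the three approaches yields NA
is.na(shift_period)
is.na(shift_duration)
is.na(shift_difftime)
```

If only the period-based result is NA on your machine, the affected rows may be those whose shifted time lands in a DST gap. Note that the duration-based variants then give a clock time one hour earlier than a naive "same time, 60 days before", so which one is appropriate depends on the analysis.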
