Erasing a space in a split - R

Problem description

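I have a data frame in which I split the datetime column into date and time (two columns). However, when I group by time, I get duplicate time values. To investigate, I ran table() on the time column, and it also shows duplicates. Here is a sample: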

> table(df$time)
 00:00:00 00:00:00   00:15:00 00:15:00   00:30:00 00:30:00
     2211      1047      2211      1047      2211      1047

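As you can see, after the split one copy of each "unique" value kept a " " (a space) inside it, so values that look identical are counted separately. Is there a simple way to fix this?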

PS: The time column is of type character.

Edit: added code

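library(reshape2)  # assumed: colsplit() comes from reshape2 (the older reshape package also has one)
df$datetime <- as.character.Date(df$datetime)
x <- colsplit(df$datetime, ' ', names = c('Date','Time'))
df <- cbind(df, x)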
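The stray space can be reproduced without reshape2. The sketch below uses hypothetical sample strings and base R sub() to imitate splitting at the first space only, which is an assumption about how the two-column colsplit() call above treats the data. Any additional spaces between date and time end up at the start of the time part:

```r
# Hypothetical sample: the second string has two spaces between date and time
x <- c("2018-05-15 00:30:00", "2018-05-15  00:30:00")

# Split at the FIRST space only: everything before it is the date,
# everything after it (including any further spaces) is the time
date_part <- sub(" .*$", "", x)
time_part <- sub("^[^ ]* ", "", x)

print(time_part)         # the second element keeps a leading space
print(table(time_part))  # so table() counts two "different" times
```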

Tags: r, datetime, time, split

Solution


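There are several ways. One of them is to extract the date and the time from the datetime column with the appropriate functions instead of splitting the string: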

df <- data.frame(datetime = seq(
  from=as.POSIXct("2018-5-15 0:00", tz="UTC"),
  to=as.POSIXct("2018-5-16 24:00", tz="UTC"),
  by="30 min") )

head(df$datetime)
#[1] "2018-05-15 00:00:00 UTC" "2018-05-15 00:30:00 UTC" "2018-05-15 01:00:00 UTC" "2018-05-15 01:30:00 UTC"
#[5] "2018-05-15 02:00:00 UTC" "2018-05-15 02:30:00 UTC"

df$Date <- as.Date(df$datetime)
df$Time <- format(df$datetime,"%H:%M:%S")

head(df)
#              datetime       Date     Time
# 1 2018-05-15 00:00:00 2018-05-15 00:00:00
# 2 2018-05-15 00:30:00 2018-05-15 00:30:00
# 3 2018-05-15 01:00:00 2018-05-15 01:00:00
# 4 2018-05-15 01:30:00 2018-05-15 01:30:00
# 5 2018-05-15 02:00:00 2018-05-15 02:00:00
# 6 2018-05-15 02:30:00 2018-05-15 02:30:00


table(df$Time)
#00:00:00 00:30:00 01:00:00 01:30:00 02:00:00 02:30:00 03:00:00 03:30:00 04:00:00 04:30:00 05:00:00 05:30:00 
#3        2        2        2        2        2        2        2        2        2        2        2 
#06:00:00 06:30:00 07:00:00 07:30:00 08:00:00 08:30:00 09:00:00 09:30:00 10:00:00 10:30:00 11:00:00 11:30:00 
#2        2        2        2        2        2        2        2        2        2        2        2 
#12:00:00 12:30:00 13:00:00 13:30:00 14:00:00 14:30:00 15:00:00 15:30:00 16:00:00 16:30:00 17:00:00 17:30:00 
#2        2        2        2        2        2        2        2        2        2        2        2 
#18:00:00 18:30:00 19:00:00 19:30:00 20:00:00 20:30:00 21:00:00 21:30:00 22:00:00 22:30:00 23:00:00 23:30:00 
#2        2        2        2        2        2        2        2        2        2        2        2 
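If the column has already been split, a simpler alternative (not part of the original answer) is to strip leading and trailing whitespace with base R's trimws(). A minimal sketch, assuming a character Time column like the one in the question; the sample values are hypothetical:

```r
# Hypothetical Time column as produced by splitting on a single space:
# half of the values carry a leading space
df <- data.frame(Time = c("00:00:00", " 00:00:00", "00:15:00", " 00:15:00"),
                 stringsAsFactors = FALSE)

# trimws() removes leading and trailing whitespace (which = "both" by default)
df$Time <- trimws(df$Time)

table(df$Time)
# 00:00:00 00:15:00
#        2        2
```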




# If the data are stored as character strings with extra spaces, the approach above still works,
# because as.POSIXct() parses the whole string instead of splitting it on a single space
df <- data.frame(datetime=c("2018-05-15 00:00:00","2018-05-15   00:30:00",
                            "2018-05-15  01:00:00", "2018-05-15      02:00:00",
                            "2018-05-15 00:00:00","2018-05-15   00:30:00"), 
                 stringsAsFactors=FALSE)

df$Date <- as.Date(df$datetime)
df$Time <- format(as.POSIXct(df$datetime, tz="UTC"),"%H:%M:%S")
head(df)
#                   datetime       Date     Time
# 1      2018-05-15 00:00:00 2018-05-15 00:00:00
# 2    2018-05-15   00:30:00 2018-05-15 00:30:00
# 3     2018-05-15  01:00:00 2018-05-15 01:00:00
# 4 2018-05-15      02:00:00 2018-05-15 02:00:00
# 5      2018-05-15 00:00:00 2018-05-15 00:00:00
# 6    2018-05-15   00:30:00 2018-05-15 00:30:00

table(df$Time)
#00:00:00 00:30:00 01:00:00 02:00:00 
#       2        2        1        1 
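Another option, again a sketch rather than the answer's method, is to split on a regular expression that matches one or more whitespace characters, so repeated spaces never reach the time column. It uses base R strsplit() and assumes each string has no leading or trailing spaces (otherwise strsplit() would return an extra empty element):

```r
x <- c("2018-05-15 00:00:00", "2018-05-15   00:30:00",
       "2018-05-15  01:00:00", "2018-05-15      02:00:00",
       "2018-05-15 00:00:00", "2018-05-15   00:30:00")

# "[[:space:]]+" treats any run of whitespace as a single separator
parts <- do.call(rbind, strsplit(x, "[[:space:]]+"))

df <- data.frame(Date = parts[, 1], Time = parts[, 2],
                 stringsAsFactors = FALSE)

table(df$Time)
# 00:00:00 00:30:00 01:00:00 02:00:00
#        2        2        1        1
```

Unlike the as.POSIXct() approach, this keeps both columns as character, so it does not validate that the strings are real dates and times.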
