Subset list with a for loop

Problem description

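I have to subset the "plt" data frame. "plt" is a list of GPS points with their date and time. "labels" is a list of all the trips in a day, with each trip's start time and end time.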

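For row 1, I would take the time in labels$Start and the time in labels$End, search for these values in the column plt$Data_Time, and then subset all the rows between the Start value and the End value.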

> str(labels)
'data.frame':	10 obs. of  8 variables:
 $ Date_ST: Factor w/ 5 levels "2008/04/28","2008/04/29",..: 1 1 2 2 3 3 4 4 5 5
 $ Time_ST: Factor w/ 15 levels "01:27:05","01:33:29",..: 13 15 4 10 1 7 8 12 2 11
 $ Date_ET: Factor w/ 5 levels "2008/04/28","2008/04/29",..: 1 1 2 2 3 3 4 4 5 5
 $ Time_ET: Factor w/ 15 levels "01:35:25","01:41:11",..: 13 15 3 10 1 5 6 12 2 9
 $ Mode   : Factor w/ 2 levels "subway","walk": 2 2 2 2 2 2 2 2 2 2
 $ ID     : int  1 3 4 6 7 9 10 12 13 15
 $ Start  : chr  "2008/04/28 11:27:42" "2008/04/28 11:42:56" "2008/04/29 01:38:21" "2008/04/29 01:57:55" ...
 $ End    : chr  "2008/04/28 11:27:58" "2008/04/28 11:50:10" "2008/04/29 01:41:28" "2008/04/29 02:03:28" ...
 
> str(plt)
'data.frame':	4377 obs. of  9 variables:
 $ Lat      : num  40.1 40.1 40.1 40.1 40.1 ...
 $ Long     : num  116 116 116 116 116 ...
 $ X0       : int  0 0 0 0 0 0 0 0 0 0 ...
 $ Alt      : int  492 492 491 491 491 490 490 490 489 489 ...
 $ n.days   : num  39589 39589 39589 39589 39589 ...
 $ Date     : Factor w/ 5 levels "2008-05-21","2008-04-28",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Time     : Factor w/ 2955 levels "01:33:29","01:33:30",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ ID       : int  1 2 3 4 5 6 7 8 9 10 ...
 $ Data_Time: chr  "2008-05-21 01:33:29" "2008-05-21 01:33:30" "2008-05-21 01:33:31" "2008-05-21 01:33:33" ...

head(plt)
       Lat     Long X0 Alt   n.days       Date     Time ID           Data_Time
1 40.07045 116.3130  0 492 39589.06 2008-05-21 01:33:29  1 2008-05-21 01:33:29
2 40.07045 116.3133  0 492 39589.06 2008-05-21 01:33:30  2 2008-05-21 01:33:30
3 40.07050 116.3131  0 491 39589.06 2008-05-21 01:33:31  3 2008-05-21 01:33:31
4 40.07052 116.3130  0 491 39589.06 2008-05-21 01:33:33  4 2008-05-21 01:33:33
5 40.07050 116.3129  0 491 39589.06 2008-05-21 01:33:35  5 2008-05-21 01:33:35
6 40.07047 116.3129  0 490 39589.07 2008-05-21 01:33:37  6 2008-05-21 01:33:37

labels
      Date_ST  Time_ST    Date_ET  Time_ET Mode ID               Start                 End
1  2008/04/28 11:27:42 2008/04/28 11:27:58 walk  1 2008/04/28 11:27:42 2008/04/28 11:27:58
3  2008/04/28 11:42:56 2008/04/28 11:50:10 walk  3 2008/04/28 11:42:56 2008/04/28 11:50:10
4  2008/04/29 01:38:21 2008/04/29 01:41:28 walk  4 2008/04/29 01:38:21 2008/04/29 01:41:28
6  2008/04/29 01:57:55 2008/04/29 02:03:28 walk  6 2008/04/29 01:57:55 2008/04/29 02:03:28
7  2008/05/12 01:27:05 2008/05/12 01:35:25 walk  7 2008/05/12 01:27:05 2008/05/12 01:35:25
9  2008/05/12 01:51:11 2008/05/12 01:55:35 walk  9 2008/05/12 01:51:11 2008/05/12 01:55:35
I need to do this for every row of labels, so I thought of using a for loop. In the end, I only want to keep columns 1 and 2 (latitude and longitude).
for(i in 1:nrow(labels)) {
  a = labels$Start[i] # take the trip's start/end times
  b = labels$End[i] 
  
  k = plt[plt$Data_Time >= a & plt$Data_Time < b, ]
  LatLong = k[1:2]
  head(LatLong)
  write.table(LatLong, "~/Desktop/LatLongTrip.txt", sep="\t") 
}
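For comparison, here is a minimal sketch of a loop that does return rows. It assumes the main problem is that Start/End and Data_Time are compared as character strings written in two different formats ("2008/04/28 ..." vs "2008-05-21 ..."), so it parses both into POSIXct first (the UTC time zone is an assumption). The small plt and labels data frames are made-up stand-ins that only reuse the question's column names, and the extra trip column is hypothetical. It also collects each trip's Lat/Long in a list and writes the file once, since calling write.table with the same file name inside the loop overwrites the file on every iteration, and head() inside a for loop prints nothing unless wrapped in print().

```r
# Hypothetical stand-ins for the question's data (same column names, made-up values)
plt <- data.frame(
  Lat       = c(40.07045, 40.07050, 40.07052, 40.07060),
  Long      = c(116.3130, 116.3131, 116.3130, 116.3128),
  Data_Time = c("2008-04-28 11:27:40", "2008-04-28 11:27:45",
                "2008-04-28 11:27:50", "2008-04-28 11:45:00"),
  stringsAsFactors = FALSE
)
labels <- data.frame(
  Start = c("2008/04/28 11:27:42", "2008/04/28 11:42:56"),
  End   = c("2008/04/28 11:27:58", "2008/04/28 11:50:10"),
  stringsAsFactors = FALSE
)

# Parse both formats into date-times so >= and < compare chronologically
plt$Data_Time <- as.POSIXct(plt$Data_Time, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
labels$Start  <- as.POSIXct(labels$Start,  format = "%Y/%m/%d %H:%M:%S", tz = "UTC")
labels$End    <- as.POSIXct(labels$End,    format = "%Y/%m/%d %H:%M:%S", tz = "UTC")

trips <- vector("list", nrow(labels))
for (i in seq_len(nrow(labels))) {
  a <- labels$Start[i]  # trip start
  b <- labels$End[i]    # trip end
  k <- plt[plt$Data_Time >= a & plt$Data_Time < b, c("Lat", "Long")]
  k$trip <- rep(i, nrow(k))  # hypothetical column: which trip each point belongs to
  trips[[i]] <- k
}

# Stack the per-trip results into one data frame
LatLong <- do.call(rbind, trips)
print(LatLong)

# Write once, after the loop (path taken from the question):
# write.table(LatLong, "~/Desktop/LatLongTrip.txt", sep = "\t")
```

Keeping the trip number makes it possible to split the result back into one table per trip later, for example with split(LatLong, LatLong$trip).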

Unfortunately, the result is:

> k = plt[plt$Data_Time >= b & plt$Data_Time < a, ]
> k
[1] Lat       Long      X0        Alt       n.days    Date      Time      ID        Data_Time
<0 rows> (or 0-length row.names)
There actually are some rows between these two values. Can you help me?
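One likely reason for the empty result (an assumption, because string ordering depends on the collation locale R runs in): labels$Start and labels$End are character strings with "/" separators, while plt$Data_Time uses "-", so the comparison is textual rather than chronological. Under byte-wise "C" collation, "-" sorts before "/", so every "2008-..." value is smaller than every "2008/..." value and the condition can never be true. The sketch below forces C collation to make this visible, then repeats the same comparison on parsed POSIXct values. The sample timestamp is hypothetical.

```r
start_chr <- "2008/04/28 11:27:42"  # format used by labels$Start
point_chr <- "2008-04-28 11:27:50"  # hypothetical plt$Data_Time value, 8 s later

# 1) Character comparison under byte-wise "C" collation:
#    '-' (0x2D) sorts before '/' (0x2F), so the later time compares as "smaller".
old_collate <- Sys.getlocale("LC_COLLATE")
invisible(Sys.setlocale("LC_COLLATE", "C"))
as_text <- point_chr >= start_chr
invisible(Sys.setlocale("LC_COLLATE", old_collate))  # restore the previous setting

# 2) The same comparison on parsed date-times is chronological.
start_t <- as.POSIXct(start_chr, format = "%Y/%m/%d %H:%M:%S", tz = "UTC")
point_t <- as.POSIXct(point_chr, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
as_time <- point_t >= start_t

print(c(as_text = as_text, as_time = as_time))
```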

Tags: r, loops, for-loop, subset

Solution


You don't need a for loop :) Here's how:

First, make sure you have the sqldf library installed and loaded:

library(sqldf)

Then, set up some mock example data:

fechasInicioYFin <- data.frame(
  fechasInicio = as.POSIXct(c('2016-08-19 10:00','2016-08-25 15:00','2016-09-15 15:00','2016-07-20 11:00')),
  fechasFin = as.POSIXct(c('2016-08-19 14:00','2016-08-25 18:00','2016-09-15 19:00','2016-07-20 16:00'))
)

dataConFecha <- data.frame(num1 = c(1,2,3,4,5,6), num2 = c(11:16),
                           fechas = as.POSIXct(c('2016-08-19 12:00','2016-08-25 16:00','2016-09-15 16:00','2016-07-20 13:00',
                                                 '2016-08-19 13:00','2016-09-15 17:00'))
)

Now just join them on the date columns and select only the columns you are interested in:

sqldf("select a.*,b.fechasInicio,b.fechasFin from dataConFecha as a join fechasInicioYFin as b on
  a.fechas between b.fechasInicio and b.fechasFin")
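If installing sqldf is not an option, the same interval join can be sketched in base R. This swaps the SQL join for a different technique: a cross join with merge(..., by = NULL) followed by a row filter. It materializes every row/interval pair (nrow(a) * nrow(b) rows), which is fine at the question's 4377 x 10 size but could get large for much bigger tables. The tz = "UTC" argument is an added assumption so the example does not depend on the local time zone.

```r
# The answer's mock data, with tz = "UTC" added (assumption) for reproducibility
fechasInicioYFin <- data.frame(
  fechasInicio = as.POSIXct(c('2016-08-19 10:00', '2016-08-25 15:00', '2016-09-15 15:00', '2016-07-20 11:00'), tz = "UTC"),
  fechasFin    = as.POSIXct(c('2016-08-19 14:00', '2016-08-25 18:00', '2016-09-15 19:00', '2016-07-20 16:00'), tz = "UTC")
)
dataConFecha <- data.frame(
  num1 = c(1, 2, 3, 4, 5, 6), num2 = c(11:16),
  fechas = as.POSIXct(c('2016-08-19 12:00', '2016-08-25 16:00', '2016-09-15 16:00', '2016-07-20 13:00',
                        '2016-08-19 13:00', '2016-09-15 17:00'), tz = "UTC")
)

# Cross join: pair every row of dataConFecha with every interval (6 x 4 = 24 rows)
pairs <- merge(dataConFecha, fechasInicioYFin, by = NULL)

# Keep only pairs whose timestamp lies inside the interval, both ends included (like BETWEEN)
joined <- pairs[pairs$fechas >= pairs$fechasInicio & pairs$fechas <= pairs$fechasFin, ]
joined <- joined[order(joined$fechas), ]
print(joined)
```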

Note: this uses the SQL between operator instead of >= and <=, as suggested by @G. Grothendieck.
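One detail worth checking: SQL between includes both endpoints, exactly like >= combined with <=, while the question's loop used >= a & < b, which excludes the end time. The difference only matters for points that fall exactly on a boundary. For example, if one trip ended at the same second the next one started, a point at that second would be matched to both trips with between. A small base R sketch with made-up timestamps:

```r
trip_start <- as.POSIXct("2008-04-28 11:27:42", tz = "UTC")
trip_end   <- as.POSIXct("2008-04-28 11:27:58", tz = "UTC")
pts <- trip_start + c(0, 8, 16)  # at the start, inside, exactly at the end

between_like <- pts >= trip_start & pts <= trip_end  # SQL BETWEEN: both endpoints included
half_open    <- pts >= trip_start & pts <  trip_end  # the question's loop: end time excluded
print(rbind(between_like, half_open))
```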

The output should look like this: [image: sqlJoin result]

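As you can see, each row is now paired with the start and end date of the interval it falls into, so the data are effectively grouped by interval.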

