Is it possible to keep only the numeric values and convert all values to minutes at once in Excel or R?

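Problem description

I am trying to edit a table using data collected from a Google Form. Several columns contain data on the weekly amount of physical activity, given in minutes and hours. Besides the numeric value, each cell also contains text indicating minutes or hours of physical activity. The goal is for these cells to contain only numeric values, without text, and only in minutes (i.e. hours need to be converted).

[Image: example of a column with the corresponding data]

I have tried the following formula, which works well for the text-removal part of the problem: =SUBSTITUTE(SUBSTITUTE(J52;"min";"");"h";"")

The problem arises when I try to nest formulas to get only minutes in the cell, because there are minutes (denoted min), hours (denoted h), and some values preceded by a > sign (for example, > 4 hours is predefined as 4:30, i.e. 270 minutes).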

To potentially solve this in R, I am providing my session info along with a reproducible example:

R version 4.0.4 (2021-02-15)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale:
[1] LC_COLLATE=Croatian_Croatia.1250 LC_CTYPE=Croatian_Croatia.1250 LC_MONETARY=Croatian_Croatia.1250
[4] LC_NUMERIC=C LC_TIME=Croatian_Croatia.1250
system code page: 1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

loaded via a namespace (and not attached):
[1] rstudioapi_0.13 knitr_1.33 magrittr_2.0.1 tidyselect_1.1.1 munsell_0.5.0 colorspace_2.0-0
[7] R6_2.5.0 rlang_0.4.11 fansi_0.4.2 dplyr_1.0.5 tools_4.0.4 grid_4.0.4
[13] gtable_0.3.0 xfun_0.22 tinytex_0.31 utf8_1.2.1 cli_2.5.0 DBI_1.1.1
[19] htmltools_0.5.1.1 ellipsis_0.3.1 digest_0.6.27 yaml_2.2.1 assertthat_0.2.1 tibble_3.1.1
[25] lifecycle_1.0.0 crayon_1.4.1 purrr_0.3.4 ggplot2_3.3.3 vctrs_0.3.7 evaluate_0.14
[31] glue_1.4.2 rmarkdown_2.7 compiler_4.0.4 pillar_1.6.0 generics_0.1.0 scales_1.1.1
[37] pkgconfig_2.0.3

data <- structure(list(id = c("100213", "100998", "100494", "100758", 
"100984", "100461", "100374", "140859", "150950", "140433", "200481", 
"200892", "300487", "300319", "400258", "400075", "200712", "120114", 
"500046", "500649", "600614", "700001", "300812", "700986", "800665", 
"700882", "800470", "400090", "900350", "700283", "600455", "800400", 
"100814", "700027", "600612", "30076", "500671", "200052", "600675", 
"800786", "110092", "800193", "800947", "500749", "800665", "400562", 
"150066", "200013", "700419", "400468", "600584", "ID 600366", 
"300379", "300035", "150346", "140860", "900508", "600751", "110825", 
"300848", "200990", "100237", "500137", "400751", "600347", "700289", 
"700000", "150320", "700319", "300643", "400871", "600963", "400295", 
"120120", "200875", "500209", "700380", "600646", "150415", "150415", 
"600640", "120999", "200693", "600626", "700003", "400493", "700849", 
"200544", "200001", "700801", "200084", "110951", "100371", "120114"
), vpa = c("0 min", "", "15 min", "0 min", "0 min", "45 min", 
"15 min", "0 min", "30 min", "30 min", "0 min", "0 min", "0 min", 
"0 min", "45 min", "3:00 h", "0 min", "", "15 min", "0 min", 
"1:00 h", "0 min", "45 min", "0 min", "0 min", "30 min", "1:00 h", 
"0 min", "1:00 h", "2:15 h", "30 min", "15 min", "0 min", "0 min", 
"0 min", "0 min", "0 min", "45 min", "1:00 h", "0 min", "1:30 h", 
"1:00 h", "0 min", "45 min", "0 min", "0 min", "0 min", "0 min", 
"15 min", "", "1:00 h", "0 min", "1:00 h", "30 min", "30 min", 
"30 min", "0 min", "30 min", "30 min", "45 min", "0 min", ">4 h", 
"15 min", "0 min", "45 min", "30 min", "30 min", "0 min", "1:30 h", 
"15 min", "45 min", "0 min", "15 min", "0 min", "0 min", "0 min", 
"0 min", "30 min", "2:00 h", "2:00 h", "15 min", "", "0 min", 
"0 min", "0 min", "0 min", "0 min", "0 min", "0 min", "0 min", 
"1:30 h", "15 min", "0 min", "4:00 h")), class = "data.frame", row.names = c(NA, 
-94L))

The expected output should look like this:

data <- structure(list(id = c("100213", "100998", "100494", "100758", 
"100984", "100461", "100374", "140859", "150950", "140433", "200481", 
"200892", "300487", "300319", "400258", "400075", "200712", "120114", 
"500046", "500649", "600614", "700001", "300812", "700986", "800665", 
"700882", "800470", "400090", "900350", "700283", "600455", "800400", 
"100814", "700027", "600612", "30076", "500671", "200052", "600675", 
"800786", "110092", "800193", "800947", "500749", "800665", "400562", 
"150066", "200013", "700419", "400468", "600584", "ID 600366", 
"300379", "300035", "150346", "140860", "900508", "600751", "110825", 
"300848", "200990", "100237", "500137", "400751", "600347", "700289", 
"700000", "150320", "700319", "300643", "400871", "600963", "400295", 
"120120", "200875", "500209", "700380", "600646", "150415", "150415", 
"600640", "120999", "200693", "600626", "700003", "400493", "700849", 
"200544", "200001", "700801", "200084", "110951", "100371", "120114"
), vpa_2 = c("0", "", "15", "0", "0", "45", 
"15", "0", "30", "30", "0", "0", "0", 
"0", "45", "180", "0", "", "15", "0", 
"60", "0", "45", "0", "0", "30", "60", 
"0", "60", "135", "30", "15", "0", "0", 
"0", "0", "0", "45", "60", "0", "90", 
"60", "0", "45", "0", "0", "0", "0", 
"15", "", "60", "0", "60", "30", "30", 
"30", "0", "30", "30", "45", "0", "270", 
"15", "0", "45", "30", "30", "0", "90", 
"15", "45", "0", "15", "0", "0", "0", 
"0", "30", "120", "120", "15", "", "0", 
"0", "0", "0", "0", "0", "0", "0", 
"90", "15", "0", "240")), class = "data.frame", row.names = c(NA, -94L))

Tags: r, excel, time, formula, numeric

Solution

This can be done in base R. Here is the solution first:

# make a copy of your input data
test <- data

# remove the unit labels ("min", "h") and any surrounding whitespace
test$vpa <- trimws(gsub("min|h", "", test$vpa))

# ">4" (more than 4 hours) is predefined in the question as 4:30, i.e. 270 minutes
test$vpa[test$vpa == ">4"] <- "4:30"

# if a string is in the hour unit (strings appearing as "h:mm"),
# replace its value with the calculated minutes
for (i in seq_len(nrow(test))) {
  if (grepl(":", test$vpa[[i]])) {
    parts <- as.numeric(strsplit(test$vpa[[i]], ":")[[1]])
    test$vpa[[i]] <- parts[1] * 60 + parts[2]
  }
}

# check results
# The vpa column of "test" should now hold the same values as vpa_2 in your
# expected output (minutes as character strings, empty strings left unchanged)
print(test)

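I did not use any packages here, but you may need some familiarity with gsub, grepl, and strsplit to follow what I did. Basically, my logic revolves around string manipulation: strip the unit text, then convert any value containing ":" from hours and minutes to minutes.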
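The same string logic can also be written without an explicit loop. The following is only a minimal sketch under a few assumptions: the helper name to_minutes and the short sample vector are hypothetical (they are not part of the original answer), ">4 h" is mapped to 4:30 as stated in the question, and the result is returned as a numeric vector, so empty cells become NA rather than "" as in the expected output.

```r
# Hypothetical helper (not from the original answer): convert strings such as
# "15 min", "1:30 h", ">4 h" or "" into minutes, vectorized over x.
to_minutes <- function(x) {
  # drop the unit labels and any surrounding whitespace
  x <- trimws(gsub("min|h", "", x))
  # assumption taken from the question: ">4" is predefined as 4:30
  x[x %in% ">4"] <- "4:30"

  out <- rep(NA_real_, length(x))

  # values in "h:mm" form: hours * 60 + minutes
  is_hm <- grepl(":", x, fixed = TRUE)
  parts <- strsplit(x[is_hm], ":", fixed = TRUE)
  out[is_hm] <- vapply(
    parts,
    function(p) as.numeric(p[1]) * 60 + as.numeric(p[2]),
    numeric(1)
  )

  # plain minute values; empty strings are left as NA
  plain <- !is_hm & nzchar(x)
  out[plain] <- as.numeric(x[plain])

  out
}

# small sample covering the formats that appear in the question's data
sample_vpa <- c("0 min", "", "15 min", "3:00 h", "2:15 h", ">4 h")
result <- to_minutes(sample_vpa)
print(result)
```

Applied to the question's data, this could be used as data$vpa_2 <- to_minutes(data$vpa). If the empty strings must be kept as "", the numeric result would need to be converted back to character afterwards.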

