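r - Is it possible in Excel or R to keep only the numeric values and convert all of them to minutes at once?
Problem description
I am trying to edit a table of data collected through Google Forms. Several columns hold the weekly amount of physical activity, recorded in minutes or hours. Besides the numeric value, each cell also contains text indicating whether the value is in minutes or hours. The goal is for these cells to contain only the numeric value, without text, and only in minutes (so hours need to be converted).
I have already tried the following formula, which works well for the text-removal part of the problem: =SUBSTITUTE(SUBSTITUTE(J52;"min";"");"h";"")
The problem appears when I try to nest formulas to get minutes only, because the cells contain minutes (written as: min), hours (written as: h), and some values preceded by a > sign (e.g. >4 h is predefined as 4:30, i.e. 270 minutes).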
To potentially solve this in R, I am providing my session info as well as a reproducible example:
R version 4.0.4 (2021-02-15)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)
Matrix products: default
locale:
[1] LC_COLLATE=Croatian_Croatia.1250 LC_CTYPE=Croatian_Croatia.1250 LC_MONETARY=Croatian_Croatia.1250
[4] LC_NUMERIC=C LC_TIME=Croatian_Croatia.1250
system code page: 1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] rstudioapi_0.13 knitr_1.33 magrittr_2.0.1 tidyselect_1.1.1 munsell_0.5.0 colorspace_2.0-0
[7] R6_2.5.0 rlang_0.4.11 fansi_0.4.2 dplyr_1.0.5 tools_4.0.4 grid_4.0.4
[13] gtable_0.3.0 xfun_0.22 tinytex_0.31 utf8_1.2.1 cli_2.5.0 DBI_1.1.1
[19] htmltools_0.5.1.1 ellipsis_0.3.1 digest_0.6.27 yaml_2.2.1 assertthat_0.2.1 tibble_3.1.1
[25] lifecycle_1.0.0 crayon_1.4.1 purrr_0.3.4 ggplot2_3.3.3 vctrs_0.3.7 evaluate_0.14
[31] glue_1.4.2 rmarkdown_2.7 compiler_4.0.4 pillar_1.6.0 generics_0.1.0 scales_1.1.1
[37] pkgconfig_2.0.3
data <- structure(list(id = c("100213", "100998", "100494", "100758",
"100984", "100461", "100374", "140859", "150950", "140433", "200481",
"200892", "300487", "300319", "400258", "400075", "200712", "120114",
"500046", "500649", "600614", "700001", "300812", "700986", "800665",
"700882", "800470", "400090", "900350", "700283", "600455", "800400",
"100814", "700027", "600612", "30076", "500671", "200052", "600675",
"800786", "110092", "800193", "800947", "500749", "800665", "400562",
"150066", "200013", "700419", "400468", "600584", "ID 600366",
"300379", "300035", "150346", "140860", "900508", "600751", "110825",
"300848", "200990", "100237", "500137", "400751", "600347", "700289",
"700000", "150320", "700319", "300643", "400871", "600963", "400295",
"120120", "200875", "500209", "700380", "600646", "150415", "150415",
"600640", "120999", "200693", "600626", "700003", "400493", "700849",
"200544", "200001", "700801", "200084", "110951", "100371", "120114"
), vpa = c("0 min", "", "15 min", "0 min", "0 min", "45 min",
"15 min", "0 min", "30 min", "30 min", "0 min", "0 min", "0 min",
"0 min", "45 min", "3:00 h", "0 min", "", "15 min", "0 min",
"1:00 h", "0 min", "45 min", "0 min", "0 min", "30 min", "1:00 h",
"0 min", "1:00 h", "2:15 h", "30 min", "15 min", "0 min", "0 min",
"0 min", "0 min", "0 min", "45 min", "1:00 h", "0 min", "1:30 h",
"1:00 h", "0 min", "45 min", "0 min", "0 min", "0 min", "0 min",
"15 min", "", "1:00 h", "0 min", "1:00 h", "30 min", "30 min",
"30 min", "0 min", "30 min", "30 min", "45 min", "0 min", ">4 h",
"15 min", "0 min", "45 min", "30 min", "30 min", "0 min", "1:30 h",
"15 min", "45 min", "0 min", "15 min", "0 min", "0 min", "0 min",
"0 min", "30 min", "2:00 h", "2:00 h", "15 min", "", "0 min",
"0 min", "0 min", "0 min", "0 min", "0 min", "0 min", "0 min",
"1:30 h", "15 min", "0 min", "4:00 h")), class = "data.frame", row.names = c(NA,
-94L))
The expected output should look like this:
data <- structure(list(id = c("100213", "100998", "100494", "100758",
"100984", "100461", "100374", "140859", "150950", "140433", "200481",
"200892", "300487", "300319", "400258", "400075", "200712", "120114",
"500046", "500649", "600614", "700001", "300812", "700986", "800665",
"700882", "800470", "400090", "900350", "700283", "600455", "800400",
"100814", "700027", "600612", "30076", "500671", "200052", "600675",
"800786", "110092", "800193", "800947", "500749", "800665", "400562",
"150066", "200013", "700419", "400468", "600584", "ID 600366",
"300379", "300035", "150346", "140860", "900508", "600751", "110825",
"300848", "200990", "100237", "500137", "400751", "600347", "700289",
"700000", "150320", "700319", "300643", "400871", "600963", "400295",
"120120", "200875", "500209", "700380", "600646", "150415", "150415",
"600640", "120999", "200693", "600626", "700003", "400493", "700849",
"200544", "200001", "700801", "200084", "110951", "100371", "120114"
), vpa_2 = c("0", "", "15", "0", "0", "45",
"15", "0", "30", "30", "0", "0", "0",
"0", "45", "180", "0", "", "15", "0",
"60", "0", "45", "0", "0", "30", "60",
"0", "60", "135", "30", "15", "0", "0",
"0", "0", "0", "45", "60", "0", "90",
"60", "0", "45", "0", "0", "0", "0",
"15", "", "60", "0", "60", "30", "30",
"30", "0", "30", "30", "45", "0", "270",
"15", "0", "45", "30", "30", "0", "90",
"15", "45", "0", "15", "0", "0", "0",
"0", "30", "120", "120", "15", "", "0",
"0", "0", "0", "0", "0", "0", "0",
"90", "15", "0", "240")), class = "data.frame", row.names = c(NA, -94L))
Solution
This is doable in base R. I'll put the solution first:
# make a copy of your input data
test <- data
# remove the unit labels ("min", "h") and any surrounding whitespace from the time strings
test$vpa <- trimws(gsub("min|h", "", test$vpa))
# ">4 h" is predefined in the question as 4:30 (270 minutes)
test$vpa[test$vpa == ">4"] <- "4:30"
# if a string is in hours (appearing as "x:yy"), replace its value with the calculated minutes
for (i in seq_len(nrow(test))) {
  if (grepl(":", test$vpa[[i]])) {
    parts <- as.numeric(unlist(strsplit(test$vpa[[i]], ":")))
    test$vpa[[i]] <- parts[1] * 60 + parts[2]
  }
}
# check results
# The vpa column of "test" should now match vpa_2 in your expected output
print(test)
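I did not use any packages here, but you may need some familiarity with gsub, grepl, and strsplit to understand what I did. Basically, my logic revolves around string manipulation.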
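The same string logic can also be written without an explicit loop. The following is only a sketch, not part of the original answer: `to_minutes` and its `over_4h` argument are hypothetical names introduced for illustration, and the sketch assumes the column contains only the formats seen in the example data ("N min", "H:MM h", ">4 h", and blank cells).

```r
# Hypothetical helper (an assumption, not from the original answer): convert
# strings such as "45 min", "1:30 h" or ">4 h" to minutes; blanks become NA.
to_minutes <- function(x, over_4h = 270) {
  # strip the unit labels and surrounding whitespace
  cleaned <- trimws(gsub("min|h", "", x))
  out <- rep(NA_real_, length(x))

  # values entered as ">4 h" get the predefined 4:30, i.e. 270 minutes
  is_over <- grepl("^>", cleaned)
  out[is_over] <- over_4h

  # "H:MM" values: hours * 60 + minutes
  is_hm <- grepl("^[0-9]+:[0-9]+$", cleaned)
  hm <- strsplit(cleaned[is_hm], ":", fixed = TRUE)
  out[is_hm] <- vapply(
    hm,
    function(p) as.numeric(p[1]) * 60 + as.numeric(p[2]),
    numeric(1)
  )

  # plain numbers are already in minutes
  is_min <- grepl("^[0-9]+$", cleaned)
  out[is_min] <- as.numeric(cleaned[is_min])

  out
}

# small sample using formats taken from the question's data
sample_vpa <- c("0 min", "", "45 min", "3:00 h", "2:15 h", ">4 h")
result <- to_minutes(sample_vpa)
print(result)
```

With the question's data frame this could be applied as data$vpa_2 <- to_minutes(data$vpa). Note that this variant returns a numeric column with NA for blank cells, whereas the expected output in the question keeps character values and empty strings; which one is preferable depends on what you do with the column afterwards.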