r - For loop to change multiple columns
Problem description
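I have a tibble, songs, that is too large to share here. That doesn't matter, though; the question applies to any tibble that has only dbl values.
The idea is that I have previously selected one row. It can be any row, chosen without any prior knowledge. The first thing I do is filter it out: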
songs2 <- songs %>%
anti_join(choice)
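This works.
By the way, choice has a single row.
Now I have created a second tibble (actually the third, but the second in this post) called dist, which has only dbl values (and therefore shares its columns with choice). I want to subtract the values in choice from every row of dist.
I tried writing this: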
for (i in seq_along(distUseful)) {
dist <- dist %>%
mutate_(distUseful[i] = (.data[[i]] - choice[[i]]))
}
But it doesn't work:
> for (i in seq_along(distUseful)) {
+ dist <- dist %>%
+ mutate_(distUseful[i] = (.data[[i]] - choice[[i]]))
Error: unexpected '=' in:
" dist <- dist %>%
mutate_(distUseful[i] ="
> }
Error: unexpected '}' in "}"
Edit: as requested in the comments, here are the first 10 rows of songs2.
structure(list(acousticness = c(0.991, 0.643, 0.993, 0.000173,
0.295, 0.996, 0.992, 0.996, 0.996, 0.00682), artists = c("['Mamie Smith']",
"[\"Screamin' Jay Hawkins\"]", "['Mamie Smith']", "['Oscar Velazquez']",
"['Mixe']", "['Mamie Smith & Her Jazz Hounds']", "['Mamie Smith']",
"['Mamie Smith & Her Jazz Hounds']", "['Francisco Canaro']",
"['Meetya']"), danceability = c(0.598, 0.852, 0.647, 0.73, 0.704,
0.424, 0.782, 0.474, 0.469, 0.571), duration_ms = c(168333, 150200,
163827, 422087, 165224, 198627, 195200, 186173, 146840, 476304
), energy = c(0.224, 0.517, 0.186, 0.798, 0.707, 0.245, 0.0573,
0.239, 0.238, 0.753), explicit = c(FALSE, FALSE, FALSE, FALSE,
TRUE, FALSE, FALSE, FALSE, FALSE, FALSE), id = c("0cS0A1fUEUd1EW3FcF8AEI",
"0hbkKFIJm7Z05H8Zl9w30f", "11m7laMUgmOKqI3oYzuhne", "19Lc5SfJJ5O1oaxY0fpwfh",
"2hJjbsLCytGsnAHfdsLejp", "3HnrHGLE9u2MjHtdobfWl9", "5DlCyqLyX2AOVDTjjkDZ8x",
"02FzJbHtqElixxCmrpSCUa", "02i59gYdjlhBmbbWhf8YuK", "06NUxS2XL3efRh0bloxkHm"
), instrumentalness = c(0.000522, 0.0264, 1.76e-05, 0.801, 0.000246,
0.799, 1.61e-06, 0.186, 0.96, 0.873), key = c(5, 5, 0, 2, 10,
5, 5, 9, 8, 8), liveness = c(0.379, 0.0809, 0.519, 0.128, 0.402,
0.235, 0.176, 0.195, 0.149, 0.092), loudness = c(-12.628, -7.261,
-12.098, -7.311, -6.036, -11.47, -12.453, -9.712, -18.717, -6.943
), mode = c(0, 0, 1, 1, 0, 1, 1, 1, 1, 1), name = c("Keep A Song In Your Soul",
"I Put A Spell On You", "Golfing Papa", "True House Music - Xavier Santos & Carlos Gomix Remix",
"Xuniverxe", "Crazy Blues - 78rpm Version", "Don't You Advertise Your Man",
"Arkansas Blues", "La Chacarera - Remasterizado", "Broken Puppet - Original Mix"
), popularity = c(12, 7, 4, 17, 2, 9, 5, 0, 0, 0), release_date = c("1920",
"1920-01-05", "1920", "1920-01-01", "1920-10-01", "1920", "1920",
"1920", "1920-07-08", "1920-01-01"), speechiness = c(0.0936,
0.0534, 0.174, 0.0425, 0.0768, 0.0397, 0.0592, 0.0289, 0.0741,
0.0446), tempo = c(149.976, 86.889, 97.6, 127.997, 122.076, 103.87,
85.652, 78.784, 130.06, 126.993), valence = c(0.634, 0.95, 0.689,
0.0422, 0.299, 0.477, 0.487, 0.366, 0.621, 0.119), year = c(1920,
1920, 1920, 1920, 1920, 1920, 1920, 1920, 1920, 1920)), row.names = c(NA,
-10L), class = c("tbl_df", "tbl", "data.frame"))
Here is choice:
structure(list(acousticness = 0.511, danceability = 0.403, duration_ms = 117395,
instrumentalness = 0.896, liveness = 0.108, loudness = -8.126,
popularity = 65, speechiness = 0.0514, tempo = 135.047, valence = 0.192), row.names = c(NA,
-1L), class = c("tbl_df", "tbl", "data.frame"))
Finally:
distUseful <- c("acousticness", "danceability", "duration_ms", "duration_ms", "instrumentalness", "liveness", "loudness", "popularity", "speechiness", "tempo", "valence")
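Edit 2: Just an afterthought: if you take the loop I quoted earlier and run the body for a single iteration (picking the variable by hand), it works. So, judging by the error message and the code, the problem really is the first part, distUseful[i] =.
Edit 3: For example, if I do this for the first column only, this is what happens (so the first column is correct and the rest are unchanged):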
> dist %>%
+ mutate(acousticness = (dist[[1]] - choice[[1]]))
# A tibble: 174,388 x 10
acousticness danceability duration_ms instrumentalness liveness loudness popularity speechiness tempo valence
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 0.48 0.598 168333 0.000522 0.379 -12.6 12 0.0936 150. 0.634
2 0.132 0.852 150200 0.0264 0.0809 -7.26 7 0.0534 86.9 0.95
3 0.482 0.647 163827 0.0000176 0.519 -12.1 4 0.174 97.6 0.689
4 -0.511 0.73 422087 0.801 0.128 -7.31 17 0.0425 128. 0.0422
5 -0.216 0.704 165224 0.000246 0.402 -6.04 2 0.0768 122. 0.299
6 0.485 0.424 198627 0.799 0.235 -11.5 9 0.0397 104. 0.477
7 0.481 0.782 195200 0.00000161 0.176 -12.5 5 0.0592 85.7 0.487
8 0.485 0.474 186173 0.186 0.195 -9.71 0 0.0289 78.8 0.366
9 0.485 0.469 146840 0.96 0.149 -18.7 0 0.0741 130. 0.621
10 -0.504 0.571 476304 0.873 0.092 -6.94 0 0.0446 127. 0.119
Solution
Assuming dist is a tibble and choice is a vector of values (whose length equals the number of columns in dist), I would try something like this:
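library(dplyr)  # provides %>% and as_tibble()

amend_row <- function(amend_vals, ...) {
  # c(...) gathers one row's values into a numeric vector before subtracting
  c(...) - amend_vals
}
# unlist() lets choice be either a plain vector or a one-row tibble;
# its values must be in the same column order as dist
purrr::pmap(dist, ~ amend_row(amend_vals = unlist(choice), ...)) %>%
  do.call(what = rbind, args = .) %>%
  as_tibble() %>%
  purrr::set_names(nm = colnames(dist))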
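As for why the original loop fails: in a function call, the left-hand side of = must be a literal argument name, so distUseful[i] = is rejected by the parser before mutate_() ever runs. The sketch below shows three column-wise alternatives that avoid this, using small toy versions of dist and choice (the two-row data is made up for illustration, with values borrowed from the question). It loops over names(choice) rather than distUseful because distUseful as posted lists "duration_ms" twice, which would subtract that column twice in a loop.

```r
library(dplyr)

# Toy stand-ins for the question's objects (illustrative values only)
dist <- tibble(acousticness = c(0.991, 0.643),
               danceability = c(0.598, 0.852))
choice <- tibble(acousticness = 0.511, danceability = 0.403)

# Option 1: base R loop, assigning each column by name
dist_loop <- dist
for (col in names(choice)) {
  dist_loop[[col]] <- dist_loop[[col]] - choice[[col]]
}

# Option 2: the original loop idea, with a dynamic column name.
# `!!col :=` lets mutate() take the name on the left from a variable.
dist_tidy <- dist
for (col in names(choice)) {
  dist_tidy <- dist_tidy %>%
    mutate(!!col := .data[[col]] - choice[[col]])
}

# Option 3: no explicit loop; across() applies the subtraction to each
# selected column, and cur_column() gives the name of the current one.
dist_across <- dist %>%
  mutate(across(all_of(names(choice)), ~ .x - choice[[cur_column()]]))
```

All three produce the same tibble here. On a table with about 174,000 rows, the column-wise versions should also be considerably faster than building one vector per row with pmap() and binding them together, since each subtraction is a single vectorised operation.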