r - R tidyr::pivot_longer() 多列
问题描述
我有一个包含几列的数据集,如下所示:
我想将数据转换为更长的格式,其名称为:交易价值、头寸、玩家姓名和趋势价值
这是我尝试过的......但代码试图结合趋势和玩家名称列。如何指定那些应该是由相同位置表示的两个单独的列?
Data_trend <-
data %>%
tidyr::pivot_longer(
cols = !`Trade Value`,
names_to = c("Position", "Player Name"),
names_pattern = c("(.*).(.*)"),
)
谢谢你的帮助!
以下是重现数据的方法:
data <- structure(list(`Trade Value` = c(79, 70.5, 68.5, 67.5, 64, 61,
57.5, 57, 56.5, 56, 55, 54, 51.5, 51, 49.5, 48.5, 47, 43, 41,
39.5, 39, 38.5, 38.5, 38, 35, 34.5, 33.5, 32, 30.5, 30), `Running Back.Player Name` = c("Christian McCaffrey",
"Dalvin Cook", "Alvin Kamara", "", "", "Ezekiel Elliott", "Saquon Barkley",
"", "Derrick Henry", "", "", "", "Nick Chubb", "Aaron Jones",
"Jonathan Taylor", "", "", "", "Joe Mixon", "Antonio Gibson",
"Austin Ekeler", "", "", "", "", "", "", "", "Clyde Edwards-Helaire",
"J.K. Dobbins"), `Running Back.Trend` = c(0, 0, 0, NA, NA, 0,
0, NA, 0, NA, NA, NA, 0, 0, 0, NA, NA, NA, 0, 0, 0, NA, NA, NA,
NA, NA, NA, NA, 0, 0), `Wide Receiver.Player Name` = c("", "",
"", "", "", "", "", "Stefon Diggs", "Davante Adams", "", "Calvin Ridley",
"Tyreek Hill", "", "", "", "", "DeAndre Hopkins", "", "", "",
"A.J. Brown", "D.K. Metcalf", "Justin Jefferson", "", "Terry McLaurin",
"", "Keenan Allen", "Allen Robinson II", "Mike Evans", ""), `Wide Receiver.Trend` = c(NA,
NA, NA, NA, NA, NA, NA, 0, 0, NA, 0, 0, NA, NA, NA, NA, 0, NA,
NA, NA, 0, 0, 0, NA, 0, NA, 0, 0, 0, NA), `Tight End.Player Name` = c("",
"", "", "", "", "", "", "", "", "", "", "", "Travis Kelce", "",
"", "", "", "George Kittle", "", "", "", "", "", "Darren Waller",
"", "", "", "", "", ""), `Tight End.Trend` = c(NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, 0, NA, NA, NA, NA, 0, NA, NA,
NA, NA, NA, 0, NA, NA, NA, NA, NA, NA), `Quarterback.Player Name` = c("",
"", "", "Patrick Mahomes II", "Josh Allen", "", "", "", "", "Kyler Murray",
"", "", "", "", "Lamar Jackson", "Dak Prescott", "", "", "",
"", "", "Russell Wilson", "", "", "Justin Herbert", "Aaron Rodgers",
"", "", "", ""), Quarterback.Trend = c(NA, NA, NA, 0, 0, NA,
NA, NA, NA, 0, NA, NA, NA, NA, 0, 0, NA, NA, NA, NA, NA, 0, NA,
NA, 0, 0, NA, NA, NA, NA)), row.names = c(NA, 30L), class = "data.frame")
解决方案
根据亲爱的Chris Ruehlemann先生的建议,这里是这个解决方案的一个稍微精简的版本:
data %>%
pivot_longer(!`Trade Value`, names_to = c("Position", ".value"),
names_pattern = "(\\D+)\\.(\\D+)")
# A tibble: 120 x 4
`Trade Value` Position `Player Name` Trend
<dbl> <chr> <chr> <dbl>
1 79 Running Back "Christian McCaffrey" 0
2 79 Wide Receiver "" NA
3 79 Tight End "" NA
4 79 Quarterback "" NA
5 70.5 Running Back "Dalvin Cook" 0
6 70.5 Wide Receiver "" NA
7 70.5 Tight End "" NA
8 70.5 Quarterback "" NA
9 68.5 Running Back "Alvin Kamara" 0
10 68.5 Wide Receiver "" NA
# ... with 110 more rows
您可以使用以下解决方案。根据您的列名,我使用了以下正则表达式结构:
\\D+
一次或多次匹配任何非数字字符,包括空格,因此第一个捕获组(\\D+)
匹配名称,如Running Back
.(?:\\s\\D+)
这是我使用的一个非捕获组,它具有(?: ...)
指示可选模式的结构,这里它是一个空格\\s
,后跟多个非数字值\\D+
。我把这个非捕获组放在第二个捕获组中,结果(\\D+(?:\\s\\D+)?)
是第一个模式\\D+
是强制性的,其余的是可选的,它匹配两者Player Name
或Trend
例如。
我在参数中定义了两组列,names_to
所以我必须定义两个捕获组,names_pattern
用文字点分隔,.
如下所示:
library(tidyr)
data %>%
pivot_longer(!`Trade Value`, names_to = c("Position", ".value"),
names_pattern = "(\\D+)\\.(\\D+(?:\\s\\D+)?)")
# A tibble: 120 x 4
`Trade Value` Position `Player Name` Trend
<dbl> <chr> <chr> <dbl>
1 79 Running Back "Christian McCaffrey" 0
2 79 Wide Receiver "" NA
3 79 Tight End "" NA
4 79 Quarterback "" NA
5 70.5 Running Back "Dalvin Cook" 0
6 70.5 Wide Receiver "" NA
7 70.5 Tight End "" NA
8 70.5 Quarterback "" NA
9 68.5 Running Back "Alvin Kamara" 0
10 68.5 Wide Receiver "" NA
# ... with 110 more rows
推荐阅读
- python - 将两个 plotly 离线 iplot 附加到一个 iplot
- java - 查询时休眠没有响应
- javascript - 刷新引导选项卡
- javascript - 仅将调度程序连接到组件而不连接状态
- typo3 - 从 Typo3 中的 CategoryCollection 通过 parentID 获取记录
- reactjs - 使用 React Js 的事件
- reactjs - Redux:调度一个不影响状态的动作
- python - 让烧瓶运行荣誉 app.debug
- sql - 在 SQL 中将 GMT 转换为本地时间
- java - ContextConfigLocation java 配置