首页 > 解决方案 > R tidyr::pivot_longer() 多列

问题描述

我有一个包含几列的数据集,如下所示:

在此处输入图像描述

我想将数据转换为更长的格式,其名称为:交易价值、头寸、玩家姓名和趋势价值

这是我尝试过的......但代码试图结合趋势和玩家名称列。如何指定那些应该是由相同位置表示的两个单独的列?

  Data_trend <-
    data %>%
    tidyr::pivot_longer(
      cols = !`Trade Value`,
      names_to = c("Position", "Player Name"),
      names_pattern = c("(.*).(.*)"),
    )

谢谢你的帮助!

以下是重现数据的方法:

    data <- structure(list(`Trade Value` = c(79, 70.5, 68.5, 67.5, 64, 61, 
57.5, 57, 56.5, 56, 55, 54, 51.5, 51, 49.5, 48.5, 47, 43, 41, 
39.5, 39, 38.5, 38.5, 38, 35, 34.5, 33.5, 32, 30.5, 30), `Running Back.Player Name` = c("Christian McCaffrey", 
"Dalvin Cook", "Alvin Kamara", "", "", "Ezekiel Elliott", "Saquon Barkley", 
"", "Derrick Henry", "", "", "", "Nick Chubb", "Aaron Jones", 
"Jonathan Taylor", "", "", "", "Joe Mixon", "Antonio Gibson", 
"Austin Ekeler", "", "", "", "", "", "", "", "Clyde Edwards-Helaire", 
"J.K. Dobbins"), `Running Back.Trend` = c(0, 0, 0, NA, NA, 0, 
0, NA, 0, NA, NA, NA, 0, 0, 0, NA, NA, NA, 0, 0, 0, NA, NA, NA, 
NA, NA, NA, NA, 0, 0), `Wide Receiver.Player Name` = c("", "", 
"", "", "", "", "", "Stefon Diggs", "Davante Adams", "", "Calvin Ridley", 
"Tyreek Hill", "", "", "", "", "DeAndre Hopkins", "", "", "", 
"A.J. Brown", "D.K. Metcalf", "Justin Jefferson", "", "Terry McLaurin", 
"", "Keenan Allen", "Allen Robinson II", "Mike Evans", ""), `Wide Receiver.Trend` = c(NA, 
NA, NA, NA, NA, NA, NA, 0, 0, NA, 0, 0, NA, NA, NA, NA, 0, NA, 
NA, NA, 0, 0, 0, NA, 0, NA, 0, 0, 0, NA), `Tight End.Player Name` = c("", 
"", "", "", "", "", "", "", "", "", "", "", "Travis Kelce", "", 
"", "", "", "George Kittle", "", "", "", "", "", "Darren Waller", 
"", "", "", "", "", ""), `Tight End.Trend` = c(NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, 0, NA, NA, NA, NA, 0, NA, NA, 
NA, NA, NA, 0, NA, NA, NA, NA, NA, NA), `Quarterback.Player Name` = c("", 
"", "", "Patrick Mahomes II", "Josh Allen", "", "", "", "", "Kyler Murray", 
"", "", "", "", "Lamar Jackson", "Dak Prescott", "", "", "", 
"", "", "Russell Wilson", "", "", "Justin Herbert", "Aaron Rodgers", 
"", "", "", ""), Quarterback.Trend = c(NA, NA, NA, 0, 0, NA, 
NA, NA, NA, 0, NA, NA, NA, NA, 0, 0, NA, NA, NA, NA, NA, 0, NA, 
NA, 0, 0, NA, NA, NA, NA)), row.names = c(NA, 30L), class = "data.frame")

标签: r

解决方案


根据亲爱的Chris Ruehlemann先生的建议,这里是这个解决方案的一个稍微精简的版本:

data %>%
  pivot_longer(!`Trade Value`, names_to = c("Position", ".value"), 
               names_pattern = "(\\D+)\\.(\\D+)")

# A tibble: 120 x 4
   `Trade Value` Position      `Player Name`         Trend
           <dbl> <chr>         <chr>                 <dbl>
 1          79   Running Back  "Christian McCaffrey"     0
 2          79   Wide Receiver ""                       NA
 3          79   Tight End     ""                       NA
 4          79   Quarterback   ""                       NA
 5          70.5 Running Back  "Dalvin Cook"             0
 6          70.5 Wide Receiver ""                       NA
 7          70.5 Tight End     ""                       NA
 8          70.5 Quarterback   ""                       NA
 9          68.5 Running Back  "Alvin Kamara"            0
10          68.5 Wide Receiver ""                       NA
# ... with 110 more rows

您可以使用以下解决方案。根据您的列名,我使用了以下正则表达式结构:

  • \\D+一次或多次匹配任何非数字字符,包括空格,因此第一个捕获组(\\D+)匹配名称,如Running Back.
  • (?:\\s\\D+)这是我使用的一个非捕获组,它具有(?: ...)指示可选模式的结构,这里它是一个空格\\s,后跟多个非数字值\\D+。我把这个非捕获组放在第二个捕获组中,结果(\\D+(?:\\s\\D+)?)是第一个模式\\D+是强制性的,其余的是可选的,它匹配两者Player NameTrend例如。

我在参数中定义了两组列,names_to所以我必须定义两个捕获组,names_pattern用文字点分隔,.如下所示:

library(tidyr)

data %>%
  pivot_longer(!`Trade Value`, names_to = c("Position", ".value"), 
               names_pattern = "(\\D+)\\.(\\D+(?:\\s\\D+)?)")

# A tibble: 120 x 4
   `Trade Value` Position      `Player Name`         Trend
           <dbl> <chr>         <chr>                 <dbl>
 1          79   Running Back  "Christian McCaffrey"     0
 2          79   Wide Receiver ""                       NA
 3          79   Tight End     ""                       NA
 4          79   Quarterback   ""                       NA
 5          70.5 Running Back  "Dalvin Cook"             0
 6          70.5 Wide Receiver ""                       NA
 7          70.5 Tight End     ""                       NA
 8          70.5 Quarterback   ""                       NA
 9          68.5 Running Back  "Alvin Kamara"            0
10          68.5 Wide Receiver ""                       NA
# ... with 110 more rows

推荐阅读