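r - Convert a list of lists to a data frame based on 1 list - each list is different
Problem description
I'm having a really hard time converting a list of lists into a tidy data frame. I've found various solutions, but none of them fit what I specifically want to do. It also has to be very fast, because the dataset is very large. Here is a snippet of the list: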
$PMID
[1] "32007943" "32007942" "32007941" "31894091" "31894090"
$Authors
$Authors[[1]]
LastName ForeName Initials order
1 Ward Jordan M JM 1
2 Hess Jaclyn N JN 2
3 Davis Loretta S LS 3
$Authors[[2]]
LastName ForeName Initials order
1 Pope Janet E JE 1
$Authors[[3]]
LastName ForeName Initials order
1 Polachek Ari A 1
2 Eder Lihi L 2
$Authors[[4]]
LastName ForeName Initials order
1 Milchert Marcin M 1
2 Brzosko Marek M 2
$Authors[[5]]
LastName ForeName Initials order
1 Pascual Eliseo E 1
2 Andrés Mariano M 2
3 Sivera Francisca F 3
$Year
[1] 2020 2020 2020 2020 2020
$PublicationType
$PublicationType[[1]]
PublicationType
"Journal Article"
$PublicationType[[2]]
PublicationType
"Editorial"
$PublicationType[[3]]
PublicationType
"Editorial"
$PublicationType[[4]]
PublicationType
"Journal Article"
$PublicationType[[5]]
PublicationType
"Editorial"
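This is what I'd like the data to end up looking like. The authors should be the observations (rows) of the data frame, but the structure has to be preserved, so that the group Authors[[1]] is matched to PMID[1], Authors[[2]] to PMID[2], and so on.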
LastName ForeName Initials order PMID Year Publication_Type
1 Ward Jordan M JM 1 32007943 2020 "Journal Article"
2 Hess Jaclyn N JN 2 32007943 2020 "Journal Article"
3 Davis Loretta S LS 3 32007943 2020 "Journal Article"
4 Pope Janet E JE 1 32007942 2020 "Editorial"
5 Polachek Ari A 1 32007941 2020 "Editorial"
6 Eder Lihi L 2 32007941 2020 "Editorial"
7 Milchert Marcin M 1 31894091 2020 "Journal Article"
8 Brzosko Marek M 2 31894091 2020 "Journal Article"
9 Pascual Eliseo E 1 31894090 2020 "Editorial"
10 Andrés Mariano M 2 31894090 2020 "Editorial"
11 Sivera Francisca F 3 31894090 2020 "Editorial"
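Any help is greatly appreciated!
Update: A. Suliman posted a very good solution, but it breaks as soon as I pull more than 100 records. Very confusing. I'm just posting the whole script and its output: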
#install.packages("RISmed")
library(RISmed)
library(purrr)  # provides pmap_dfr(), used below
srch_jrheum2020 <- '("The Journal of rheumatology"[Journal]) AND ("2019/01/01"[Date - Publication] : "2020/01/01"[Date - Publication])'
query_jrheum2020 <- EUtilsSummary(
srch_jrheum2020,
retmax=100,
mindate= 2000,
maxdate= 2021,
datetype = "ppdt")
package_jrheum2020 <- EUtilsGet(query_jrheum2020, type = "efetch", db = "pubmed")
list_jrheum2020 <- list('PMID' = PMID(package_jrheum2020),
'Authors' = Author(package_jrheum2020),
'Year' = YearPubmed(package_jrheum2020),
'Month' = MonthPubmed(package_jrheum2020),
'Day' = DayPubmed(package_jrheum2020),
'Journal' = Title(package_jrheum2020),
'PublicationType' = PublicationType(package_jrheum2020))
df_jrheum <- pmap_dfr(list_jrheum2020, ~data.frame(
.y,
pmid = .x,
year = ..3,
month = ..4,
day = ..5,
journal = ..6,
type = ..7,
stringsAsFactors = FALSE))
This works well and returns exactly what I want:
LastName ForeName Initials order pmid year month day journal type
1 Milchert Marcin M 1 31894091 2020 1 3 The Journal of rheumatology Journal Article
2 Brzosko Marek M 2 31894091 2020 1 3 The Journal of rheumatology Journal Article
3 Pascual Eliseo E 1 31894090 2020 1 3 The Journal of rheumatology Editorial
4 Andrés Mariano M 2 31894090 2020 1 3 The Journal of rheumatology Editorial
5 Sivera Francisca F 3 31894090 2020 1 3 The Journal of rheumatology Editorial
6 Yazici Yusuf Y 1 31894089 2020 1 3 The Journal of rheumatology Editorial
7 Mankia Kulveer K 1 31787610 2019 12 4 The Journal of rheumatology Letter
8 Briggs Christopher C 2 31787610 2019 12 4 The Journal of rheumatology Letter
9 Emery Paul P 3 31787610 2019 12 4 The Journal of rheumatology Letter
10 Deane Kevin D KD 1 31787609 2019 12 4 The Journal of rheumatology Journal Article
11 Demoruelle M Kristen MK 2 31787609 2019 12 4 The Journal of rheumatology Journal Article
12 Alpizar-Rodriguez Deshiré D 1 31787603 2019 12 4 The Journal of rheumatology Letter
13 Finckh Axel A 2 31787603 2019 12 4 The Journal of rheumatology Letter
14 Dahal Lekh N LN 1 31787598 2019 12 4 The Journal of rheumatology Letter
15 Barker Robert N RN 2 31787598 2019 12 4 The Journal of rheumatology Letter
16 Ward Frank J FJ 3 31787598 2019 12 4 The Journal of rheumatology Letter
17 <NA> <NA> <NA> NA 31787596 2019 12 4 The Journal of rheumatology Journal Article
18 <NA> <NA> <NA> NA 31787595 2019 12 4 The Journal of rheumatology Journal Article
19 <NA> <NA> <NA> NA 31787595 2019 12 4 The Journal of rheumatology Published Erratum
20 Rabin Jeff C JC 1 31787594 2019 12 4 The Journal of rheumatology Journal Article
21 Ramirez Kirsti K 2 31787594 2019 12 4 The Journal of rheumatology Journal Article
22 Owen Claire E CE 1 31787593 2019 12 4 The Journal of rheumatology Editorial
23 Liew David F L DFL 2 31787593 2019 12 4 The Journal of rheumatology Editorial
24 Buchanan Russell R C RRC 3 31787593 2019 12 4 The Journal of rheumatology Editorial
25 Falasinnu Titilola T 1 31787592 2019 12 4 The Journal of rheumatology Editorial
26 Simard Julia F JF 2 31787592 2019 12 4 The Journal of rheumatology Editorial
27 Hwang Steven R SR 1 31676702 2019 11 5 The Journal of rheumatology Letter
28 Sawatsky Adam P AP 2 31676702 2019 11 5 The Journal of rheumatology Letter
29 Michelena Xabier X 1 31676699 2019 11 5 The Journal of rheumatology Letter
30 Marco-Pascual Carla C 2 31676699 2019 11 5 The Journal of rheumatology Letter
31 González-Giménez Xavier X 3 31676699 2019 11 5 The Journal of rheumatology Letter
32 Juanola Xavier X 4 31676699 2019 11 5 The Journal of rheumatology Letter
For some crazy reason, if I raise the "retmax" argument of the EUtilsSummary function above 100, I get this error:
> df_jrheum <- pmap_dfr(list_jrheum2020, ~data.frame(
+ .y,
+ pmid = .x,
+ year = ..3,
+ month = ..4,
+ day = ..5,
+ journal = ..6,
+ type = ..7,
+ stringsAsFactors = FALSE))
Error in data.frame(.y, pmid = .x, year = ..3, month = ..4, day = ..5, :
arguments imply differing number of rows: 4, 1, 3
In addition: There were 50 or more warnings (use warnings() to see the first 50)
>
> warnings()
Warning messages:
1: In data.frame(.y, pmid = .x, year = ..3, month = ..4, ... :
row names were found from a short variable and have been discarded
2: In data.frame(.y, pmid = .x, year = ..3, month = ..4, ... :
row names were found from a short variable and have been discarded
3: In data.frame(.y, pmid = .x, year = ..3, month = ..4, ... :
row names were found from a short variable and have been discarded
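There are also annoying warnings about discarded row names, but those seem less important.
Solution
Given that the list elements all have the same length, we can use purrr::pmap_dfr to loop over them in parallel, build a data.frame for each record, and row-bind the results: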
purrr::pmap_dfr(lst_tmp, ~data.frame(.y, PMID=.x, year=..3, PublicationType=..4,
stringsAsFactors = FALSE))
mpg cyl disp hp drat wt qsec vs am gear carb PMID year PublicationType
1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 1 2020 Journal Article
2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 1 2020 Journal Article
3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 1 2020 Journal Article
4 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8 2 2021 Journal Ed
5 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2 2 2021 Journal Ed
6 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 3 2022 Journal
7 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4 3 2022 Journal
8 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3 3 2022 Journal
Data
lst_tmp <- list(PMID=c(1,2,3), Authors=list(head(mtcars, 3), tail(mtcars, 2), mtcars[10:12, ]), year=c(2020,2021,2022),
PublicationType=list(c(PublicationType="Journal Article"), c(PublicationType="Journal Ed"), c(PublicationType="Journal")))
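For readers who would rather not depend on purrr, the same "loop in parallel, then bind" idea can be sketched in base R with Map() and do.call(rbind, ...). This is only an illustration under assumptions: lst_demo and its values are made up to mirror the shape of lst_tmp, and every element of Authors is assumed to be a data frame.

```r
# Hypothetical toy data with the same shape as lst_tmp above
lst_demo <- list(
  PMID = c(1, 2),
  Authors = list(
    data.frame(LastName = c("A", "B"), order = 1:2, stringsAsFactors = FALSE),
    data.frame(LastName = "C", order = 1L, stringsAsFactors = FALSE)
  ),
  year = c(2020, 2021),
  PublicationType = list(c(PublicationType = "Journal Article"),
                         c(PublicationType = "Editorial"))
)

# Map() walks the four components in parallel, one record at a time.
# The scalar values are recycled to the number of author rows.
pieces <- Map(
  function(pmid, authors, year, type) {
    data.frame(authors,
               PMID = pmid,
               year = year,
               PublicationType = unname(type),  # drop the element name
               stringsAsFactors = FALSE)
  },
  lst_demo$PMID, lst_demo$Authors, lst_demo$year, lst_demo$PublicationType
)

# Stack the per-record data frames into one and reset the row names
result <- do.call(rbind, pieces)
rownames(result) <- NULL
print(result)
```

Named arguments in the function replace the positional .x, .y, ..3 placeholders, which may be easier to read when the list has many components. Calling unname() on the type also seems to be what avoids the "row names were found from a short variable" warning, since that warning is raised when a named vector shorter than the data frame is recycled.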
Update:
# With retmax=200, some records have more than one PublicationType, e.g. list_jrheum2020$PublicationType[[9]], so we can collapse them into one string
paste(list_jrheum2020$PublicationType[[9]], collapse = "-")
# or, if you are only interested in the first element
list_jrheum2020$PublicationType[[9]][[1]]
# therefore we can try
pmap_dfr(list_jrheum2020, ~data.frame(
.y,
pmid = .x,
year = ..3,
month = ..4,
day = ..5,
journal = ..6,
type = paste(..7, collapse = "-"),
stringsAsFactors = FALSE))
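The error itself comes from data.frame() recycling: a record with 4 authors and 3 publication types cannot be recycled to a common number of rows. The sketch below shows one way to find such records with lengths(), and then builds the whole table in a vectorized way with rep() instead of one data.frame() call per record, which may help on large datasets. It rests on assumptions: lst_demo is made-up data, and every element of Authors is assumed to be a data frame (records without authors would need separate handling).

```r
# Hypothetical toy data: record 1 has 4 authors and 3 publication types,
# the combination that makes data.frame() fail in the per-record approach
lst_demo <- list(
  PMID = c("101", "102", "103"),
  Authors = list(
    data.frame(LastName = c("A", "B", "C", "D"), order = 1:4,
               stringsAsFactors = FALSE),
    data.frame(LastName = "E", order = 1L, stringsAsFactors = FALSE),
    data.frame(LastName = c("F", "G"), order = 1:2, stringsAsFactors = FALSE)
  ),
  Year = c(2019, 2019, 2020),
  PublicationType = list(
    c("Journal Article", "Multicenter Study", "Review"),
    "Editorial",
    c("Letter", "Comment")
  )
)

# Which records carry more than one publication type?
n_types <- lengths(lst_demo$PublicationType)
multi <- which(n_types > 1)

# Collapse each record's types into a single string
type_chr <- vapply(lst_demo$PublicationType, paste, character(1),
                   collapse = "-")

# Number of author rows contributed by each record
n_authors <- vapply(lst_demo$Authors, nrow, integer(1))

# Bind all author tables once, then repeat the per-record values
# as many times as that record has authors
out <- data.frame(
  do.call(rbind, lst_demo$Authors),
  pmid = rep(lst_demo$PMID, times = n_authors),
  year = rep(lst_demo$Year, times = n_authors),
  type = rep(type_chr, times = n_authors),
  stringsAsFactors = FALSE
)
print(out)
```

With many thousands of records, do.call(rbind, ...) can itself become the slow step. dplyr::bind_rows() or data.table::rbindlist() could be swapped in for that one call.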