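html - Retrieve titles and ID data from a web page and save them in an Excel file
Problem description
I want to retrieve the titles and PMID (PubMed ID) records from a web page and save them in an MS Excel file. I tried extracting them with the easyPubMed library in R, but I could not get this output. Is there a library or package that can do this? Please help me solve this.
A sample of the input data and the expected output is given below:
Code: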
library(easyPubMed)
my_query <- '"ACSL1" [ti] OR "Acyl-CoA Synthetase Long Chain Family Member 1" [ti] OR "Acyl-CoA Synthetase Long Chain Family Member 1" [ti] OR "Fatty-Acid-Coenzyme A Ligase, Long-Chain 2" [ti] OR "Long-Chain Fatty-Acid-Coenzyme A Ligase 1" [ti] OR "Long-Chain-Fatty-Acid–CoA Ligase 1" [ti] OR "Long-Chain Fatty Acid-CoA Ligase 2" [ti] OR "Long-Chain Acyl-CoA Synthetase" [ti] OR "Long-Chain Acyl-CoA Synthetase 1" [ti] OR "Long-Chain Acyl-CoA Synthetase 2" [ti] OR "Lignoceroyl-CoA Synthase" [ti] OR "Palmitoyl-CoA Ligase 1" [ti] OR "Palmitoyl-CoA Ligase 2" [ti] OR "Acyl-CoA Synthetase 1" [ti] OR "LACS 1" [ti] OR "LACS 2" [ti] OR "LACS-1" [ti] OR "LACS-2" [ti] OR "FACL2" [ti] OR "FACL1" [ti] OR "LACS1" [ti] OR "LACS2" [ti] OR "ACS1" [ti] OR "LACS" [ti] OR "Fatty-Acid-Coenzyme A Ligase, Long-Chain 1" [ti] OR "Palmitoyl-CoA Ligase 1" [ti] AND (acyl [ti] OR CoA [ti] OR fatty [ti] OR synthetase [ti] OR Palmitoyl [ti] OR ligase [ti] OR ACSL1 [ti])'
#### To count the number of PubMed IDs####
my_entrez_id <- get_pubmed_ids(my_query)
my_entrez_id$Count
Input
Webpage: https://pubmed.ncbi.nlm.nih.gov/
Search String: "ACSL1" [ti] OR "Acyl-CoA Synthetase Long Chain Family Member 1" [ti] OR "Acyl-CoA Synthetase Long Chain Family Member 1" [ti] OR "Fatty-Acid-Coenzyme A Ligase, Long-Chain 2" [ti] OR "Long-Chain Fatty-Acid-Coenzyme A Ligase 1" [ti] OR "Long-Chain-Fatty-Acid–CoA Ligase 1" [ti] OR "Long-Chain Fatty Acid-CoA Ligase 2" [ti] OR "Long-Chain Acyl-CoA Synthetase" [ti] OR "Long-Chain Acyl-CoA Synthetase 1" [ti] OR "Long-Chain Acyl-CoA Synthetase 2" [ti] OR "Lignoceroyl-CoA Synthase" [ti] OR "Palmitoyl-CoA Ligase 1" [ti] OR "Palmitoyl-CoA Ligase 2" [ti] OR "Acyl-CoA Synthetase 1" [ti] OR "LACS 1" [ti] OR "LACS 2" [ti] OR "LACS-1" [ti] OR "LACS-2" [ti] OR "FACL2" [ti] OR "FACL1" [ti] OR "LACS1" [ti] OR "LACS2" [ti] OR "ACS1" [ti] OR "LACS" [ti] OR "Fatty-Acid-Coenzyme A Ligase, Long-Chain 1" [ti] OR "Palmitoyl-CoA Ligase 1" [ti] AND (acyl [ti] OR CoA [ti] OR fatty [ti] OR synthetase [ti] OR Palmitoyl [ti] OR ligase [ti] OR ACSL1 [ti])
Expected output:
dput(Output)
structure(list(Title = c("The 3-ketoacyl-CoA thiolase: an engineered enzyme for carbon chain elongation of chemical compounds.",
"Potential influence of miR-192 on the efficacy of saxagliptin treatment in T2DM complicated with non-alcoholic fatty liver disease.",
"Myosteatosis in nonalcoholic fatty liver disease: An exploratory study."
), PMID = c(32830293L, 32829627L, 32828745L)), class = "data.frame", row.names = c(NA,
-3L))
Solution
You need to fetch and parse the query results. I think something like this should do it:
library(easyPubMed)
library(dplyr)  # provides %>% and tibble()

my_entrez_id <- get_pubmed_ids(my_query)
my_entrez_data <- fetch_pubmed_data(my_entrez_id)
my_entrez_list <- my_entrez_data %>%
  XML::xmlParse() %>%
  XML::xmlToList()  # turn the XML into an R list that is easier to handle
my_entrez_df <- my_entrez_list %>%
  purrr::map_df(function(x) {  # use map_df() from purrr to pick the fields we need
    tibble(
      Title = x$MedlineCitation$Article$ArticleTitle[[1]],
      PMID = x$MedlineCitation$PMID[[1]])
  })
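The question also asks for the result to be saved in an Excel file, which the code above does not do. The following is a minimal sketch of that last step, not part of the original answer. It assumes the writexl package may be installed and falls back to a CSV file (which Excel can open) when it is not. The helper name save_table and the file stem pubmed_titles are hypothetical, and the three rows are the ones from the expected output in the question, used here as a stand-in for my_entrez_df so the example runs without network access.

```r
# Stand-in for my_entrez_df: the three rows from the expected output in the question.
my_entrez_df <- data.frame(
  Title = c(
    "The 3-ketoacyl-CoA thiolase: an engineered enzyme for carbon chain elongation of chemical compounds.",
    "Potential influence of miR-192 on the efficacy of saxagliptin treatment in T2DM complicated with non-alcoholic fatty liver disease.",
    "Myosteatosis in nonalcoholic fatty liver disease: An exploratory study."
  ),
  PMID = c("32830293", "32829627", "32828745"),
  stringsAsFactors = FALSE
)

# Values extracted from XML are character strings; convert PMIDs to integers
# so the column matches the integer PMIDs in the expected output.
my_entrez_df$PMID <- as.integer(my_entrez_df$PMID)

# Hypothetical helper: write an .xlsx file when writexl is installed,
# otherwise write a .csv file. Returns the path that was written.
save_table <- function(df, path_stem) {
  if (requireNamespace("writexl", quietly = TRUE)) {
    out <- paste0(path_stem, ".xlsx")
    writexl::write_xlsx(df, out)
  } else {
    out <- paste0(path_stem, ".csv")
    write.csv(df, out, row.names = FALSE)
  }
  out
}

# Write to a temporary directory; replace with a real path as needed.
out_file <- save_table(my_entrez_df, file.path(tempdir(), "pubmed_titles"))
```

With real query results, the same helper could be called on the my_entrez_df built above, for example save_table(my_entrez_df, "pubmed_titles") to write into the working directory.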