Split a string into columns based on a pattern

Problem description

33467389|t|Immune Therapies for Hematologic Malignancies.
33467389|a|The era of immunotherapy for hematologic malignancies began with the first allogeneic hematopoietic stem cell transplant (HSCT) study published by E [...].
33477248|t|Unraveling the Role of Innate Lymphoid Cells in AcuteMyeloid Leukemia.
33477248|a|Over the past 50 years, few therapeutic advances have been made in treating.

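This pattern repeats throughout my file.

The ID is a number, for example 33467389. A line starting with 33467389|t| holds the paper's title; likewise, a line starting with 33467389|a| holds the abstract of the paper with that ID.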

I read the file like this:

lines <- readLines("output_1/Gemtuzumab_Adult/G1.txt")

This pattern runs through the whole text. Is there a way to split it into columns like this?

ID                              Abstract 
33467389      The era of immunotherapy for hematologic malignancies

Tags: r

Solution


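Here is a base R option using sub():

df <- data.frame(text = lines)              # one row per line read with readLines()
df$ID <- sub("\\|.*$", "", df$text)         # drop everything from the first "|" onward
df$Abstract <- sub("^.*\\|", "", df$text)   # drop everything through the last "|"
df[, c("ID", "Abstract")]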

        ID Abstract
1 33467389 Immune Therapies for Hematologic Malignancies.
2 33467389 The era of immunotherapy for hematologic malignancies began with the first allogeneic hematopoietic stem cell transplant (HSCT) study published by E [...].
3 33477248 Unraveling the Role of Innate Lymphoid Cells in AcuteMyeloid Leukemia.
4 33477248 Over the past 50 years, few therapeutic advances have been made in treating.
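
Note that the result above keeps both the title (t) and abstract (a) rows in the same column, while the desired output in the question shows abstracts only. A minimal sketch of one way to separate the two, assuming every line follows the ID|type|text layout shown in the question, is given below. The inline `lines` vector stands in for the result of readLines(), and the `type` column and `abstracts` object are names introduced here for illustration only.

```r
# Sample lines copied from the question; in practice `lines` comes from readLines()
lines <- c(
  "33467389|t|Immune Therapies for Hematologic Malignancies.",
  "33467389|a|The era of immunotherapy for hematologic malignancies began with the first allogeneic hematopoietic stem cell transplant (HSCT) study published by E [...].",
  "33477248|t|Unraveling the Role of Innate Lymphoid Cells in AcuteMyeloid Leukemia.",
  "33477248|a|Over the past 50 years, few therapeutic advances have been made in treating."
)

# Capture the numeric ID, the one-letter type, and the remaining text.
# Only the first two pipes act as separators, so a "|" inside the text is kept.
m <- regmatches(lines, regexec("^([0-9]+)\\|([a-z])\\|(.*)$", lines))
parts <- do.call(rbind, m)  # column 1 is the full match, columns 2-4 the groups

df <- data.frame(
  ID = parts[, 2],
  type = parts[, 3],   # hypothetical column name: "t" = title, "a" = abstract
  text = parts[, 4],
  stringsAsFactors = FALSE
)

# Keep only the abstract rows and rename the text column
abstracts <- df[df$type == "a", c("ID", "text")]
names(abstracts) <- c("ID", "Abstract")
rownames(abstracts) <- NULL
abstracts
```

Filtering with df$type == "t" instead would give the titles. Because the pattern anchors on the first two pipes, this variant should also tolerate a pipe character inside an abstract, which the greedy "^.*\\|" pattern would not.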
