r - 分隔逗号分隔值不同行
问题描述
我有这样的数据。
structure(list(structureId = c("1JDN", "1DP4", "1XS5", "1SW1",
"1P99", "1IXH"), structureTitle = c("Crystal Structure of Hormone Receptor",
"DIMERIZED HORMONE BINDING DOMAIN OF THE ATRIAL NATRIURETIC PEPTIDE RECEPTOR",
"The Crystal Structure of Lipoprotein Tp32 from Treponema pallidum",
"Crystal structure of ProX from Archeoglobus fulgidus in complex with proline betaine",
"1.7A crystal structure of protein PG110 from Staphylococcus aureus",
"PHOSPHATE-BINDING PROTEIN (PBP) COMPLEXED WITH PHOSPHATE"),
chainId = c("A", "A", "A", "A", "A", "A"), ligandId = c("BMA,CL,FUC,MAN,NAG,NDG",
"CL,NAG,SO4", "MET", "MSE,PBE,ZN", "GLY,MET", "PO4"), ligandName = c("BETA-D-MANNOSE,CHLORIDE ION,ALPHA-L-FUCOSE,ALPHA-D-MANNOSE,N-ACETYL-D-GLUCOSAMINE,2-(ACETYLAMINO)-2-DEOXY-A-D-GLUCOPYRANOSE",
"CHLORIDE ION,N-ACETYL-D-GLUCOSAMINE,SULFATE ION", "METHIONINE",
"SELENOMETHIONINE,1,1-DIMETHYL-PROLINIUM,ZINC ION", "GLYCINE,METHIONINE",
"PHOSPHATE ION")), row.names = c(NA, -6L), class = c("tbl_df",
"tbl", "data.frame"))
我想将ligandId
和的值拆分为ligandName
不同的行。我的意思是,每行只有 1ligandId
个ligandName
。
我试过使用separate_rows
,但我的两列处理得不好。
df %>% separate_rows(ligandId, ligandName, sep = ",")
但我收到了这个错误:
> df %>% separate_rows(ligandId, ligandName, sep = ",")
Error: All nested columns must have the same number of elements.
Call `rlang::last_error()` to see a backtrace
> rlang::last_error()
<error>
message: All nested columns must have the same number of elements.
class: `rlang_error`
backtrace:
1. tidyr::separate_rows(., ligandId, ligandName, sep = ",")
10. tidyr:::unnest.data.frame(data, !!!syms(vars), .drop = FALSE)
12. tidyr::separate_rows(., ligandId, ligandName, sep = ",")
Call `rlang::last_trace()` to see the full backtrace
另外,我试过这个:将一列中的逗号分隔字符串拆分为单独的行,但没有成功。
最后我想要这样的东西:
1JDN A BMA BETA-D-MANNOSE
1JDN A CL CHLORIDE ION
1JDN A FUC ALPHA-L-FUCOSE
1JDN A MAN ALPHA-D-MANNOSE
1JDN A NAG N-ACETYL-D-GLUCOSAMINE
1JDN A NDG 2-(ACETYLAMINO)-2-DEOXY-A-D-GLUCOPYRANOSE
...
解决方案
我们可以用separate_rows
library(tidyverse)
df1 %>%
separate_rows(ligandId, sep=",")
由于'ligandId'、'ligandName'每行的字数不一样,一种选择是转gather
成'long'格式,然后separate_rows
在'val'列上做,最后spread
回到'wide'
df1 %>%
gather(key, val, ligandId, ligandName) %>%
separate_rows(val, sep=",") %>%
group_by(structureId, key) %>%
mutate(rn = row_number()) %>%
spread(key, val) %>%
select(-rn)
# A tibble: 17 x 5
# Groups: structureId [6]
# structureId structureTitle chainId ligandId ligandName
# <chr> <chr> <chr> <chr> <chr>
# 1 1DP4 DIMERIZED HORMONE BINDING DOMAIN OF THE ATRIAL NATRI… A CL CHLORIDE ION
# 2 1DP4 DIMERIZED HORMONE BINDING DOMAIN OF THE ATRIAL NATRI… A NAG N-ACETYL-D-GLUCOSAMINE
# 3 1DP4 DIMERIZED HORMONE BINDING DOMAIN OF THE ATRIAL NATRI… A SO4 SULFATE ION
# 4 1IXH PHOSPHATE-BINDING PROTEIN (PBP) COMPLEXED WITH PHOSP… A PO4 PHOSPHATE ION
# 5 1JDN Crystal Structure of Hormone Receptor A BMA BETA-D-MANNOSE
# 6 1JDN Crystal Structure of Hormone Receptor A CL CHLORIDE ION
# 7 1JDN Crystal Structure of Hormone Receptor A FUC ALPHA-L-FUCOSE
# 8 1JDN Crystal Structure of Hormone Receptor A MAN ALPHA-D-MANNOSE
# 9 1JDN Crystal Structure of Hormone Receptor A NAG N-ACETYL-D-GLUCOSAMINE
#10 1JDN Crystal Structure of Hormone Receptor A NDG 2-(ACETYLAMINO)-2-DEOXY-A…
#11 1P99 1.7A crystal structure of protein PG110 from Staphyl… A GLY GLYCINE
#12 1P99 1.7A crystal structure of protein PG110 from Staphyl… A MET METHIONINE
#13 1SW1 Crystal structure of ProX from Archeoglobus fulgidus… A MSE SELENOMETHIONINE
#14 1SW1 Crystal structure of ProX from Archeoglobus fulgidus… A PBE 1
#15 1SW1 Crystal structure of ProX from Archeoglobus fulgidus… A ZN 1-DIMETHYL-PROLINIUM
#16 1SW1 Crystal structure of ProX from Archeoglobus fulgidus… A <NA> ZINC ION
#17 1XS5 The Crystal Structure of Lipoprotein Tp32 from Trepo… A MET METHIONINE
对于字数不同的多列,使用cSplit
library(splitstackshape)
na.omit(cSplit(df1, c("ligandId", "ligandName"), sep=",", "long"))
推荐阅读
- python - 使用 pathlib 修改后 word 文档出错
- python - 使用 Python + 传输创建和播种种子
- ruby-on-rails - 如何阻止用户将整数输入减少到小于 1
- css - 带有拉伸和 SVG 后代的超大嵌套 flexbox
- javascript - 为什么 gulp 会抛出“您是否忘记发出异步完成信号?”
- json - Flutter 获取列表 Json
- firebase - Firebase Firestore 数据库设计
- c# - 测试一个方法是否使用不同的参数被调用了两次
- javascript - 如何从javascript中的数组中获取不同ID和不同服务名称的最后一个对象?
- npm - npm 中是否有与 yarn audit --groups "dependencies" 等效的功能?