How to use separate_rows correctly

Problem description

My data is a list of the academic institutions that article authors are affiliated with. The data I am working with looks like this:

1   MIT
2   NBER; NBER
3   U MI; Cornell U; U VA
4   Harvard U; U Chicago
5   U OR; U CA, Davis; U British Columbia
6   World Bank; Dartmouth College; EDHEC Business School; Harvard U
7   Columbia U and IZA; Columbia U and IZA
8   World Bank; Yale U and Abdul Latif Jameel Poverty Action Lab; Dartmouth College
9   Carnegie Mellon U; Carnegie Mellon U; Carnegie Mellon U
10  Columbia U; U CA, San Diego
11  U CA, Berkeley; McMaster U; McMaster U
12  ETH Zurich and CESifo; U Copenhagen and CESifo

I want to split the rows on semicolons (and ideally on "and" as well) so that I can find out which academic institutions are unique.

I tried to do this with the separate_rows function from the tidyr package:

Affiliation<-separate_rows(Affiliation, sep=";")

Or:

Affiliation<-separate_rows(Affiliation, sep="; | and")

Neither of these works: my data looks exactly the same afterwards. What am I doing wrong?

The dput output is attached below:

structure(list(AF = c("MIT", "NBER; NBER", "U MI; Cornell U; U VA", 
"Harvard U; U Chicago", "U OR; U CA, Davis; U British Columbia", 
"World Bank; Dartmouth College; EDHEC Business School; Harvard U", 
"Columbia U and IZA; Columbia U and IZA", "World Bank; Yale U and Abdul Latif Jameel Poverty Action Lab; Dartmouth College", 
"Carnegie Mellon U; Carnegie Mellon U; Carnegie Mellon U", "Columbia U; U CA, San Diego", 
"U CA, Berkeley; McMaster U; McMaster U", "ETH Zurich and CESifo; U Copenhagen and CESifo", 
"U MN, St Paul; Compass Lexecon, Washington, DC; Harvard U", 
"U WI", "U Chicago and IZA; Harvard U; Harvard U")), row.names = c(NA, 
15L), class = "data.frame")

Tags: r, delimiter, tidyr

Solution
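
A likely cause is that neither call names the column to split. separate_rows takes the data frame first and then the columns to separate, so with no column given it selects nothing and returns the data unchanged. The sketch below is one possible fix, not a definitive answer: it passes the AF column from the dput output and uses a regular expression that splits on a semicolon or on the word "and" surrounded by whitespace. The names split_af and unique_af are made up for illustration, the sample data is a shortened copy of the question's data, and using dplyr::distinct to list the unique institutions is an assumption about what is wanted.

```r
library(tidyr)
library(dplyr)

# Shortened copy of the question's data (column AF, as in the dput output).
Affiliation <- data.frame(
  AF = c("MIT",
         "NBER; NBER",
         "Columbia U and IZA; Columbia U and IZA",
         "ETH Zurich and CESifo; U Copenhagen and CESifo"),
  stringsAsFactors = FALSE
)

# Name the column to split (AF). The pattern matches either a semicolon
# followed by optional whitespace, or the word "and" with whitespace on
# both sides, so the delimiters are dropped from the resulting values.
split_af <- separate_rows(Affiliation, AF, sep = ";\\s*|\\s+and\\s+")

# Hypothetical follow-up: keep one row per distinct institution.
unique_af <- distinct(split_af, AF)

print(unique_af)
```

Requiring whitespace around "and" keeps the pattern from splitting inside a name that merely contains those letters. Institutions whose official name includes the word "and" would still be split, so the pattern may need adjusting for the full data set.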

