r - I'm getting NA's applying separate() function over column of characters in R
Problem description
I'm trying to split a column whose values are formatted very differently. For example:
pharma <- c("DOXORUBICINA CLORH. FAM 50MG POL O LIOF",
            "DROSPIRENONA/ETINILESTR. 3/0,02MG CM REC",
            "DROSPIRENONA/ETINILESTR. 3/0,03MG CM REC",
            "ETRAVIRINA 100 MG CM",
            "AGALSIDASA ALFA 1MG/ML X 3,5 ML FAM")
I'm using separate() to split it into two columns: I need to separate the product name (e.g. DOXORUBICINA CLORH. FAM) from the details (50MG POL O LIOF). The code is:
library(tidyr)
separate(data.frame(A = pharma), col = "A", into = c("x", "y"), sep = "(?<=[a-zA-Z])\\s*(?=[0-9])")
But I get the following output from R:
                                         x               y
1                  DOXORUBICINA CLORH. FAM 50MG POL O LIOF
2 DROSPIRENONA/ETINILESTR. 3/0,02MG CM REC            <NA>
3 DROSPIRENONA/ETINILESTR. 3/0,03MG CM REC            <NA>
4                               ETRAVIRINA       100 MG CM
5                          AGALSIDASA ALFA        1MG/ML X
Warning messages:
1: Expected 2 pieces. Additional pieces discarded in 1 rows [5].
2: Expected 2 pieces. Missing pieces filled with `NA` in 2 rows [2, 3].
I can't see what is happening.
Any help is highly appreciated. Thank you in advance!
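Solution
The data in the second and third rows contains a dot between the last letter and the whitespace, but your pattern only allows zero or more whitespace characters between a letter and a digit. The pattern therefore never matches in those rows, nothing is split, and y is filled with NA.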
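One way to confirm this is to count how many split points the original pattern finds in each string. This is only a sketch in base R; the helper name count_matches is made up for illustration, and perl = TRUE is passed because the pattern uses lookarounds.

```r
pharma <- c("DOXORUBICINA CLORH. FAM 50MG POL O LIOF",
            "DROSPIRENONA/ETINILESTR. 3/0,02MG CM REC",
            "DROSPIRENONA/ETINILESTR. 3/0,03MG CM REC",
            "ETRAVIRINA 100 MG CM",
            "AGALSIDASA ALFA 1MG/ML X 3,5 ML FAM")

# The separator pattern from the question
original <- "(?<=[a-zA-Z])\\s*(?=[0-9])"

# Hypothetical helper: number of matches of `pattern` in each element of `x`.
# gregexpr() returns -1 for "no match", so only positive positions are counted.
count_matches <- function(pattern, x) {
  hits <- gregexpr(pattern, x, perl = TRUE)
  vapply(hits, function(h) sum(h > 0), integer(1))
}

n_splits <- count_matches(original, pharma)
print(n_splits)
```

Rows with a count of 0 are the ones that end up with NA in y, and a row with more than one match yields more than two pieces, which is what triggers the "Additional pieces discarded" warning.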
You may use
sep = "(?<=[a-zA-Z])\\W+(?=[0-9])"
or
sep = "(?<=[a-zA-Z])\\W*(?=[0-9])"
The \W pattern matches any non-word character, that is, any character other than a letter, a digit or _.
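To see what \W consumes in practice, here is a small base R sketch on one of the problem rows. It assumes perl = TRUE so the lookarounds are supported, and the variable names are only illustrative.

```r
# Which single characters count as non-word characters?
chars <- c(".", " ", "/", "_", "A", "3")
is_nonword <- grepl("\\W", chars, perl = TRUE)
print(setNames(is_nonword, chars))

x <- "DROSPIRENONA/ETINILESTR. 3/0,02MG CM REC"
pattern <- "(?<=[a-zA-Z])\\W*(?=[0-9])"

# Locate the first place where a letter is followed by non-word chars and a digit
m <- regexpr(pattern, x, perl = TRUE)
start <- as.integer(m)
len <- attr(m, "match.length")

separator <- substr(x, start, start + len - 1)  # the text the pattern consumes
left <- substr(x, 1, start - 1)                 # product name
right <- substr(x, start + len, nchar(x))       # details

print(c(left = left, separator = separator, right = right))
```

Note that the dot is part of the matched separator, so it is dropped from the product name.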
See the regex demo.
R test:
> separate(data.frame(A = pharma), col = "A" , into = c("x","y"), sep = "(?<=[a-zA-Z])\\W*(?=[0-9])")
                        x               y
1 DOXORUBICINA CLORH. FAM 50MG POL O LIOF
2 DROSPIRENONA/ETINILESTR 3/0,02MG CM REC
3 DROSPIRENONA/ETINILESTR 3/0,03MG CM REC
4              ETRAVIRINA       100 MG CM
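The fifth string contains two places where a letter is followed by a digit ("ALFA 1" and "X 3"), so it still produces three pieces and the "Additional pieces discarded" warning from the question. If you want everything after the first split kept in y, one option (a sketch, assuming tidyr is installed) is the extra = "merge" argument of separate(), which splits at most length(into) - 1 times:

```r
library(tidyr)

pharma <- c("DOXORUBICINA CLORH. FAM 50MG POL O LIOF",
            "DROSPIRENONA/ETINILESTR. 3/0,02MG CM REC",
            "DROSPIRENONA/ETINILESTR. 3/0,03MG CM REC",
            "ETRAVIRINA 100 MG CM",
            "AGALSIDASA ALFA 1MG/ML X 3,5 ML FAM")

res <- separate(
  data.frame(A = pharma),
  col = "A",
  into = c("x", "y"),
  sep = "(?<=[a-zA-Z])\\W*(?=[0-9])",
  extra = "merge"  # split only at the first match; the rest stays in y
)
print(res)
```

With this, row 5 should come back as "AGALSIDASA ALFA" in x and "1MG/ML X 3,5 ML FAM" in y, with no NA values and no warnings.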