I'm getting NA's applying separate() function over column of characters in R

Question

I'm trying to split a column whose values are formatted very differently. For example:

pharma <- c("DOXORUBICINA CLORH. FAM 50MG POL O LIOF",
                   "DROSPIRENONA/ETINILESTR. 3/0,02MG CM REC",
                   "DROSPIRENONA/ETINILESTR. 3/0,03MG CM REC",
                   "ETRAVIRINA 100 MG CM",
                   "AGALSIDASA ALFA 1MG/ML X 3,5 ML FAM")

I'm using separate() to split it into two columns: the product name (e.g. DOXORUBICINA CLORH. FAM) and the details (e.g. 50MG POL O LIOF). The code is:

library(tidyr)
separate(data.frame(A = pharma), col = "A", into = c("x", "y"), sep = "(?<=[a-zA-Z])\\s*(?=[0-9])")

But I get the following output from R:

                                         x               y
1                  DOXORUBICINA CLORH. FAM 50MG POL O LIOF
2 DROSPIRENONA/ETINILESTR. 3/0,02MG CM REC            <NA>
3 DROSPIRENONA/ETINILESTR. 3/0,03MG CM REC            <NA>
4                               ETRAVIRINA       100 MG CM
5                          AGALSIDASA ALFA        1MG/ML X
Warning messages:
1: Expected 2 pieces. Additional pieces discarded in 1 rows [5]. 
2: Expected 2 pieces. Missing pieces filled with `NA` in 2 rows [2, 3]. 

I can't see what is happening.

Any help is highly appreciated. Thank you in advance!

Tags: r, regex, strsplit

Answer


The second and third rows contain a dot between the last letter and the whitespace, but your pattern only allows zero or more whitespace characters between a letter and a digit. It therefore finds no split point in those rows, and the y column is filled with NA.

You may use

sep = "(?<=[a-zA-Z])\\W+(?=[0-9])" 

or

sep = "(?<=[a-zA-Z])\\W*(?=[0-9])"

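The \W pattern matches any non-word character, that is, any character other than a letter, digit, or underscore. \W+ requires at least one such character between the letter and the digit, while \W* also splits where a letter is immediately followed by a digit.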
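To check the explanation above without tidyr, here is a minimal base R sketch that counts how many split points each pattern finds per row. The helper name count_splits is hypothetical and only used for illustration; perl = TRUE is needed because the patterns use lookbehind and lookahead.

```r
pharma <- c("DOXORUBICINA CLORH. FAM 50MG POL O LIOF",
            "DROSPIRENONA/ETINILESTR. 3/0,02MG CM REC",
            "DROSPIRENONA/ETINILESTR. 3/0,03MG CM REC",
            "ETRAVIRINA 100 MG CM",
            "AGALSIDASA ALFA 1MG/ML X 3,5 ML FAM")

# Hypothetical helper: number of places where `pattern` matches in each string,
# i.e. the number of split points separate() would see.
count_splits <- function(pattern, x) {
  lengths(regmatches(x, gregexpr(pattern, x, perl = TRUE)))
}

original <- "(?<=[a-zA-Z])\\s*(?=[0-9])"  # whitespace only between letter and digit
fixed    <- "(?<=[a-zA-Z])\\W*(?=[0-9])"  # any non-word characters, including "."

splits_original <- count_splits(original, pharma)
splits_fixed    <- count_splits(fixed, pharma)

print(splits_original)  # [1] 1 0 0 1 2
print(splits_fixed)     # [1] 1 1 1 1 2
```

Rows 2 and 3 go from zero split points to one, which is what removes the NA values. Row 5 has two split points under both patterns (before 1MG and before 3,5).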


R test:

> separate(data.frame(A = pharma), col = "A" , into = c("x","y"), sep = "(?<=[a-zA-Z])\\W*(?=[0-9])")
                        x               y
1 DOXORUBICINA CLORH. FAM 50MG POL O LIOF
2 DROSPIRENONA/ETINILESTR 3/0,02MG CM REC
3 DROSPIRENONA/ETINILESTR 3/0,03MG CM REC
4              ETRAVIRINA       100 MG CM
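Row 5 of the sample data has two places where a letter is followed by a digit, so with the default settings it still triggers the "Additional pieces discarded" warning quoted in the question. One possible way to handle that, sketched here under the assumption that everything after the first split point belongs in the details column, is the extra = "merge" argument of separate(), which splits at most length(into) times:

```r
library(tidyr)

pharma <- c("DOXORUBICINA CLORH. FAM 50MG POL O LIOF",
            "DROSPIRENONA/ETINILESTR. 3/0,02MG CM REC",
            "DROSPIRENONA/ETINILESTR. 3/0,03MG CM REC",
            "ETRAVIRINA 100 MG CM",
            "AGALSIDASA ALFA 1MG/ML X 3,5 ML FAM")

result <- separate(
  data.frame(A = pharma),
  col   = "A",
  into  = c("x", "y"),
  sep   = "(?<=[a-zA-Z])\\W*(?=[0-9])",
  extra = "merge"  # split only at the first match; the remainder stays in `y`
)

print(result)
```

With this setting, the y value for row 5 should keep the full remainder, 1MG/ML X 3,5 ML FAM, instead of being truncated to 1MG/ML X.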
