Adding "0." or "0.0" to the start of certain numbers in a string

Problem description

I have some data that looks like this:

# A tibble: 20 x 2
     grp cents                                                             
   <int> <chr>                                                             
 1  4625 "48 cents to 49 cents\n34 cents to 38 cents"                      
 2  5832 "71 cents to 79 cents"                                            
 3  6131 "5 cents to 10 cents"                                             
 4  5719 "71 cents to 76 cents\n71 cents to 78 cents"                      
 5  4998 "37 cents to 50 cents\n40 cents to 56 cents"                      
 6  6579 "92 cents to 94 cents"

I clean the cents column with the following:

d %>% 
  mutate(
    cents = str_replace_all(cents, "cents to", "-"),
    cents = str_replace_all(cents, "cents", "")
  )

It now looks like:

# A tibble: 20 x 2
     grp cents                         
   <int> <chr>                         
 1  4625 "48 - 49 \n34 - 38 "          
 2  5832 "71 - 79 "                    
 3  6131 "5 - 10 "                     
 4  5719 "71 - 76 \n71 - 78 "          
 5  4998 "37 - 50 \n40 - 56 "          
 6  6579 "92 - 94 "                    
 7  4074 "47 - 51 \n42 - 50 "

I want to prepend a 0. or a 0.0 to each number, so that the final data looks like:

# A tibble: 20 x 2
     grp cents                         
   <int> <chr>                         
 1  4625 "0.48 - 0.49 \n0.34 - 0.38 "          
 2  5832 "0.71 - 0.79 "                    
 3  6131 "0.05 - 0.10 "                   # NOTE: Here                
 4  5719 "0.71 - 0.76 \n0.71 - 0.78 "          
 5  4998 "0.37 - 0.50 \n0.40 - 0.56 "          
 6  6579 "0.92 - 0.94 "                    
 7  4074 "0.47 - 0.51 \n0.42 - 0.50 "

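I added a note in the data because simply prepending 0. to the following row:

 3  6131 "5 - 10 "  

would give an incorrect result (0.5 - 0.10). For this row I want 0.05 - 0.10. So I would like to add some kind of condition: if the number has one digit, prepend 0.0 (giving 0.0X); if the number has two digits, prepend 0. (giving 0.XX).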
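The condition described above is equivalent to dividing the cent value by 100 and printing it with two decimals. A minimal sketch in base R, assuming every value is between 0 and 99; cents_to_decimal is a hypothetical helper name used only for illustration:

```r
# Hypothetical helper (not from the original post): turn a cent count
# into a two-decimal string, so "5" -> "0.05" and "10" -> "0.10".
cents_to_decimal <- function(x) {
  sprintf("%.2f", as.numeric(x) / 100)
}

cents_to_decimal(c("5", "10", "48"))
# expected: "0.05" "0.10" "0.48"
```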

Data:

d <- structure(list(grp = c(4625L, 5832L, 6131L, 5719L, 4998L, 6579L, 
4074L, 3663L, 766L, 911L, 3051L, 348L, 6062L, 7533L, 2714L, 2309L, 
6072L, 569L, 1555L, 2753L), cents = c("48 cents to 49 cents\n34 cents to 38 cents", 
"71 cents to 79 cents", "5 cents to 10 cents", "71 cents to 76 cents\n71 cents to 78 cents", 
"37 cents to 50 cents\n40 cents to 56 cents", "92 cents to 94 cents", 
"47 cents to 51 cents\n42 cents to 50 cents", "13 cents to 15 cents", 
"5 cents to 6 cents\n24 cents to 25 cents", "12 cents to 27 cents\n43 cents to 58 cents", 
"46 cents to 62 cents", "82 cents to 88 cents", "3 cents to 10 cents", 
"45 cents to 51 cents", "4 cents to 8 cents", "3 cents to 10 cents\n23 cents to 30 cents", 
"38 cents to 42 cents", "15 cents to 25 cents", "14 cents to 17 cents", 
"33 cents to 35 cents\n33 cents to 35 cents\n33 cents to 35 cents"
)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-20L))

Tags: r

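Solution

We can also do this in a single str_replace_all call. Note that the replacement below prefixes every number with 0. regardless of its width, so single-digit values are not zero-padded (row 3 comes out as 0.5 -  0.10 rather than 0.05 - 0.10):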

library(stringr)
library(dplyr)
d %>% 
   mutate(cents = str_replace_all(cents,
     "(\\d+)\\s+cents\\s+to\\s+(\\d+)\\s+cents", "0.\\1 -  0.\\2"))
# A tibble: 20 x 2
#     grp cents                                     
#   <int> <chr>                                     
# 1  4625 "0.48 -  0.49\n0.34 -  0.38"              
# 2  5832 "0.71 -  0.79"                            
# 3  6131 "0.5 -  0.10"                             
# 4  5719 "0.71 -  0.76\n0.71 -  0.78"              
# 5  4998 "0.37 -  0.50\n0.40 -  0.56"              
# 6  6579 "0.92 -  0.94"                            
# 7  4074 "0.47 -  0.51\n0.42 -  0.50"              
# 8  3663 "0.13 -  0.15"                            
# 9   766 "0.5 -  0.6\n0.24 -  0.25"                
#10   911 "0.12 -  0.27\n0.43 -  0.58"              
#11  3051 "0.46 -  0.62"                            
#12   348 "0.82 -  0.88"                            
#13  6062 "0.3 -  0.10"                             
#14  7533 "0.45 -  0.51"                            
#15  2714 "0.4 -  0.8"                              
#16  2309 "0.3 -  0.10\n0.23 -  0.30"               
#17  6072 "0.38 -  0.42"                            
#18   569 "0.15 -  0.25"                            
#19  1555 "0.14 -  0.17"                            
#20  2753 "0.33 -  0.35\n0.33 -  0.35\n0.33 -  0.35"
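The output above still shows single-digit values as 0.5 rather than the 0.05 the question asks for. One possible way to handle that, sketched here under the assumption that every value has at most two digits, is to pass a function as the replacement to str_replace_all and zero-pad each number with str_pad. The names pad_cents and d_small are hypothetical and only serve the illustration; the two rows are copied from the question's data.

```r
library(stringr)
library(dplyr)

# Hypothetical helper (not from the original answer): left-pad each run of
# digits to width 2 with "0", then prefix "0.", so "5" -> "0.05".
# Assumption: values have at most two digits; "100" would become "0.100".
pad_cents <- function(x) {
  paste0("0.", str_pad(x, width = 2, side = "left", pad = "0"))
}

# Two rows taken from the question's data, including single-digit values.
d_small <- data.frame(
  grp = c(6131L, 766L),
  cents = c("5 cents to 10 cents", "5 cents to 6 cents\n24 cents to 25 cents"),
  stringsAsFactors = FALSE
)

res <- d_small %>%
  mutate(
    cents = str_replace_all(cents, "\\d+", pad_cents),            # pad every number
    cents = str_replace_all(cents, "\\s+cents\\s+to\\s+", " - "), # "cents to" -> " - "
    cents = str_replace_all(cents, "\\s+cents", "")               # drop remaining "cents"
  )

res$cents
# expected: "0.05 - 0.10" and "0.05 - 0.06\n0.24 - 0.25"
```

Unlike the desired output in the question, this sketch leaves no trailing space after each range, because the final step removes the whitespace in front of "cents" as well.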
