Assigning multiple repeated vectors as values in another column

Problem description

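I have a dataset containing many NAs, which I can fill with alternative names. The aim is to use these names afterwards, for example to merge datasets by matching values.

However, I cannot assign this character vector, because it is not the same size as the data frame.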

For example:

join_pop1[is.na(join_pop1$UK_Districts.y)] <- pop_names

x Input has size 19 but subscript `is.na(join_pop1$UK_Districts.y)` has size 1437.

Essentially, I want to assign each name to the NA values of the corresponding unique city. For example, here is a glimpse of my dataset:

# A tibble: 132 x 2
   UK_Districts.x                        UK_Districts.y
   <chr>                                 <chr>         
 1 Abertawe - Swansea                    NA            
 2 Abertawe - Swansea                    NA            
 3 Abertawe - Swansea                    NA            
 4 Abertawe - Swansea                    NA            
 5 Brent London Boro                     NA            
 6 Brent London Boro                     NA            
 7 Brent London Boro                     NA            
 8 Brent London Boro                     NA            
 9 Bro Morgannwg - the Vale of Glamorgan NA            
10 Bro Morgannwg - the Vale of Glamorgan NA       

Expected result:

# A tibble: 132 x 2
   UK_Districts.x                        UK_Districts.y
   <chr>                                 <chr>         
 1 Abertawe - Swansea                    Swansea            
 2 Abertawe - Swansea                    Swansea       
 3 Abertawe - Swansea                    Swansea       
 4 Abertawe - Swansea                    Swansea       
 5 Brent London Boro                     Brent            
 6 Brent London Boro                     Brent            
 7 Brent London Boro                     Brent            
 8 Brent London Boro                     Brent            
 9 Bro Morgannwg - the Vale of Glamorgan Vale of Glamorgan            
10 Bro Morgannwg - the Vale of Glamorgan Vale of Glamorgan       

Reproducible code:

#city names to assign
pop_names <- c("Swansea", "Brent", "Vale of Glamorgan", "South Bucks", "Cardiff", 
"Caerphilly", "Newport", "Neath Port Talbot", "City of London", 
"Bristol, City of", "Derby", "Leicester", "Peterborough", "Plymouth", 
"Portsmouth", "Southampton", "Stoke-on-Trent", "Westminster", 
"Wolverhampton", "Herefordshire, County of", "Shepway", "Merthyr Tydfil", 
"Bridgend", "Pembrokeshire", "Ceredigion", "Denbighshire", "Monmouthshire", 
"Carmarthenshire", "Flintshire", "Isle of Anglesey", "Somerset", 
"Brighton and Hove", "Wrexham")

join_pop1 <- structure(list(UK_Districts.x = c("Abertawe - Swansea", "Abertawe - Swansea", 
"Abertawe - Swansea", "Abertawe - Swansea", "Brent London Boro", 
"Brent London Boro", "Brent London Boro", "Brent London Boro", 
"Bro Morgannwg - the Vale of Glamorgan", "Bro Morgannwg - the Vale of Glamorgan", 
"Bro Morgannwg - the Vale of Glamorgan", "Bro Morgannwg - the Vale of Glamorgan", 
"Buckinghamshire", "Buckinghamshire", "Buckinghamshire", "Buckinghamshire", 
"Caerdydd - Cardiff", "Caerdydd - Cardiff", "Caerdydd - Cardiff", 
"Caerdydd - Cardiff", "Caerffili - Caerphilly", "Caerffili - Caerphilly", 
"Caerffili - Caerphilly", "Caerffili - Caerphilly", "Casnewydd - Newport", 
"Casnewydd - Newport", "Casnewydd - Newport", "Casnewydd - Newport", 
"Castell-nedd Port Talbot - Neath Port Talbot", "Castell-nedd Port Talbot - Neath Port Talbot", 
"Castell-nedd Port Talbot - Neath Port Talbot", "Castell-nedd Port Talbot - Neath Port Talbot", 
"City and County of the City of London", "City and County of the City of London", 
"City and County of the City of London", "City and County of the City of London", 
"City of Bristol ", "City of Bristol ", "City of Bristol ", "City of Bristol ", 
"City of Derby ", "City of Derby ", "City of Derby ", "City of Derby ", 
"City of Leicester ", "City of Leicester ", "City of Leicester ", 
"City of Leicester ", "City of Peterborough ", "City of Peterborough ", 
"City of Peterborough ", "City of Peterborough ", "City of Plymouth ", 
"City of Plymouth ", "City of Plymouth ", "City of Plymouth ", 
"City of Portsmouth ", "City of Portsmouth ", "City of Portsmouth ", 
"City of Portsmouth ", "City of Southampton ", "City of Southampton ", 
"City of Southampton ", "City of Southampton ", "City of Stoke-on-Trent ", 
"City of Stoke-on-Trent ", "City of Stoke-on-Trent ", "City of Stoke-on-Trent ", 
"City of Westminster London Boro", "City of Westminster London Boro", 
"City of Westminster London Boro", "City of Westminster London Boro", 
"City of Wolverhampton  ", "City of Wolverhampton  ", "City of Wolverhampton  ", 
"City of Wolverhampton  ", "County of Herefordshire", "County of Herefordshire", 
"County of Herefordshire", "County of Herefordshire", "Folkestone and Hythe", 
"Folkestone and Hythe", "Folkestone and Hythe", "Folkestone and Hythe", 
"Merthyr Tudful - Merthyr Tydfil", "Merthyr Tudful - Merthyr Tydfil", 
"Merthyr Tudful - Merthyr Tydfil", "Merthyr Tudful - Merthyr Tydfil", 
"Pen-y-bont ar Ogwr - Bridgend", "Pen-y-bont ar Ogwr - Bridgend", 
"Pen-y-bont ar Ogwr - Bridgend", "Pen-y-bont ar Ogwr - Bridgend", 
"Sir Benfro - Pembrokeshire", "Sir Benfro - Pembrokeshire", "Sir Benfro - Pembrokeshire", 
"Sir Benfro - Pembrokeshire", "Sir Ceredigion - Ceredigion", 
"Sir Ceredigion - Ceredigion", "Sir Ceredigion - Ceredigion", 
"Sir Ceredigion - Ceredigion", "Sir Ddinbych - Denbighshire", 
"Sir Ddinbych - Denbighshire", "Sir Ddinbych - Denbighshire", 
"Sir Ddinbych - Denbighshire", "Sir Fynwy - Monmouthshire", "Sir Fynwy - Monmouthshire", 
"Sir Fynwy - Monmouthshire", "Sir Fynwy - Monmouthshire", "Sir Gaerfyrddin - Carmarthenshire", 
"Sir Gaerfyrddin - Carmarthenshire", "Sir Gaerfyrddin - Carmarthenshire", 
"Sir Gaerfyrddin - Carmarthenshire", "Sir y Fflint - Flintshire", 
"Sir y Fflint - Flintshire", "Sir y Fflint - Flintshire", "Sir y Fflint - Flintshire", 
"Sir Ynys Mon - Isle of Anglesey", "Sir Ynys Mon - Isle of Anglesey", 
"Sir Ynys Mon - Isle of Anglesey", "Sir Ynys Mon - Isle of Anglesey", 
"Somerset West and Taunton", "Somerset West and Taunton", "Somerset West and Taunton", 
"Somerset West and Taunton", "The City of Brighton and Hove ", 
"The City of Brighton and Hove ", "The City of Brighton and Hove ", 
"The City of Brighton and Hove ", "Wrecsam - Wrexham", "Wrecsam - Wrexham", 
"Wrecsam - Wrexham", "Wrecsam - Wrexham"), UK_Districts.y = c(NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_)), row.names = c(NA, -132L), class = c("tbl_df", 
"tbl", "data.frame"))


Tags: r

Solution


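We can use str_extract from the stringr package. First we build a regular-expression pattern from pop_names, then extract the matching name from each value of UK_Districts.x: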

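library(dplyr)
library(stringr)

# Join all names into one alternation: "Swansea|Brent|Vale of Glamorgan|..."
pattern <- paste(as.character(pop_names), collapse = "|")

# For each row, extract the first part of UK_Districts.x that matches the pattern
join_pop1 %>% 
    mutate(UK_Districts.y = str_extract(UK_Districts.x, pattern))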

Output:

   UK_Districts.x                        UK_Districts.y   
   <chr>                                 <chr>            
 1 Abertawe - Swansea                    Swansea          
 2 Abertawe - Swansea                    Swansea          
 3 Abertawe - Swansea                    Swansea          
 4 Abertawe - Swansea                    Swansea          
 5 Brent London Boro                     Brent            
 6 Brent London Boro                     Brent            
 7 Brent London Boro                     Brent            
 8 Brent London Boro                     Brent            
 9 Bro Morgannwg - the Vale of Glamorgan Vale of Glamorgan
10 Bro Morgannwg - the Vale of Glamorgan Vale of Glamorgan
# ... with 122 more rows
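
One caveat with the pattern approach: a few of the replacement names in the reproducible data (for example "South Bucks" for "Buckinghamshire", or "Shepway" for "Folkestone and Hythe") do not occur as substrings of UK_Districts.x, so a pattern match would presumably leave those rows as NA. If pop_names is listed in the same order as the unique values of UK_Districts.x, which appears to be the case in the reprex but is an assumption, a positional lookup may be a safer sketch. The names toy, toy_names and lookup below are purely illustrative, and the example uses base R on a small subset rather than the full join_pop1:

```r
# Toy subset of the question's data (base R only, no packages needed).
districts <- c("Abertawe - Swansea", "Abertawe - Swansea",
               "Buckinghamshire", "Buckinghamshire",
               "Folkestone and Hythe", "Folkestone and Hythe")
toy <- data.frame(UK_Districts.x = districts,
                  UK_Districts.y = NA_character_,
                  stringsAsFactors = FALSE)

# Replacement names, ASSUMED to be in the same order as unique(districts).
toy_names <- c("Swansea", "South Bucks", "Shepway")

# Which unique districts would a "name1|name2|..." pattern find at all?
matched <- grepl(paste(toy_names, collapse = "|"), unique(toy$UK_Districts.x))

# Named lookup vector: old district name -> replacement name.
lookup <- setNames(toy_names, unique(toy$UK_Districts.x))

# Fill only the rows that are still NA, indexing the lookup by the old name.
missing <- is.na(toy$UK_Districts.y)
toy$UK_Districts.y[missing] <- unname(lookup[toy$UK_Districts.x[missing]])

print(matched)
print(toy)
```

Because the lookup is indexed by name, each replacement is repeated for as many rows as its district occupies, which sidesteps the size mismatch from the original assignment. It relies entirely on the ordering assumption, so it is worth inspecting the lookup vector before using it on real data.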
