Swapping misplaced cells in R?

Problem description

I have a huge dataset (more than 65 million rows), and I noticed that some cells are in the wrong column. For example, suppose I have this:

library("tidyverse")

DATA <- tribble(
  ~SURNAME,~NAME,~STATE,~COUNTRY,
  'Smith','Emma','California','USA',
  'Johnson','Oliia','Texas','USA',
  'Williams','James','USA','California',
  'Jones','Noah','Pennsylvania','USA',
  'Williams','Liam','Illinois','USA',
  'Brown','Sophia','USA','Louisiana',
  'Daves','Evelyn','USA','Oregon',
  'Miller','Jacob','New Mexico','USA',
  'Williams','Lucas','Connecticut','USA',
  'Daves','John','California','USA',
  'Jones','Carl','USA','Illinois'
)

=====

> DATA
# A tibble: 11 x 4
   SURNAME  NAME   STATE        COUNTRY   
   <chr>    <chr>  <chr>        <chr>     
 1 Smith    Emma   California   USA       
 2 Johnson  Oliia  Texas        USA       
 3 Williams James  USA          California
 4 Jones    Noah   Pennsylvania USA       
 5 Williams Liam   Illinois     USA       
 6 Brown    Sophia USA          Louisiana 
 7 Daves    Evelyn USA          Oregon    
 8 Miller   Jacob  New Mexico   USA       
 9 Williams Lucas  Connecticut  USA       
10 Daves    John   California   USA       
11 Jones    Carl   USA          Illinois 

As you can see, the country and state values are swapped in some rows. How can I swap them back efficiently?

Kind regards, Luis.

Tags: r

Solution


Using data.table and the built-in state.name vector:

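library(data.table)
setDT(DATA)  # convert DATA to a data.table by reference
# Swap the two columns only in rows where COUNTRY holds a US state name
DATA[COUNTRY %in% state.name, `:=`(COUNTRY = STATE, STATE = COUNTRY)]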

DATA
#      SURNAME   NAME        STATE COUNTRY
#  1:    Smith   Emma   California     USA
#  2:  Johnson  Oliia        Texas     USA
#  3: Williams  James   California     USA
#  4:    Jones   Noah Pennsylvania     USA
#  5: Williams   Liam     Illinois     USA
#  6:    Brown Sophia    Louisiana     USA
#  7:    Daves Evelyn       Oregon     USA
#  8:   Miller  Jacob   New Mexico     USA
#  9: Williams  Lucas  Connecticut     USA
# 10:    Daves   John   California     USA
# 11:    Jones   Carl     Illinois     USA
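
For comparison, the same swap can be sketched in base R without data.table. This is only an illustration under assumptions: `df` is a hypothetical three-row stand-in for DATA, and, as in the answer above, a row counts as misplaced when its COUNTRY value appears in state.name (the 50 US state names). Unlike `:=`, this form may copy the data instead of updating it by reference, so on a table with tens of millions of rows the data.table version is likely the better fit.

```r
# Hypothetical stand-in for DATA (plain data.frame, character columns)
df <- data.frame(
  SURNAME = c("Smith", "Williams", "Brown"),
  NAME    = c("Emma", "James", "Sophia"),
  STATE   = c("California", "USA", "USA"),
  COUNTRY = c("USA", "California", "Louisiana"),
  stringsAsFactors = FALSE
)

# Rows where a US state name ended up in the COUNTRY column
swapped <- df$COUNTRY %in% state.name

# Assign the two columns back in reversed order, only for the flagged rows
df[swapped, c("STATE", "COUNTRY")] <- df[swapped, c("COUNTRY", "STATE")]

print(df)
```

Running `sum(swapped)` before the assignment gives the number of rows that will be changed, which can be a useful sanity check on a large table.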
