
Problem description

I have a data frame similar to the one below:

Country Ccode Year Happiness Power   
1  France    FR 2000      1000  1000  
2  France    FR 2001        NA    NA
3  France    FR 2002        NA    NA
4  France    FR 2003      1600  2200
5  France    FR 2004        NA    NA
6      UK    UK 2000      1000  1000  
7      UK    UK 2001        NA    NA
8      UK    UK 2002      1000  1000  
9      UK    UK 2003      1000  1000
10     UK    UK 2004      1000  1000 

I previously used the following code to get the differences:

df <- df %>%
  arrange(Country, Year) %>%  # sort by country, then by year
  group_by(Country) %>%
  mutate_if(is.numeric, list(d = ~ . - lag(.)))  # difference from the previous row

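I would like to extend this code: take the difference between the known data points of Happiness and Power, divide it by the number of years between those data points, and use the result to compute values that replace the NAs, giving the following output.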

Country Ccode Year Happiness Power   
1  France    FR 2000      1000  1000  
2  France    FR 2001      1200  1400    
3  France    FR 2002      1400  1800
4  France    FR 2003      1600  2200
5  France    FR 2004        NA    NA
6      UK    UK 2000      1000  1000  
7      UK    UK 2001        0      0
8      UK    UK 2002      1000  1000  
9      UK    UK 2003      1000  1000
10     UK    UK 2004      1000  1000  

What is an efficient way to perform this task?

Edit: Note that France 2004 is also NA. The "extend" option does seem to handle this case correctly.

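Edit 2: Adding group_by(country) seems to break things for a reason I cannot identify: the code appears to be trying to convert a character to a numeric, although I do not quite understand why. When I convert the column to character, the error turns into an evaluation error. Any suggestions?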

> TRcomplete<-TRcomplete%>%
+     group_by(country) %>%
+     mutate_at(70:73,~na.fill(.x,"extend"))
Error in mutate_impl(.data, dots) : 
  Column `F116.s` can't be converted from character to numeric
> TRcomplete$F116.s <- as.numeric(TRcomplete$F116.s)
> TRcomplete<-TRcomplete%>%
+     group_by(country) %>%
+     mutate_at(70:73,~na.fill(.x,"extend"))
Error in mutate_impl(.data, dots) : 
  Column `F116.s` can't be converted from character to numeric
> TRcomplete$F116.s <- as.numeric(as.character(TRcomplete$F116.s))
> TRcomplete<-TRcomplete%>%
+     group_by(country) %>%
+     mutate_at(70:73,~na.fill(.x,"extend"))
Error in mutate_impl(.data, dots) : 
  Column `F116.s` can't be converted from character to numeric
> TRcomplete$F116.s <- as.character(TRcomplete$F116.s))
Error: unexpected ')' in "TRcomplete$F116.s <- as.character(TRcomplete$F116.s))"
> TRcomplete$F116.s <- as.character(TRcomplete$F116.s)
> str(TRcomplete$F116.s)
 chr [1:6984] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA ...
> TRcomplete<-TRcomplete%>%
+     group_by(country) %>%
+     mutate_at(70:73,~na.fill(.x,"extend"))
Error in mutate_impl(.data, dots) : 
  Evaluation error: need at least two non-NA values to interpolate.

Tags: r, dplyr, interpolation, lag

Solution

You can use na.fill from the zoo package with fill = "extend":

rapply(df, zoo::na.fill,"integer",fill="extend",how="replace")
  Country Ccode Year Happiness Power
1  France    FR 2000      1000  1000
2  France    FR 2001      1200  1400
3  France    FR 2003      1400  1800
4  France    FR 2004      1600  2200
5      UK    UK 2000      1000  1000
6      UK    UK 2001      1000  1000
7      UK    UK 2003      1000  1000
8      UK    UK 2004      1000  1000
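
To make the effect of fill = "extend" easier to see, here is a minimal sketch on a single vector, assuming the zoo package is installed. The vector holds the Happiness values for France from the question; the variable names are only illustrative. As documented for na.fill, a single "extend" is recycled to the left edge, the interior and the right edge, so interior NAs are linearly interpolated and leading or trailing NAs take the nearest non-NA value.

```r
library(zoo)

# Happiness for France, 2000-2004, copied from the question's data frame
happiness <- c(1000, NA, NA, 1600, NA)

# Interior NAs are interpolated; the trailing NA repeats the last known value
filled <- na.fill(happiness, "extend")
print(filled)
```

Note that the trailing value for 2004 is filled with 1600 here, whereas the desired output in the question leaves it as NA. Also, na.fill works on row positions, so it implicitly assumes the rows are evenly spaced in time.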

Edit:

library(tidyverse)
library(zoo)
df%>%
  group_by(Country)%>%
  mutate_at(4:5,~na.fill(.x,"extend"))

  Country Ccode Year Happiness Power
1  France    FR 2000      1000  1000
2  France    FR 2001      1200  1400
3  France    FR 2003      1400  1800
4  France    FR 2004      1600  2200
5      UK    UK 2000      1000  1000
6      UK    UK 2001      1000  1000
7      UK    UK 2003      1000  1000
8      UK    UK 2004      1000  1000

If all elements in a group are NA:

df%>% 
  group_by(Country)%>% 
  mutate_if(is.numeric,~if(all(is.na(.x))) NA else na.fill(.x,"extend"))
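
As an alternative sketch, not the approach used in the answer above, the interpolation can be done against the Year column with base R's approx(), which divides by the actual gap in years as the question describes. The helper interp_by_year is a hypothetical name introduced for this illustration, and the code assumes dplyr 1.0 or later for across(). With rule = 1, values outside the range of known points stay NA, and groups with fewer than two known values are returned unchanged so that approx() does not raise "need at least two non-NA values to interpolate".

```r
library(dplyr)

# Sample data reconstructed from the question
df <- data.frame(
  Country   = rep(c("France", "UK"), each = 5),
  Ccode     = rep(c("FR", "UK"), each = 5),
  Year      = rep(2000:2004, times = 2),
  Happiness = c(1000, NA, NA, 1600, NA, 1000, NA, 1000, 1000, 1000),
  Power     = c(1000, NA, NA, 2200, NA, 1000, NA, 1000, 1000, 1000)
)

# Hypothetical helper: linear interpolation of `value` over `year`
interp_by_year <- function(year, value) {
  value <- as.numeric(value)
  ok <- !is.na(value) & !is.na(year)
  # Guard: with fewer than two known points there is nothing to interpolate
  if (sum(ok) < 2) return(value)
  # rule = 1 leaves points outside the known range as NA
  approx(year[ok], value[ok], xout = year, rule = 1)$y
}

result <- df %>%
  group_by(Country) %>%
  arrange(Year, .by_group = TRUE) %>%
  # Columns are selected by name, so the result does not depend on positions
  mutate(across(c(Happiness, Power), ~ interp_by_year(Year, .x))) %>%
  ungroup()

print(result)
```

Changing rule = 1 to rule = 2 would instead carry the nearest known value outward, similar in spirit to "extend".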
