How to loop the calculation

Problem description

I am fitting a linear model to this data:

data <- data.frame(Student_ID =c(1,1,1,2,2,3,3,3,3,3,4,4,4,5,6,6,7,7,7,8,8),
                   Years_Attended = c(1991,1992,1995,1992,1993,1991,1992,1993,1994,1995,1993,1994,1995,1995,1993,1995,1990,1995,2000,1995,1996),
                   Class = c("A","A","A","A","A","A","A","A","A","A","B","B","B","B","B","B","C","C","C","C","C"),
                   marks = c(50,55,46,44,60,66,67,80,91,90,70,75,76,77,77,82,89,88,88,64,65))

The goal is to create a new column that captures the change in marks. I call this column marks.change, and I fit the model as follows:

library(dplyr)

data2 <- data %>% group_by(Student_ID) %>% summarise(
  Good.marks = length(marks[!is.na(marks)]),
  marks.change = ifelse(Good.marks>1,
                   summary(lm(marks ~ Years_Attended))$coefficients[2, 1], 0),
  Student_ID = unique(Student_ID),
  Class = unique(Class), 
  )

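This code works fine. However, instead of considering all years at once, I want to fit the model above (i.e., the part where I write "marks.change = …") to each interval of years and then average the results. That is, I want to fit the model between 1991 and 1992, then move on to 1992 and 1993, then 1993 and 1994, and so on until the last year, and put the average of these results in a new column called mark.change.part2.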
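
Under one reading of this description (intervals taken between consecutive observed years for each student, which is also how the answer below interprets it), the requested column is the mean of the per-interval slopes. In the sketch below, y_i and m_i are notation introduced here for a student's i-th observed year and mark, sorted by year, with n >= 2 observations; the second line works it through for student 1:

```latex
\[
\text{mark.change.part2} = \frac{1}{n-1}\sum_{i=1}^{n-1}\frac{m_{i+1}-m_i}{y_{i+1}-y_i}
\]
\[
\text{Student 1: } \frac{1}{2}\left(\frac{55-50}{1992-1991}+\frac{46-55}{1995-1992}\right) = \frac{1}{2}(5-3) = 1
\]
```

Each term is the slope that lm() would return for a fit through just those two points.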

Is there a simpler way to automate this?

Tags: r, loops

Solution


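You can simplify your existing code a little. Grouping by both Student_ID and Class keeps Class in the result without needing unique():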

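library(dplyr)

data %>% group_by(Student_ID, Class) %>% summarise(
  # number of non-missing marks per student
  Good.marks = sum(!is.na(marks)),
  # slope of marks on year over all years; 0 when there is only one mark
  marks.change = ifelse(Good.marks > 1,
                        summary(lm(marks ~ Years_Attended))$coefficients[2, 1], 0)
)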

# A tibble: 8 x 4
# Groups:   Student_ID [8]
  Student_ID Class Good.marks marks.change
       <dbl> <chr>      <int>        <dbl>
1          1 A              3        -1.46
2          2 A              2        16.  
3          3 A              5         7.2 
4          4 B              3         3.  
5          5 B              1         0   
6          6 B              2         2.50
7          7 C              3        -0.1 
8          8 C              2         1.00

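Now for the second part of your question. If I understand you correctly, this may be what you want. A linear model fitted to only two data points simply gives the slope of the line through them, so you can compute it directly with vector arithmetic: diff(marks)/diff(Years_Attended) gives the slope for each pair of consecutive observations, and mean() averages those slopes. Note that the intervals are between consecutive observed years (for student 1, 1992 to 1995 is a single interval), not strictly consecutive calendar years.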

data %>% group_by(Student_ID, Class) %>% summarise(
  Good.marks = sum(!is.na(marks)),
  marks.change = ifelse(Good.marks>1,
                        summary(lm(marks ~ Years_Attended))$coefficients[2, 1], 0),
  # mean of the slopes between consecutive observations (two-point "fits")
  marks.change.part2 = ifelse(Good.marks>1, mean(diff(marks)/diff(Years_Attended)), 0))

# A tibble: 8 x 5
# Groups:   Student_ID [8]
  Student_ID Class Good.marks marks.change marks.change.part2
       <dbl> <chr>      <int>        <dbl>              <dbl>
1          1 A              3        -1.46                1  
2          2 A              2        16.                 16  
3          3 A              5         7.2                 6  
4          4 B              3         3.                  3  
5          5 B              1         0                   0  
6          6 B              2         2.50                2.5
7          7 C              3        -0.1                -0.1
8          8 C              2         1.00                1 
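
To check the vectorised shortcut against the literal procedure described in the question, the sketch below fits lm() to each consecutive pair of observations and averages the slopes. The helper name pairwise_lm_slope is hypothetical (not from the original answer), and it assumes each student's rows are already sorted by year with no repeated years, as in the question's data. For student 1 both approaches should give 1, matching the first row of the output above, up to floating-point rounding.

```r
# Hypothetical helper (not from the original answer): fit lm() to every
# consecutive pair of observations and return the mean of the fitted slopes.
# Assumes `years` is sorted and contains no duplicates.
pairwise_lm_slope <- function(years, marks) {
  n <- length(marks)
  if (n < 2) return(0)  # same convention as the answer: 0 for a single mark
  slopes <- vapply(seq_len(n - 1), function(i) {
    idx <- c(i, i + 1)
    # coef() returns (Intercept) first; the second element is the slope
    unname(coef(lm(marks[idx] ~ years[idx]))[2])
  }, numeric(1))
  mean(slopes)
}

# Student 1 from the question's data
years <- c(1991, 1992, 1995)
marks <- c(50, 55, 46)

pairwise_lm_slope(years, marks)    # literal approach: one lm() per interval
mean(diff(marks) / diff(years))    # the answer's vectorised shortcut
```

Calling lm() once per interval works, but it does far more computation than the single diff() expression, which is why the one-liner in the answer is the simpler choice.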
