Creating a data frame with one unique record per applicant when each applicant has multiple records

Problem description

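"I have a data frame that holds bank-related information for each applicant id. An applicant may have multiple accounts, and the data frame reflects this across multiple rows. I now want to create a data frame in which all of each applicant's information is in a single record."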

I have already tried for and if loops. Now I want to optimize the code.

com_data <- function(X) {
  # One row per id: Var1 holds the id, Freq the number of records for it
  data_set <- data.frame(table(X$id))
  for (i in seq_len(nrow(data_set))) {
    # Next free column of data_set for this applicant
    n <- 3
    for (j in seq_len(nrow(X))) {
      if (data_set$Var1[i] == X$id[j]) {
        # Append the values from columns 3 onward of the matching record
        for (k in 3:ncol(X)) {
          data_set[i, n] <- X[j, k]
          n <- n + 1
        }
      }
    }
  }
  return(data_set)
}

Tags: r, dataframe, if-statement, cbind

Solution


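Assuming your data frame does not contain list columns, the following works, although it gets a little messy. "Var" stands in for the applicant ID: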

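# Sample data used:
df <- data.frame(
  # %Y (four-digit year) is needed here; %y would read only "20" from "2019"
  Date = as.Date(c("27/9/2019", "28/9/2019", "1/10/2019", "2/10/2019"), format = "%d/%m/%Y"),
  # The format must be passed by name; the second positional argument of as.POSIXct is tz
  dateTime = as.POSIXct(c("27/9/2019", "28/9/2019", "1/10/2019", "2/10/2019"), format = "%d/%m/%Y", tz = "UTC"),
  Var = as.factor(c("A", "A", "B", "B")),
  Value = c(56, 50, 90, 100),
  stringsAsFactors = FALSE
)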


# Convert factors & dates to strings:
convert_descriptors_to_char <- function(df) {
  as.data.frame(
    lapply(df, function(x) {
      if (is.factor(x) || inherits(x, c("Date", "POSIXct", "POSIXlt"))) {
        trimws(as.character(x), which = "both")
      } else {
        x
      }
    }),
    stringsAsFactors = FALSE
  )
}

# Convert data types: 

df <- convert_descriptors_to_char(df)

# Aggregate each column by Var: concatenate strings, sum numbers
df_aggd <- lapply(df, function(x) {
  if (is.character(x)) {
    aggregate(x ~ df$Var, df, paste0, collapse = ", ")
  } else if (is.numeric(x)) {
    aggregate(x ~ df$Var, df, sum)
  } else {
    x
  }
})

# Each result has the columns "df$Var" and "x";
# rename them to "Var" and the name of the source column:
x_vect_names <- names(df_aggd)
for (i in seq_along(df_aggd)) {
  colnames(df_aggd[[i]]) <- c("Var", x_vect_names[i])
}

# Remove the aggregation of Var against itself:
df_aggd <- df_aggd[names(df_aggd) != "Var"]

# Merge the separate data frames into one, joining on Var:
Reduce(function(x, y) {
  merge(x, y, all = TRUE, by = intersect(colnames(x), colnames(y)))
}, df_aggd)
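
If the goal is instead to keep each account's values in their own columns, which is what the loop in the question appears to attempt, base R's reshape() can spread the records without explicit loops. This is only a sketch under assumptions: the data frame "accounts" and its columns "id", "bank" and "balance" are hypothetical, because the question does not show the real column names.

```r
# Hypothetical input: one row per account, several rows per applicant
accounts <- data.frame(
  id      = c(1, 1, 2, 3, 3, 3),
  bank    = c("A", "B", "A", "C", "A", "B"),
  balance = c(100, 250, 80, 40, 60, 10),
  stringsAsFactors = FALSE
)

# Number the records within each applicant: 1, 2, ... per id
accounts$rec <- ave(seq_along(accounts$id), accounts$id, FUN = seq_along)

# Spread the records into columns named bank.1, balance.1, bank.2, ...
wide <- reshape(accounts, idvar = "id", timevar = "rec", direction = "wide")
rownames(wide) <- NULL
wide
```

Applicants with fewer accounts than the maximum get NA in the unused columns, so the result has exactly one row per id.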
