Comparing two CSV files row by row with if statements in R

Problem description

I'm using R/RStudio to compare two CSV files. I want to compare them row by row, checking the columns in a specific order. Suppose my data looks like this:

first <-read.csv(text="
name,    number,    description,    version,    manufacturer
A123,    12345,     first piece,    1.0,        fakemanufacturer
B107,    00001,     second,         1.0,        abcde parts
C203,    20000,     third,          NA,         efgh parts
D123,    12000,     another,        2.0,        NA", strip.white = TRUE, colClasses = "character")

The second CSV:

second <- read.csv(text="
name,    number,    description,    version,    manufacturer
A123,    12345,     first piece,    1.0,        fakemanufacturer
B107,    00001,     second,         1.0,        abcde parts
C203,    20000,     third,          NA,         efgh parts
E456,    45678,     third,          2.0,       ", strip.white = TRUE, colClasses = "character")
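By default `read.csv()` parses a column such as `number` as integers, so `00001` comes back as `1` and no longer matches the desired output. When the data live in real files on disk, a minimal sketch of keeping the identifiers as text is shown below. The temporary file and its one row are illustrative stand-ins, not the actual input files:

```r
# Write a one-row CSV to a temporary file as a stand-in for a real input file
path <- tempfile(fileext = ".csv")
writeLines(c("name,number,description", "B107,00001,second"), path)

asNumber <- read.csv(path)                                       # number parsed as integer
asText   <- read.csv(path, colClasses = c(number = "character")) # leading zeros kept

asNumber$number
asText$number
```

A named `colClasses` only overrides the listed columns; the others are still type-converted as usual.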

I'd like a for loop that looks something like this (pseudocode):

for line in csv1:
    if number exists in csv2:
        if csv1$name == csv2$name:
            if csv1$description == csv2$description:
                if csv1$manufacturer == csv2$manufacturer:
                    add line to csv called changed, with "no change" in its "changed" column
                else:
                    add line to csv called changed, with "manufacturer" in its "changed" column
            else:
                add line to csv called changed, with "description" in its "changed" column
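In R, a loop in the spirit of that pseudocode could look roughly like the sketch below. It assumes both files have the five columns shown above and that rows are paired on `number`; unlike the nested `if`s, it checks every column instead of stopping at the first mismatch, so several changed columns can be reported together. The small data frames and the helper `same()` are hypothetical illustrations:

```r
# Hypothetical example data, read as character so "00001" keeps its leading zeros
first <- read.csv(text = "
name,number,description,version,manufacturer
A123,12345,first piece,1.0,fakemanufacturer
B107,00001,second,1.0,abcde parts
D123,12000,another,2.0,NA", colClasses = "character")

second <- read.csv(text = "
name,number,description,version,manufacturer
A123,12345,first piece,1.0,fakemanufacturer
B107,00001,second,1.0,other parts
E456,45678,third,2.0,", colClasses = "character")

# NA-safe scalar comparison: NA equals NA, NA never equals a value
same <- function(a, b) (is.na(a) && is.na(b)) || (!is.na(a) && !is.na(b) && a == b)

first$changed <- NA_character_
for (i in seq_len(nrow(first))) {
  j <- match(first$number[i], second$number)   # row of `second` with the same number
  if (is.na(j)) {
    first$changed[i] <- "removed"
  } else {
    diffs <- character(0)
    for (col in c("name", "description", "version", "manufacturer")) {
      if (!same(first[[col]][i], second[[col]][j])) diffs <- c(diffs, col)
    }
    first$changed[i] <- if (length(diffs) == 0) "no change" else paste(diffs, collapse = ", ")
  }
}
first$changed
```

This only walks the rows of the first file, so rows that exist only in the second file ("added") need a separate pass.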

and so on, so that the output looks like this:

name    number    description    version    manufacturer        changed
A123    12345     first piece    1.0        fakemanufacturer    number
B107    00001     second         1.0        abcde parts         no change
C204    20000     third                     newmanufacturer     number, manufacturer     
D123    12000     another        2.0                            removed
E456    45678     third          2.0                            added

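If anything fails to match at any point in this loop, I want to know where the mismatch is. Rows can be matched either by number or by description. For example, given the 2 rows above, I would be able to tell that the number changed between the two CSV files. Thanks in advance for your help!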
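One way to pair rows on `number` first and fall back to `description` is base R's `match()`, sketched below with hypothetical data in which B107's number changed. Keep in mind that `match()` returns only the first hit, so a repeated description (such as `third` in the sample data) pairs with whichever row comes first:

```r
# Hypothetical data: the second row changed its number between the two files
first  <- data.frame(number = c("12345", "00001"),
                     description = c("first piece", "second"),
                     stringsAsFactors = FALSE)
second <- data.frame(number = c("12345", "99999"),
                     description = c("first piece", "second"),
                     stringsAsFactors = FALSE)

# Row index in `second` matched by number, falling back to description
byNumber <- match(first$number, second$number)
byDesc   <- match(first$description, second$description)
partner  <- ifelse(is.na(byNumber), byDesc, byNumber)

# Pairs found via description whose number differs are "number changed"
numberChanged <- !is.na(partner) & first$number != second$number[partner]
partner
numberChanged
```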

Tags: r, csv

Solution


It should look something like this, but since you didn't provide any data to test it with, I can't guarantee my code:


cmpDF <- function(DF1, DF2){
    DF2 <- DF2[DF2$number %in% DF1$number,] #keep only the rows of DF2 that are
                                             #also in DF1
    retChar <- character(nrow(DF1))
    names(retChar) <- DF1$number #name the retChar entries by number
                                 # to be able to update them later

    DF1 <- DF1[DF1$number %in% DF2$number,] #keep only the rows of DF1 that are
                                             #also in DF2

    # sort rows to make sure that equal rows have the same row number:
    DF1 <- DF1[order(DF1$number),]
    DF2 <- DF2[order(DF2$number),]

    # element-wise comparison (a logical matrix); == gives NA wherever either
    # side is NA, so count NA vs. NA as equal and NA vs. a value as different
    equals <- DF1 == DF2
    equals[is.na(equals)] <- FALSE
    equals <- equals | (is.na(DF1) & is.na(DF2))

    allSame <- rowSums(equals) == ncol(DF1) #here all elements are the same
    retChar[as.character(DF1$number[allSame])] <- "no change"

    for(i in seq_len(ncol(DF1))){
        if(colnames(DF1)[i] == "number") next

        different <- as.character(DF1$number[!equals[,i]])
        # append the column name; paste() (not paste0()) honours sep = ", "
        retChar[different] <- ifelse(nzchar(retChar[different]),
                                     paste(retChar[different], colnames(DF1)[i], sep = ", "),
                                     colnames(DF1)[i])
    }

    retChar[!nzchar(retChar)] <- "number not in DF2" #rows of DF1 never matched
    return(retChar)
}
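`cmpDF()` only reports on rows of `DF1`, so rows that exist only in `DF2` (the "added" case in the question) never show up. A vectorised alternative, sketched here under the assumption that `number` uniquely identifies a row, is a full outer join with `merge(..., all = TRUE)` followed by a per-column comparison. The marker columns `inFirst`/`inSecond` and the `.old`/`.new` suffixes are illustrative choices, not part of the answer above:

```r
# Hypothetical data modelled on the question, with everything read as text
first <- read.csv(text = "
name,number,description,version,manufacturer
A123,12345,first piece,1.0,fakemanufacturer
B107,00001,second,1.0,abcde parts
C203,20000,third,NA,efgh parts
D123,12000,another,2.0,NA", colClasses = "character")

second <- read.csv(text = "
name,number,description,version,manufacturer
A123,12345,first piece,1.0,fakemanufacturer
B107,00001,second,1.0,abcde parts
C203,20000,third,NA,newmanufacturer
E456,45678,third,2.0,", colClasses = "character")

# Marker columns record which file each row came from
first$inFirst <- TRUE
second$inSecond <- TRUE

# Full outer join on `number`; shared column names get .old / .new suffixes
m <- merge(first, second, by = "number", all = TRUE, suffixes = c(".old", ".new"))

cols <- c("name", "description", "version", "manufacturer")
# TRUE where a column differs; NA vs. NA counts as equal, NA vs. a value as different
diffMat <- vapply(cols, function(col) {
  a <- m[[paste0(col, ".old")]]
  b <- m[[paste0(col, ".new")]]
  ifelse(is.na(a) | is.na(b), xor(is.na(a), is.na(b)), a != b)
}, logical(nrow(m)))

# Comma-separated list of the differing columns for each row ("" if none)
changedCols <- apply(diffMat, 1, function(d) paste(cols[d], collapse = ", "))

hasOld <- !is.na(m$inFirst)
hasNew <- !is.na(m$inSecond)
m$changed <- ifelse(!hasNew, "removed",
             ifelse(!hasOld, "added",
             ifelse(changedCols == "", "no change", changedCols)))

m[, c("number", "name.old", "name.new", "changed")]
```

Rows missing from one side come out of `merge()` with `NA` in that side's columns, which is why the marker columns, rather than the data columns, decide between "added" and "removed".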

