Python: evaluating duplicate elements in csv files

Problem description

I have 2 csv files:

CSV 1:

CHANNEL
3
3
4
1
2
1
4
5

CSV 2:

CHANNEL
1
2
2
3
4
4
4
5

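I want to evaluate the state of each channel by looking for duplicated channels within each file. If a channel occurs more than once (count > 1), its state is 0; otherwise its state is 1.

Output csv: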

index  channel 1  channel 2  channel 3  channel 4  channel 5
  1        0          1         0          0           1
  2        1          0         1          0           1
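
The rule itself can be sketched in a few lines. This is only a minimal illustration, assuming the channels are numbered 1 to 5; the function name `channel_states` and the `all_channels` parameter are made up for this example. It uses the values from CSV 2 and reproduces the second row of the table above.

```python
from collections import Counter


def channel_states(channels, all_channels=range(1, 6)):
    """Return 0 for every channel seen more than once, 1 otherwise."""
    counts = Counter(channels)  # missing channels count as 0
    return [0 if counts[ch] > 1 else 1 for ch in all_channels]


csv2 = [1, 2, 2, 3, 4, 4, 4, 5]  # values from CSV 2
print(channel_states(csv2))  # [1, 0, 1, 0, 1]
```

Note that under this rule a channel that never appears in a file also gets state 1, because its count is not greater than 1.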

So far I have counted the duplicated channels, but only for 1 file. Now I don't know how to read the 2 csv files and create the output file.

import csv
import collections

with open("csvfile.csv") as f:
    csv_data = csv.reader(f,delimiter=",")
    next(csv_data)
    count = collections.Counter()
    for row in csv_data:
        channel = row[0]
        count[channel] += 1
    for channel, nb in count.items():
        state = 0 if nb > 1 else 1  # duplicated channel -> 0, otherwise 1
        print(channel, state)

Tags: python

Solution


You can read each file into a list, then check how often each channel occurs in each list.

Try this code:

ss1 = '''
CHANNEL
3
3
4
1
2
1
4
5
'''.strip()

ss2 = '''
CHANNEL
1
2
2
3
4
4
4
5
'''.strip()


with open("csvfile1.csv",'w') as f: f.write(ss1)  # write test file 1
with open("csvfile2.csv",'w') as f: f.write(ss2)  # write test file 2

#############################

with open("csvfile1.csv") as f:
   lines1 = f.readlines()[1:]  # skip header
   lines1 = [int(x) for x in lines1] # convert to ints
   
with open("csvfile2.csv") as f:
   lines2 = f.readlines()[1:]  # skip header
   lines2 = [int(x) for x in lines2] # convert to ints

lines = [lines1,lines2] # make list for iteration

state = [[0]*5,[0]*5]  # default zero for each state

for ci in [0,1]: # each file 
   for ch in range(5):  # each channel
      state[ci][ch] = 0 if lines[ci].count(ch+1) > 1 else 1 # check channel count, set state

# write to terminal
print('Index','Channel 1','Channel 2','Channel 3','Channel 4','Channel 5', sep = '  ')
print('  ',1,'     ','          '.join(str(c) for c in state[0]))
print('  ',2,'     ','          '.join(str(c) for c in state[1]))
    
# write to csv
with open('state.csv','w') as f:
   f.write('Index,Channel 1,Channel 2,Channel 3,Channel 4,Channel 5\n')
   f.write('1,' + ','.join(str(c) for c in state[0]) + '\n')
   f.write('2,' + ','.join(str(c) for c in state[1]) + '\n')

Output (terminal)

Index  Channel 1  Channel 2  Channel 3  Channel 4  Channel 5
   1       0          1          0          0          1
   2       1          0          1          0          1

Output (state.csv)

Index,Channel 1,Channel 2,Channel 3,Channel 4,Channel 5
1,0,1,0,0,1
2,1,0,1,0,1
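
As a possible variation, the same idea can be written with `collections.Counter` and the `csv` module so that it works for any number of input files. This is a sketch, not a drop-in replacement: the helper names (`read_channels`, `build_state_rows`, `write_state_csv`, `demo`) are made up here, and it assumes each file has a single `CHANNEL` column and that the channels of interest are 1 to 5. The `demo` function writes the two sample files to a temporary directory so the example runs on its own.

```python
import csv
import tempfile
from collections import Counter
from pathlib import Path


def read_channels(path):
    """Read the CHANNEL column of a one-column csv file as a list of ints."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)  # first line is used as the header
        return [int(row["CHANNEL"]) for row in reader if row["CHANNEL"]]


def build_state_rows(paths, channels=range(1, 6)):
    """Return one row per file: [file index, state of each channel]."""
    rows = []
    for index, path in enumerate(paths, start=1):
        counts = Counter(read_channels(path))
        # state 0 if the channel occurs more than once, else 1
        rows.append([index] + [0 if counts[ch] > 1 else 1 for ch in channels])
    return rows


def write_state_csv(paths, out_path, channels=range(1, 6)):
    """Write the header and one state row per input file to out_path."""
    channels = list(channels)
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Index"] + [f"Channel {ch}" for ch in channels])
        writer.writerows(build_state_rows(paths, channels))


def demo():
    """Create the two sample files in a temp directory and process them."""
    samples = {
        "csvfile1.csv": [3, 3, 4, 1, 2, 1, 4, 5],
        "csvfile2.csv": [1, 2, 2, 3, 4, 4, 4, 5],
    }
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for name, values in samples.items():
            path = Path(tmp) / name
            path.write_text("CHANNEL\n" + "\n".join(map(str, values)) + "\n")
            paths.append(path)
        out_path = Path(tmp) / "state.csv"
        write_state_csv(paths, out_path)
        return out_path.read_text()


print(demo())
```

With real data, `write_state_csv(["csvfile1.csv", "csvfile2.csv"], "state.csv")` would be the only call needed. Using `Counter` means each file is scanned once instead of once per channel, which only matters for large files.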
