Remove characters and duplicates from a csv file and write to a new file

Problem description

I am reading a csv file that looks like this:

[152.60115606936415][152.60115606936415, 13181.818181818182][152.60115606936415, 13181.818181818182, 1375055.330634278][152.60115606936415, 13181.818181818182, 1375055.330634278, 89.06882591093118]

What I want to do is remove the characters ([ , ]), turn the spaces into newlines, and write the result to my new txt file.

import csv

to_file = open("t_put.txt", "w")
with open("t_put_val.20181026052328.csv", "r") as f:
    for row in csv.reader(f):
        value2 = " ".join(row)[1:-1]  # drop the first and last character (the outer brackets)
        value = value2.replace("  ", "\n")  # replace double spaces with a newline
        value3 = value.replace("][", " ")  # replace "][" with a space
        value4 = value3.replace(" ", "\n")  # replace remaining spaces with a newline
        print(value4)
        # st = str(s)
        to_file.write(value4)  # write to file
to_file.close()

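With this code I can remove the characters, but duplicates still appear. I have been considering set(), but it either does not work as expected or just prints the last four numbers, and it may not be suitable for a larger dataset.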

Tags: python

Solution


By splitting on ']', you can group the values of each list in the csv.
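To see what that split produces, here is a minimal sketch on a shortened string. The `sample` value and the names `chunks` and `groups` are made up for illustration, with the numbers abbreviated from the question's data.

```python
# Shortened sample in the same shape as the question's data (values abbreviated, an assumption)
sample = "[152.6][152.6, 13181.8][152.6, 13181.8, 1375055.3]"

# Splitting on ']' leaves one chunk per list, plus a trailing empty string
chunks = sample.split("]")

# Drop the empty chunk, strip the leading '[' and split each chunk into its values
groups = [chunk.lstrip("[").split(", ") for chunk in chunks if chunk]

print(groups)
```

Each inner list repeats the values of the one before it, which is where the duplicates in the output come from.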

# Open the csv file
with open("t_put_val.20181026052328.csv", "r") as f_h:
    rows = [
        row.lstrip("[").split(", ")
        # For each line in the file (there is just one)
        for line in f_h
        # Skip blank lines (strip() also drops the trailing newline)
        if line.strip()
        # Split the line on the trailing ']' of each list
        for row in line.strip().split("]")
        # Skip the empty string left after the last ']'
        if row
    ]

# Print all unique values (a set does not preserve their order)
unique_values = set(item for row in rows for item in row)
for value in unique_values:
    print(value)

# Write each list to the output file on its own line
with open("t_put.txt", "w") as f_h:
    f_h.writelines("%s\n" % ", ".join(row) for row in rows)
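If the goal is a file with one unique value per line, in the order the values first appear, one possible variant is sketched below. It swaps the split on ']' for a regular expression, and it assumes every value is a plain unsigned decimal (no exponents or signs). The helper names `unique_numbers` and `write_unique` are hypothetical, not part of the answer above.

```python
import re


def unique_numbers(text):
    """Return the numeric tokens in text, without duplicates, in first-seen order."""
    # Assumption: values are plain unsigned decimals such as 152.6 (no exponents, no signs)
    tokens = re.findall(r"\d+(?:\.\d+)?", text)
    # dict keys keep insertion order (Python 3.7+), so duplicates drop out stably
    return list(dict.fromkeys(tokens))


def write_unique(src_path, dst_path):
    """Read src_path and write its unique values to dst_path, one per line."""
    with open(src_path, "r") as src:
        values = unique_numbers(src.read())
    with open(dst_path, "w") as dst:
        dst.write("\n".join(values) + "\n")
    return values


# The sample line from the question
sample = (
    "[152.60115606936415][152.60115606936415, 13181.818181818182]"
    "[152.60115606936415, 13181.818181818182, 1375055.330634278]"
    "[152.60115606936415, 13181.818181818182, 1375055.330634278, 89.06882591093118]"
)
print("\n".join(unique_numbers(sample)))
```

With the file names from the question, the call would be `write_unique("t_put_val.20181026052328.csv", "t_put.txt")`. Unlike `set()`, `dict.fromkeys` keeps the values in the order they were first seen.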
