Converting multiple text files to CSV to create a labeled dataset

Problem description

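I have text files in multiple folders, where each folder name is the name of a category/label. I want to generate a CSV file (a dataset) that also has a column holding the label (the folder name) for each text.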

import csv
import os

folder = os.path.dirname("/home/jaideep/Desktop/folder/ML DS/Csv/Datasets/")
folder_list = os.listdir(folder)

with open("/home/jaideep/Desktop/folder/ML DS/Csv/data.csv", "w") as outfile:
    writer = csv.writer(outfile)
    writer.writerow(['Label', 'Email','Message'])
    for f in folder_list:
        file_list = os.listdir(folder+"/"+f+"/")
        print(file_list)
        for file in file_list:
            with open(file, "r")  as infile:
                contents = infile.read()
                outfile.write(f+',')
                outfile.write(contents)

But I am getting:

File "/home/jaideep/Desktop/folder/ML DS/Csv/Main.py", line 15, in <module>
    with open(file, "r")  as infile:

FileNotFoundError: [Errno 2] No such file or directory: 'file2.txt'

I know similar questions have been asked before, but I could not adapt those solutions to my problem. Any help would be appreciated, thanks.

Tags: python, pandas, dataset, file-handling

Solution


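os.listdir returns only the names of the entries in a directory, not their full paths. Opening 'file2.txt' by its bare name therefore looks in the current working directory, which is why you get FileNotFoundError. You need to rebuild the path by joining each name with its parent directory.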
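To make the difference concrete, here is a minimal sketch that builds a throwaway directory and compares the bare name with the rebuilt path. The folder name "spam" and the temporary directory are assumptions for illustration only; they are not from the question.

```python
import os
import tempfile

# Build a throwaway tree: <tmp>/spam/file2.txt (illustrative names, not the asker's data).
with tempfile.TemporaryDirectory() as root:
    label_dir = os.path.join(root, "spam")
    os.mkdir(label_dir)
    with open(os.path.join(label_dir, "file2.txt"), "w") as fh:
        fh.write("hello")

    # os.listdir gives bare entry names with no directory part.
    names = os.listdir(label_dir)

    # Joining each name with its parent directory yields a path that can be opened
    # regardless of the current working directory.
    full_paths = [os.path.join(label_dir, name) for name in names]
    full_exists = os.path.exists(full_paths[0])

print(names)
print(full_exists)
```

A bare name such as names[0] is resolved against the current working directory, so open(names[0]) only works if the script happens to run from inside that folder.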

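You may also want to take a look at the glob module, which returns matching paths with the directory part already included.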

This version should solve your problem.

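import csv
import os

# Root directory whose sub-folders are named after the labels.
folder = os.path.dirname("/home/jaideep/Desktop/folder/ML DS/Csv/Datasets/")
folder_list = os.listdir(folder)

# newline="" lets the csv module control line endings.
with open("/home/jaideep/Desktop/folder/ML DS/Csv/data.csv", "w", newline="") as outfile:
    writer = csv.writer(outfile)
    writer.writerow(['Label', 'Email', 'Message'])
    for f in folder_list:
        # os.listdir returns bare names, so join them with the parent folder.
        file_list = os.listdir(os.path.join(folder, f))
        print(file_list)
        for file in file_list:
            with open(os.path.join(folder, f, file), "r") as infile:
                contents = infile.read()
            # Write through the csv writer so commas and newlines in the text are quoted.
            # 'Email' is not defined in the question, so that column is left empty here.
            writer.writerow([f, '', contents])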
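
As for the glob suggestion, a minimal sketch could look like the following. The function name build_dataset, the two-column header, the ".txt" extension (inferred from 'file2.txt' in the error message) and the UTF-8 encoding are all assumptions; adjust them to your data.

```python
import csv
import glob
import os


def build_dataset(dataset_dir, csv_path):
    """Write one (label, message) row per text file under dataset_dir/<label>/.

    Hypothetical helper: the name and the two-column layout are illustrative.
    """
    # One '*' for the label folder, then every .txt file inside it (assumed extension).
    pattern = os.path.join(dataset_dir, "*", "*.txt")
    rows = 0
    with open(csv_path, "w", newline="", encoding="utf-8") as outfile:
        writer = csv.writer(outfile)
        writer.writerow(["Label", "Message"])
        # glob returns paths that already include the directory part.
        for path in sorted(glob.glob(pattern)):
            # The label is the name of the folder that contains the file.
            label = os.path.basename(os.path.dirname(path))
            with open(path, "r", encoding="utf-8") as infile:
                writer.writerow([label, infile.read()])
            rows += 1
    return rows
```

Calling build_dataset with the Datasets folder and the target CSV path writes one row per file and returns the number of rows written. Because every row goes through csv.writer, commas and line breaks inside a message are quoted instead of breaking the columns.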
