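python - Convert a tab-delimited txt file to a comma-delimited csv file
Problem description
I have this text file and I want to convert it into a comma-separated file: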
antecedents consequents support confidence lift
------------- ------------- --------- ------------ ------
398 frozenset(['LM = 25', 'DIAB = n', 'SMOK = y']) frozenset(['AL = 1']) 0.25 1 1.33333
461 frozenset(['Age = 80', 'LM = 15', 'CHOL = 200']) frozenset(['AL = 1']) 0.25 1 1.33333
837 frozenset(['RCA = 80', 'Age = 80', 'SMOK = y']) frozenset(['AL = 1']) 0.25 1 1.33333
I tried pandas and csv, but neither splits the columns; they only separate the rows, like this:
antecedents consequents support confidence lift
------------- ------------- --------- ------------ ------
" 398 frozenset(['LM = 25', 'DIAB = n', 'SMOK = y']) frozenset(['AL = 1']) 0.25 1 1.33333"
" 461 frozenset(['Age = 80', 'LM = 15', 'CHOL = 200']) frozenset(['AL = 1']) 0.25 1 1.33333"
" 837 frozenset(['RCA = 80', 'Age = 80', 'SMOK = y']) frozenset(['AL = 1']) 0.25 1 1.33333"
This is the code I used:
1-
dataframe = pd.read_csv("/Users/user/PycharmProjects/Apriori /Rules.txt",delimiter="\t")
dataframe.to_csv("newDoc.csv", encoding='utf-8', index=False)
2-
txt_file = r"/Users/user/PycharmProjects/Apriori /Rules.txt"
csv_file = r"mycsv.csv"
in_txt = csv.reader(open(txt_file, "rb"), delimiter = '\t')
out_csv = csv.writer(open(csv_file, 'wb'))
out_csv.writerows(in_txt)
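Can anyone help?
Solution
Given these lines, it looks like you can use a regular expression to capture the six fields of each row (the leading row number, the two frozenset values, and the three numbers). Something like the following: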
import csv
import re

# looks like a consistent format given the example text:
# row number, two frozenset(...) fields, then three numeric fields
line_re = re.compile(r'^\s*(\d+)\s+(frozenset.*?\))\s*(frozenset.*?\))\s*(\S+)\s+(\S+)\s+(\S+)$')

txt = '''antecedents consequents support confidence lift
------------- ------------- --------- ------------ ------
398 frozenset(['LM = 25', 'DIAB = n', 'SMOK = y']) frozenset(['AL = 1']) 0.25 1 1.33333
461 frozenset(['Age = 80', 'LM = 15', 'CHOL = 200']) frozenset(['AL = 1']) 0.25 1 1.33333
837 frozenset(['RCA = 80', 'Age = 80', 'SMOK = y']) frozenset(['AL = 1']) 0.25 1 1.33333'''

# newline='' lets the csv module control line endings itself
with open('mycsv.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    for line in txt.splitlines():
        mo = line_re.match(line)
        # the header and the dashed separator do not match and are skipped
        if mo:
            writer.writerow(mo.groups())
cat mycsv.csv
398,"frozenset(['LM = 25', 'DIAB = n', 'SMOK = y'])",frozenset(['AL = 1']),0.25,1,1.33333
461,"frozenset(['Age = 80', 'LM = 15', 'CHOL = 200'])",frozenset(['AL = 1']),0.25,1,1.33333
837,"frozenset(['RCA = 80', 'Age = 80', 'SMOK = y'])",frozenset(['AL = 1']),0.25,1,1.33333
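The example above parses an inline string and drops the header. A minimal sketch of the same idea applied to a file on disk might look like the following. It assumes the file follows the format shown in the question; the function names `rules_to_rows` and `convert`, and the `index` column name for the unlabeled first field, are illustrative choices and not part of the original answer.

```python
import csv
import re

# Same pattern as in the answer: row number, two frozenset(...) fields, three numbers.
LINE_RE = re.compile(
    r'^\s*(\d+)\s+(frozenset.*?\))\s*(frozenset.*?\))\s*(\S+)\s+(\S+)\s+(\S+)$'
)

# Assumption: the source header names only five columns, so 'index' is a
# made-up label for the leading row number.
HEADER = ['index', 'antecedents', 'consequents', 'support', 'confidence', 'lift']


def rules_to_rows(lines):
    """Yield a tuple of six string fields for every line that matches LINE_RE."""
    for line in lines:
        mo = LINE_RE.match(line)
        if mo:  # header and dashed separator lines do not match and are skipped
            yield mo.groups()


def convert(txt_path, csv_path):
    """Read the rules text file and write a comma-separated file with a header row."""
    with open(txt_path, encoding='utf-8') as src, \
         open(csv_path, 'w', newline='', encoding='utf-8') as dst:
        writer = csv.writer(dst)
        writer.writerow(HEADER)
        writer.writerows(rules_to_rows(src))
```

Calling `convert("Rules.txt", "mycsv.csv")` with your own paths should produce the rows shown above, preceded by the header line. Lines that do not fit the pattern are silently skipped, so it is worth comparing the row count against the input if the format may vary.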