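python - Python script moves data to the next line when reading data
Problem description
I wrote code that counts the number of delimiters in each line. If a line contains more or fewer delimiters than the expected number per line, the line is printed and copied to another file (Lines_FILE.txt) for analysis. For example: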
1,a,b,c,d
2,e,f,g,h
3,r,h,,u,j
The third line above would be copied into the new file. The script is:
import string
### PLEASE DELETE THE FILE "Lines_FILE.txt" BEFORE RUNNING THIS SCRIPT
k = 0
linecount=0
with open('Mock.txt',encoding="latin1") as myfile: #input file name with extension also if required update file encoding
    for line in myfile:
        k=0
        linecount=linecount+1
        words = line.split()
        for i in words:
            for letter in i:
                #k=line.count('"|"') #Unhash and Update delimiter and Text Qualifier if text qualifier present
                k=line.count(',') #Unhash and Update delimiter if no text qualifier
        print("Lines:",linecount)
        print(k)
        if(k!=94): #Update the number of delimiters present in the first line or the expected delimiters per line.
            print(line)
            f = open("Lines_FILE.txt","a")
            f.write(line)
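It worked fine, but then I noticed that for one file the script picked out lines that were not in error and wrote them to Lines_FILE.txt. In Lines_FILE.txt, half of each such line had been moved to the next line, which is not the case in the actual data. These are the lines: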
10804395,1,10/4/2018 6:45:27 PM,742443,23,2122804,OCT-18,10/4/2018,P,10/4/2018 6:44:34 PM,742443,,,2779094.44,,2779094.44,Reclass since no Physical inventory with Sanmina ,,,,,,,,,JE_AUTO_FILE_renurana_Sep-18_11_6720973_10-04-2018_104704_36,,,,,,,,,,,,,,,,,,Manual JE File Name,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,2
10804396,1,10/4/2018 6:45:27 PM,742443,23,2122805,OCT-18,10/4/2018,P,10/4/2018 6:44:35 PM,742443,,235530.26,,235530.26,,Fresh billing to Jabil against sanmina inventory movement reconciled to open POs from Jabil ,,,,,,,,,JE_AUTO_FILE_renurana_Sep-18_11_6720973_10-04-2018_104704_36,,,,,,,,,,,,,,,,,,Manual JE File Name,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,2
The extracted lines look like:
10804395,1,10/4/2018 6:45:27 PM,742443,23,2122804,OCT-18,10/4/2018,P,10/4/2018 6:44:34 PM,742443,,,2779094.44,,2779094.44,Reclass since no Physical inventory with Sanmina
,,,,,,,,,JE_AUTO_FILE_renurana_Sep-18_11_6720973_10-04-2018_104704_36,,,,,,,,,,,,,,,,,,Manual JE File Name,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,2
10804396,1,10/4/2018 6:45:27 PM,742443,23,2122805,OCT-18,10/4/2018,P,10/4/2018 6:44:35 PM,742443,,235530.26,,235530.26,,Fresh billing to Jabil against sanmina inventory movement reconciled to open POs from Jabil
,,,,,,,,,JE_AUTO_FILE_renurana_Sep-18_11_6720973_10-04-2018_104704_36,,,,,,,,,,,,,,,,,,Manual JE File Name,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,2
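After the text "with Sanmina" and "Jabil", the rest of the line is pushed to the next line. I noticed the same pattern in several lines. I think it has something to do with the gap after that text. To summarize: while reading the data, the script breaks some lines in two and treats them as bad lines. As a Python newcomer, I would really appreciate it if someone could guide me on how to fix this.
Solution
The cause may be that you handle the two files differently: the first file is opened with a specific encoding, while the second is opened with the default encoding. I can also suggest some improvements to the script you are using.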
with open("Mock.txt", "r", encoding="latin1") as infile:
    with open("Lines_FILE.txt", "w", encoding="latin1") as outfile:
        # enumerate keeps the line counter in step with the loop
        for line_no, line in enumerate(infile, start=1):
            delim_count = line.count(",")
            print("Line: ", line_no)
            if delim_count != 94:
                print(line)
                outfile.write(line)
This should read and write the files with the same encoding.
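As a minimal sketch of the same idea in reusable form, the check can be wrapped in a function that opens both files with one shared encoding. The function name `copy_mismatched_lines` and its parameters are hypothetical, and the demonstration uses the three sample rows from the question (4 expected delimiters) written to a temporary directory rather than the real Mock.txt.

```python
import os
import tempfile


def copy_mismatched_lines(in_path, out_path, expected, delimiter=",", encoding="latin1"):
    """Copy lines whose delimiter count differs from `expected`.

    Returns the 1-based line numbers of the copied lines.
    (Hypothetical helper, not part of the original script.)
    """
    bad_line_numbers = []
    # Both files are opened with the same encoding, as suggested above.
    with open(in_path, "r", encoding=encoding) as infile, \
            open(out_path, "w", encoding=encoding) as outfile:
        for line_no, line in enumerate(infile, start=1):
            if line.count(delimiter) != expected:
                bad_line_numbers.append(line_no)
                outfile.write(line)
    return bad_line_numbers


# Demonstration with the three sample rows from the question.
workdir = tempfile.mkdtemp()
sample_path = os.path.join(workdir, "Mock.txt")
report_path = os.path.join(workdir, "Lines_FILE.txt")
with open(sample_path, "w", encoding="latin1") as f:
    f.write("1,a,b,c,d\n2,e,f,g,h\n3,r,h,,u,j\n")

bad = copy_mismatched_lines(sample_path, report_path, expected=4)
print(bad)  # [3]
```

Because the output file is opened in "w" mode, it is overwritten on every run, so Lines_FILE.txt no longer has to be deleted by hand first. If lines are still split unexpectedly, printing repr(line) for a flagged line may help reveal hidden characters in the data; whether that applies to this particular file is an assumption.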