python - Reading multiple txt files in Python
Problem description
I have 6,000 txt files to read in Python. When I try to read them, every file comes back line by line:
Subject: key dates and impact of upcoming sap implementation
over the next few weeks , project apollo and beyond will conduct its final sap
implementation ) this implementation will impact approximately 12 , 000 new
users plus all existing system users . sap brings a new dynamic to enron ,
enhancing the timely flow and sharing of specific project , human resources ,
procurement , and financial information across business units and across
continents .
this final implementation will retire multiple , disparate systems and replace
them with a common , integrated system encompassing many processes including
payroll , timekeeping ...
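So when I read the files one by one, Python splits each of them into lines (I know that sounds silly), and a single email ends up spread over several lines. I also tried read_csv on all the txt files, but Python raised the error ValueError: stat: path too long for Windows. I don't know what to do from here.
I tried this: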
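import glob
import errno

path = r'C:\Users\frknk\OneDrive\Masaüstü\enron6\emails\*.txt'
files = glob.glob(path)
for name in files:
    try:
        with open(name) as f:
            # iterating over the file object yields one line at a time
            for line in f:
                print(line.split())
    except IOError as exc:
        # ignore matches that are directories; re-raise any other I/O error
        if exc.errno != errno.EISDIR:
            raise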
['Subject:', 'key', 'dates', 'and', 'impact', 'of', 'upcoming', 'sap', 'implementation']
['over', 'the', 'next', 'few', 'weeks', ',', 'project', 'apollo', 'and', 'beyond', 'will', 'conduct', 'its', 'final', 'sap']
I need the whole email, but it comes out separated line by line. What I want is for each row to represent one email.
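Solution
You can read the whole text file into a single variable and then manipulate it as needed. Just replace for line in f with data = f.read(). Below, I read each txt file into the data variable and then split it to get the words separated by whitespace. Hope this helps.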
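for name in files:
    try:
        with open(name) as f:
            # read the whole file at once and drop the newline characters
            data = f.read().replace("\n","")
            print(data.split())
    except IOError as exc:
        # ignore matches that are directories; re-raise any other I/O error
        if exc.errno != errno.EISDIR:
            raise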
The output looks like this:
['Subject:', 'key', 'dates', 'and', 'impact', 'of', 'upcoming', 'sap', 'implementationover', 'the', 'next', 'few', 'weeks', ',', 'project', 'apollo', 'and', 'beyond', 'will', 'conduct', 'its', 'final', 'sapimplementation', ')', 'this', 'implementation', 'will', 'impact', 'approximately', '12', ',', '000', 'newusers', 'plus', 'all', 'existing', 'system', 'users', '.', 'sap', 'brings', 'a', 'new', 'dynamic', 'to', 'enron', ',enhancing', 'the', 'timely', 'flow', 'and', 'sharing', 'of', 'specific', 'project', ',', 'human', 'resources', ',procurement', ',', 'and', 'financial', 'information', 'across', 'business', 'units', 'and', 'acrosscontinents', '.this', 'final', 'implementation', 'will', 'retire', 'multiple', ',', 'disparate', 'systems', 'and', 'replacethem', 'with', 'a', 'common', ',', 'integrated', 'system', 'encompassing', 'many', 'processes', 'includingpayroll', ',', 'timekeeping', '...']
Note that because the newlines are removed rather than replaced with a space, the last word of each line is fused to the first word of the next (for example 'implementationover'). Using replace("\n", " ") instead keeps the words apart.
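If the goal is one string per email, as the question asks, a minimal sketch might look like the following. The function name read_emails, the utf-8 default, and the errors="replace" choice are assumptions made for illustration, not part of the original answer. Splitting on any whitespace and joining with a single space collapses each file onto one line without fusing words at line breaks.

```python
from pathlib import Path


def read_emails(folder, pattern="*.txt", encoding="utf-8"):
    """Return one whitespace-normalized string per matching file (hypothetical helper)."""
    emails = []
    # sorted() makes the order of the result deterministic
    for path in sorted(Path(folder).glob(pattern)):
        # Assumption: undecodable bytes are replaced instead of raising an error
        text = path.read_text(encoding=encoding, errors="replace")
        # split() with no arguments splits on any run of whitespace, including
        # newlines, so joining with one space puts the whole email on one line
        emails.append(" ".join(text.split()))
    return emails
```

Calling read_emails(r'C:\Users\frknk\OneDrive\Masaüstü\enron6\emails') would then give a list with one entry per file. If a table is needed, that list could be passed to pandas as pandas.DataFrame({"text": emails}) so that each row holds one email.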