python - Read file of any extension using python
Problem description
I am trying to read the contents of various files. Some of them have a .docx, .pdf, or .xlsx extension.
I tried this code:
for path in paths:
    print(open(path, "r", encoding="utf8").read())
but it gave me the following error:
---------------------------------------------------------------------------
UnicodeDecodeError Traceback (most recent call last)
<ipython-input-22-db6ea654fe14> in <module>
1 for path in paths:
----> 2 print(open(path, "r", encoding="utf8").read())
~\AppData\Local\Programs\Python\Python38\lib\codecs.py in decode(self, input, final)
320 # decode input (taking the buffer into account)
321 data = self.buffer + input
--> 322 (result, consumed) = self._buffer_decode(data, self.errors, final)
323 # keep undecoded input until the next call
324 self.buffer = data[consumed:]
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd2 in position 16: invalid continuation byte
Solution
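There is no single function that can read and expose the contents of every file type. Formats such as .pdf, .docx, and .xlsx are binary, not UTF-8 text, which is why decoding them raises UnicodeDecodeError. You will need to handle each extension accordingly.
There are libraries that can help you read specific file formats, so I suggest using them.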
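import PyPDF2  # third-party; PdfReader assumes PyPDF2 2.0 or later

for path in paths:
    if path.endswith(".pdf"):
        with open(path, "rb") as pdf_file:
            pdf_reader = PyPDF2.PdfReader(pdf_file)
            # Extract text page by page; scanned pages may yield no text
            for page in pdf_reader.pages:
                print(page.extract_text())
    elif path.endswith(".docx"):
        pass  # handle the Word case, e.g. with python-docx
    elif path.endswith(".xlsx"):
        pass  # handle the Excel case, e.g. with openpyxl
    else:  # default: treat the file as UTF-8 text
        try:
            with open(path, "r", encoding="utf8") as text_file:
                print(text_file.read())
        except (UnicodeDecodeError, OSError):
            print(f"Could not read file {path}")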
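To illustrate what "handling an extension" can look like, here is a minimal sketch for the .docx branch that uses only the standard library. It relies on the fact that a .docx file is a ZIP archive whose main body is stored in word/document.xml. The helper name read_docx_text is hypothetical and not part of the original answer, and the sketch ignores headers, footers, footnotes, and any special handling for text boxes. For real documents a dedicated library such as python-docx is likely the more robust choice.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the elements in word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def read_docx_text(path):
    """Return the body text of a .docx file, one paragraph per line.

    Hypothetical helper for illustration only.
    """
    # A .docx file is a ZIP archive; the main body lives in word/document.xml
    with zipfile.ZipFile(path) as archive:
        xml_bytes = archive.read("word/document.xml")

    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for paragraph in root.iter(W_NS + "p"):
        # The text of one paragraph is split across one or more <w:t> elements
        pieces = [node.text or "" for node in paragraph.iter(W_NS + "t")]
        paragraphs.append("".join(pieces))
    return "\n".join(paragraphs)
```

In the loop above, the .docx branch could then call print(read_docx_text(path)). An .xlsx file is also a ZIP archive, but its cell values are spread over several XML parts, so a library such as openpyxl is usually the more practical option there.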