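python - Pandas read_excel() fails to read all rows of a downloaded xlsx file
Problem description
The file should contain several thousand rows, but reading it as shown below returns only the first few rows in the DataFrame.
File: https://www.hkex.com.hk/eng/services/trading/securities/securitieslists/ListOfSecurities.xlsx
Failing example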
import pandas as pd
url = 'https://www.hkex.com.hk/eng/services/trading/securities/securitieslists/ListOfSecurities.xlsx'
df = pd.read_excel(url, engine='openpyxl', header=2, usecols='A:D', verbose=True)
print(df.shape)
# output - only 5 rows
Reading sheet 0
(5, 4)
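Working example
Same file. First download it, open it in Excel, modify some text and save it (without changing the formatting, and keeping it as xlsx), then open the saved file with read_excel().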
import os
import pandas as pd
import wget
url = 'https://www.hkex.com.hk/eng/services/trading/securities/securitieslists/ListOfSecurities.xlsx'
path = os.path.join(os.path.dirname(__file__), 'download')
wget.download(url, out=path)
file = os.path.join(path, 'ListOfSecurities.xlsx')
# at this point, open the file in Excel, edit it, and save it
df = pd.read_excel(file, engine='openpyxl', header=2, usecols='A:D', verbose=True)
print(df.shape)
# output
Reading sheet 0
(17490, 4)
Solution
Update: the code has been changed, given that xlrd is not a viable option here.
import os
import pandas as pd
import wget
from openpyxl import load_workbook
url = 'https://www.hkex.com.hk/eng/services/trading/securities/securitieslists/ListOfSecurities.xlsx'
path = os.path.join(os.path.dirname(__file__), 'download')
wget.download(url, out=path)
filename = os.path.join(path, 'ListOfSecurities.xlsx')
# Load the workbook with openpyxl directly instead of going through read_excel()
excel_file = load_workbook(filename)
sheet = excel_file["ListOfSecurities"]
sheet.delete_cols(5, 21)  # Drop 21 columns starting at column E, keeping only A:D
data = sheet.values
next(data)  # Skip row 0
next(data)  # Skip row 1
cols = next(data)[0:4]  # Row 2 holds the headers for columns A:D
df = pd.DataFrame(data, columns=cols)
print(df.shape)
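The same openpyxl route can be tried on a small in-memory workbook before pointing it at the real download. The sketch below is only an illustration: the sheet contents and column names are made up (they are not taken from the HKEX file), and it limits the read to columns A:D with `iter_rows(min_row=3, max_col=4, values_only=True)` rather than `delete_cols`, which is a substitution on my part and not what the code above does.

```python
from io import BytesIO

import pandas as pd
from openpyxl import Workbook, load_workbook

# Build a tiny stand-in workbook in memory (hypothetical data, not the HKEX file).
wb = Workbook()
ws = wb.active
ws.title = "ListOfSecurities"
ws.append(["Title row"])                                          # row 1: skipped
ws.append(["Some note"])                                          # row 2: skipped
ws.append(["Code", "Name", "Category", "Sub-Category", "Extra"])  # row 3: headers
ws.append(["00001", "AAA", "Equity", "Main Board", "x"])
ws.append(["00002", "BBB", "Equity", "Main Board", "y"])
buffer = BytesIO()
wb.save(buffer)
buffer.seek(0)

# Load the workbook normally (not in read-only mode) and read columns A:D only,
# starting at the header row.
sheet = load_workbook(buffer)["ListOfSecurities"]
rows = sheet.iter_rows(min_row=3, max_col=4, values_only=True)
cols = list(next(rows))  # first yielded row is the header row
demo_df = pd.DataFrame(list(rows), columns=cols)
print(demo_df.shape)
```

Reading a bounded range leaves the worksheet object unmodified, which may be preferable if the workbook is used again later in the same script.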
I changed the Excel engine to xlrd (the default engine in pandas at the time) and the following code worked.
import pandas as pd
import os
import wget
url = 'https://www.hkex.com.hk/eng/services/trading/securities/securitieslists/ListOfSecurities.xlsx'
path = os.path.join(os.path.dirname(__file__), 'download')
wget.download(url, out=path)
filename = os.path.join(path, 'ListOfSecurities.xlsx')
df = pd.read_excel(filename, header=2, usecols='A:D', verbose=True)
print(df.shape)
One inconsistency in the output is that it reports 4 fewer rows:
Reading sheet 0
(17486, 4)
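I have not tracked down where the 4 missing rows come from. One way to investigate, sketched here on made-up data, is to outer-merge the two results with `indicator=True` and look at the rows that appear on one side only. The frames `df_full` and `df_short` and their column names are hypothetical stand-ins for the two DataFrames produced by the two approaches.

```python
import pandas as pd

# Hypothetical stand-ins for the two results being compared.
df_full = pd.DataFrame({
    "Stock Code": ["00001", "00002", "00003", "00004"],
    "Name": ["AAA", "BBB", "CCC", "DDD"],
})
df_short = pd.DataFrame({
    "Stock Code": ["00001", "00002"],
    "Name": ["AAA", "BBB"],
})

# Outer merge on all shared columns; the _merge column records which side each row came from.
merged = df_full.merge(df_short, how="outer", indicator=True)
only_in_full = merged[merged["_merge"] == "left_only"].drop(columns="_merge")
print(only_in_full)
```

On the real data, the rows listed in `only_in_full` would be the ones that one reader returned and the other did not.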