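python - TypeError: '>' not supported between instances of 'str' and 'int'. Could be float or int
Problem description
I put together the following code to parse a PDF file. When I try to get the length of a column (the length of each cell, stored in a new column next to it), it raises an error that I can't get rid of. How can I fix this? Please advise.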
from PyPDF2 import PdfFileReader
import warnings
warnings.filterwarnings("ignore")
import numpy as np
import pandas as pd
opn_pdf1= open("DEC7366.pdf","rb")
read_pdf= PdfFileReader(opn_pdf1)
book1 = pd.read_excel("Book1.xlsx")
for i in range(0,read_pdf.getNumPages()):
    page= read_pdf.getPage(i)
    data= page.extractText()
    nn = data.split("\n")
    df = pd.DataFrame(nn,columns=['Extract'])
    frames = (book1,df)
    book1 = pd.concat(frames)
book1['Extract'] = book1['Extract'].replace(r'^\s+$', np.nan, regex=True)
#print(book1)
book2 = book1.dropna(how="all")
book2['Duplicate'] = book2['Extract'].duplicated( keep="first")
filter = book2['Duplicate']== True
book2['Duplicate'].where(filter, inplace = True)
book2 = book2.dropna(how="any")
del book2['Duplicate']
book2['Duplicate']="Yes"
book2['Extract']=book2['Extract'].str.replace(r'^\s+$',"")
book2 = book2.dropna(how='any',subset=["Extract"])
#book2['Length'] = book2.select_dtypes(object).sum(axis=1).str.len()
book2['Length']= book2['Extract'].apply(str).str.replace(',', '/')
book2 = book2.where(book2['Length']>15)
book2['Duplicates'] = book2['Extract'].duplicated(keep="first")
filter = book2['Duplicates']== False
book2['Duplicates'].where(filter, inplace = True)
#del book2['Duplicates']
#book2['Duplicate']="Yes"
#del book2['Duplicates']
#book2 = book2.drop(columns=['Duplicates', 'Length',"Duplicate"])
book2 = book2.dropna(how="any")
book2.to_excel("AB1.xlsx",index=False)
TypeError: '>' not supported between instances of 'str' and 'int'
The problem is in the following lines:
book2['Length']= book2['Extract'].apply(str).str.replace(',', '/')
book2 = book2.where(book2['Length']>15)
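A minimal reproduction may help show why the comparison fails. It uses a made-up three-row frame in place of the PDF extract (the sample strings and the name `sample` are hypothetical). Since `.str.replace` returns strings, `Length` ends up holding text rather than numbers:

```python
import pandas as pd

# Hypothetical sample data standing in for the text extracted from the PDF.
sample = pd.DataFrame(
    {"Extract": ["short", "a line, longer than fifteen characters", "x,y"]}
)

# Same expression as in the question: the result is still a column of strings.
sample["Length"] = sample["Extract"].apply(str).str.replace(",", "/")

# Collect the Python types stored in the column; only str is present.
value_types = set(sample["Length"].map(type))

# Comparing those strings with an integer is what raises the TypeError.
try:
    sample["Length"] > 15
    comparison_failed = False
except TypeError:
    comparison_failed = True
```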
Solution
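One possible fix, sketched under the assumption that `Length` is meant to hold the character count of each `Extract` cell (as the commented-out `.str.len()` line in the question suggests): compute the length with `.str.len()`, which returns integers, and then filter with a boolean mask. The sample data below is hypothetical and only stands in for the real `book2`.

```python
import pandas as pd

# Hypothetical sample data standing in for book2['Extract'].
book2 = pd.DataFrame(
    {"Extract": ["short", "a line, longer than fifteen characters", "x,y"]}
)

# .str.len() returns integer character counts instead of strings.
book2["Length"] = book2["Extract"].astype(str).str.len()

# Boolean indexing keeps only the rows longer than 15 characters.
# (DataFrame.where would keep every row and fill the non-matching ones with NaN.)
book2 = book2[book2["Length"] > 15]
```

Replacing `,` with `/` does not change the length of a string, so that step can likely be dropped if only the length is needed. If the replaced text is needed too, it is probably clearer to store it in a separate column.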