TypeError: '>' not supported between instances of 'str' and 'int'. The values may be float or int

Problem description

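I have put together the following code to parse a PDF file. When I try to get the length of each cell in a column (stored in a new column next to that cell), it raises an error. I can't get rid of it. How should I do this? Please advise.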

from PyPDF2 import PdfFileReader
import warnings
warnings.filterwarnings("ignore")
import numpy as np
import pandas as pd
opn_pdf1= open("DEC7366.pdf","rb")
read_pdf= PdfFileReader(opn_pdf1)
book1 = pd.read_excel("Book1.xlsx")
for i in range(0,read_pdf.getNumPages()):
    page= read_pdf.getPage(i)
    data= page.extractText()
    nn = data.split("\n")
    df = pd.DataFrame(nn,columns=['Extract'])
    frames = (book1,df)
    book1 = pd.concat(frames)
    book1['Extract'] = book1['Extract'].replace(r'^\s+$', np.nan, regex=True)
    #print(book1)
book2 = book1.dropna(how="all")
book2['Duplicate'] = book2['Extract'].duplicated( keep="first")
filter = book2['Duplicate']== True
book2['Duplicate'].where(filter, inplace = True)
book2 = book2.dropna(how="any")
del book2['Duplicate']
book2['Duplicate']="Yes"
book2['Extract']=book2['Extract'].str.replace(r'^\s+$',"")
book2 = book2.dropna(how='any',subset=["Extract"])


#book2['Length'] = book2.select_dtypes(object).sum(axis=1).str.len()
book2['Length']= book2['Extract'].apply(str).str.replace(',', '/')
book2 = book2.where(book2['Length']>15)

book2['Duplicates'] = book2['Extract'].duplicated(keep="first")
filter = book2['Duplicates']== False
book2['Duplicates'].where(filter, inplace = True)
#del book2['Duplicates']
#book2['Duplicate']="Yes"
#del book2['Duplicates']
#book2 = book2.drop(columns=['Duplicates', 'Length',"Duplicate"])
book2 = book2.dropna(how="any")
book2.to_excel("AB1.xlsx",index=False)

TypeError: '>' not supported between instances of 'str' and 'int'

The problem is in the following lines:

book2['Length']= book2['Extract'].apply(str).str.replace(',', '/')
book2 = book2.where(book2['Length']>15)

Tags: python, pandas, pypdf

Solution
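
The comparison fails because `book2['Length']` never holds a length: `.str.replace(',', '/')` returns strings, so `book2['Length'] > 15` compares `str` with `int`. Below is a minimal sketch of one possible fix, assuming the goal is to keep only the rows whose `Extract` text is longer than 15 characters. The sample rows are hypothetical stand-ins for the text extracted from the PDF.

```python
import pandas as pd

# Hypothetical stand-in for the lines extracted from the PDF.
book2 = pd.DataFrame({
    "Extract": [
        "short line",
        "a considerably longer extracted line",
        "1,234,567 units shipped in December",
        None,
    ]
})

# Drop missing rows first so None/NaN is not converted to the string "nan".
book2 = book2.dropna(subset=["Extract"])

# .str.len() yields one integer per row; .str.replace() would yield strings.
book2["Length"] = book2["Extract"].astype(str).str.len()

# Boolean indexing keeps only the rows whose length exceeds 15.
book2 = book2[book2["Length"] > 15].reset_index(drop=True)

print(book2)
```

Boolean indexing is used here instead of `DataFrame.where` because `where` keeps the original shape and fills the non-matching rows with NaN, which then needs a separate `dropna` call. If the comma-to-slash replacement is still wanted, it can be applied to `Extract` (or to a separate column) independently of the length calculation.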

