python - How to find the font size of every paragraph of a PDF file using Python?
Problem description
Right now I am working on a project in which I have to find the font size of every paragraph in a PDF file. I have tried various Python libraries: fitz, PyPDF2, pdfrw, pdfminer and pdfreader. All of them extract the text, but I don't know how to get the font size of the paragraphs. Thanks in advance, your help is appreciated.
I have tried the following, but it only gives me the text, not the font size:
import fitz  # PyMuPDF

filepath = '/home/user/Downloads/abc.pdf'
text = ''
with fitz.open(filepath) as doc:
    for page in doc:
        # get_text() (formerly getText()) returns plain text only; font data is discarded
        text += page.get_text()
print(text)
Solution
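I found a solution using pdfminer (the `pdfminer.six` package, which provides `pdfminer.high_level`). Every `LTChar` in a page layout has a `size` attribute, so the font size of a text box can be read from its characters. The Python code is given below: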
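from collections import Counter

from pdfminer.high_level import extract_pages
from pdfminer.layout import LTTextContainer, LTChar

path = r'/path/to/pdf'
extract_data = []
for page_layout in extract_pages(path):
    for element in page_layout:
        # Text boxes (pdfminer's closest notion of a paragraph) are LTTextContainer objects
        if isinstance(element, LTTextContainer):
            sizes = []
            for text_line in element:
                for character in text_line:
                    # Only LTChar carries font data; LTAnno (inserted spaces/newlines) does not
                    if isinstance(character, LTChar):
                        sizes.append(character.size)
            if sizes:
                # Store one entry per box (its most common character size) instead of
                # appending the whole box text once for every character
                font_size = Counter(sizes).most_common(1)[0][0]
                extract_data.append([font_size, element.get_text()])
print(extract_data)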