Finding the line number of a string in MS Word using Python

Problem description

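I am trying to find the line number on which a given string appears in an MS Word file. I have made some progress and can find the line number of a word, but that number does not match the actual line number in the MS Word file. The reason is that my Python code starts a new line only when it encounters "\n" in the text, so text that keeps going without a "\n" is treated as a single line. In addition, increasing or decreasing the font size changes the line numbers in the actual MS Word file, but not the ones Python reports. Here is what I have tried.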

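import docx2txt

# Extract the plain text; layout details such as line wrapping and font size are lost.
path = "C:/Users/vivek/OneDrive/Documents/Test/TestDoc.docx"
my_text = docx2txt.process(path)
# Each "\n"-separated chunk is a paragraph-level unit, not a rendered line in Word.
for num, string in enumerate(my_text.split("\n")):
    print(num, repr(string))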

Here is my output:

C:\Users\vivek\AppData\Local\Programs\Python\Python38-32\python.exe C:/Users/vivek/PycharmProjects/DocumentSearch/word.py
0 'FRESNO\xa0—\xa0'
1 ''
2 'In Oklahoma, Sequoyah Quinton, a storm chaser and member of the Cherokee Nation, went outside, dropped to his knees and prayed for something to stop the destruction of the sequoia trees.'
3 ''
4 'In New York, Gabrielle Foreman, a professor, called her mother in Chicago. They spoke of a man being evicted who wailed in grief as he gave up his dog and about a young Black woman shot by police, and then discussed fire threatening the Giant Forest in Sequoia National Park.'
5 ''
6 'Foreman told her mother, “I have to get off the phone. I’m registering this in my body.” Then she prayed: “Send energy to the trees. They’re the witnesses to everything and they literally allow us to breathe.”'
7 ''
8 'In San Diego, Katie Ohlin, an early witness to spiraling sequoia mortality, watched this week as people worldwide poured out their alarm and their connection to the trees after seeing photographs of fire workers wrapping the famed Giant Forest, including the largest living tree on Earth, in silver retardant foil.'
9 ''

However, this is how it actually looks in the document: https://i.stack.imgur.com/d36R2.png

Tags: python, file, ms-word, word

Solution
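
The mismatch described in the question comes from the fact that a .docx file stores paragraphs, not rendered lines. Word decides where each line breaks at display time, based on page width, margins, and font, so exact line numbers can only come from Word's own layout engine. The snippet below is only a rough sketch of the idea, not a way to reproduce Word's layout: it wraps each paragraph at an assumed fixed character width using the standard-library textwrap module and reports an approximate line number for each match. The function name approximate_line_numbers, the width parameter, and the sample text are all hypothetical, and a match that straddles a wrap point is not detected.

```python
import textwrap


def approximate_line_numbers(text, needle, width=80):
    """Return (paragraph_index, approx_line_number) pairs for needle.

    ASSUMPTION: every rendered line holds at most `width` characters.
    Word's real line breaks depend on font, margins, and page size,
    so these numbers are only an approximation.
    """
    hits = []
    line_no = 0  # running count of wrapped lines (1-based after increment)
    for para_index, paragraph in enumerate(text.split("\n")):
        # An empty paragraph still occupies one line on the page.
        wrapped = textwrap.wrap(paragraph, width=width) or [""]
        for line in wrapped:
            line_no += 1
            if needle in line:
                hits.append((para_index, line_no))
    return hits


# Hypothetical sample text standing in for the docx2txt output.
sample = (
    "FRESNO\n"
    "\n"
    "In Oklahoma, Sequoyah Quinton, a storm chaser and member of the "
    "Cherokee Nation, went outside, dropped to his knees and prayed."
)

# With a very wide page the paragraph stays on one line;
# with a narrow page the same word lands several lines further down.
print(approximate_line_numbers(sample, "prayed", width=1000))
print(approximate_line_numbers(sample, "prayed", width=40))
```

The paragraph index in each pair corresponds to the number printed by the loop in the question, while the second value changes with the assumed width. This mirrors how changing the font size shifts line numbers in Word without changing the paragraph structure.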

