Extracting PowerPoint text properties with Python

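Problem description

I am trying to extract the properties associated with my text in PowerPoint, and I am getting strange output: printing shape.fill does not show what I expected. I would also like to read other properties, such as shape.font and the position of the shape. Is this possible?

Problem: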

f = shape.fill
Output: <pptx.dml.fill.FillFormat object at 0x00000215C4D6DD90>

Code:

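import glob
import os

import nltk
import pandas as pd
from pptx import Presentation

# direct (a glob pattern), terms and phrases (search lists) are defined elsewhere.
mylist = []
mylist2 = []
mylist3 = []
mylist4 = []
mylist5 = []
mylist6 = []
mylist7 = []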

for eachfile in glob.glob(direct):
    s = 1
    file = os.path.basename(eachfile)
    try:
        prs = Presentation(eachfile)
        for slide in prs.slides:
            for shape in slide.shapes:
                if hasattr(shape, "text"):            
                    x = nltk.word_tokenize(shape.text)
                    t = shape.text
                    f = shape.fill
                    print(f)
                    mylist4.append(file)
                    mylist5.append(t)
                    mylist7.append(f)
                    mylist6.append('Slide: ' + str(s))
                    # x = shape.text.split()  # looks for words with punctuation included
                    for word in x:
                        word = word.lower()
                        if word in terms:
                            mylist.append("Slide " + str(s))
                            mylist2.append(file)
                            mylist3.append(word)

            s = s + 1
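    except Exception as exc:
        # Skip files that cannot be read, but report them instead of failing silently.
        print(f"Skipping {file}: {exc}")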

#mylist = list(dict.fromkeys(mylist))
d = {'FileName':mylist2,'Slide':mylist, 'Match':mylist3}
d2 = {'FileName':mylist4, 'Slide':mylist6, 'Text':mylist5, 'Color':mylist7}
search = phrases + terms
d3 = {'Text':search}
df = pd.DataFrame(d)
df = df.drop_duplicates()

Tags: python, powerpoint

Solution


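<pptx.dml.fill.FillFormat object at 0x00000215C4D6DD90> is a Python object, so printing it only shows its type and memory address. To get information out of it, look up the documentation for that type of object and read its attributes or call its methods.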

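The only documentation I could find for this type of object is the source code itself rather than regular documentation. The members you can use are defined inside the FillFormat class, starting with back_color(self, ...).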

