How to put the text of every docx file into a separate DataFrame column in Python

Question

I could not find anything about this problem on Stack Overflow and I don't know how to solve it, so please bear with me.

Here is my code:

import os
from docx import Document

v_doc = []

for root, dirs, files in os.walk(paths):
    for t in files:
        if t.endswith('.docx'):
            v_doc.append(Document(os.path.join(root, t)))

            # Say there are 3 docx files, each containing simple sentences. How do I put
            # each file's sentences into a separate dataframe column? I have many docx files (n of them).

Sample documents:

docx1 contains:

Hello guys how are you all, hope you guys doing good.

docx2 contains:

I dont know what to write here

docx3 contains:

We are strong together ! do we ?

Expected output:

dataframe:
column1                                                 column2
#Hello guys how are you all, hope you guys doing good.  #I don't know what to write here
column3
#We are strong together ! do we ?

I hope to get some responses. Thanks in advance.

Tags: python, arrays, regex, dataframe, numpy

Answer


戈奇亚:

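import os
import docx

dataframe = {}

def get_files(extension, location):
    v_doc = []

    for root, dirs, files in os.walk(location):
        for t in files:
            if t.endswith(extension):
                # keep the directory so files in subfolders can be opened
                v_doc.append(os.path.join(root, t))
    return v_doc

file_list = get_files('.docx', '.')
index = 0
for file in file_list:
    index += 1
    doc = docx.Document(file)
    column_label = f'column{index}'
    # join all paragraphs so multi-paragraph documents are not truncated
    data_content = '\n'.join(p.text for p in doc.paragraphs)
    # add one key per document instead of overwriting the whole dict
    dataframe[column_label] = data_content

print(dataframe)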
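
The code above collects the text in a plain dict, while the question asks for a dataframe. A minimal sketch of that last step, assuming pandas and python-docx are installed, might look like the following. The helper names find_docx_files, read_docx_text and texts_to_frame are made up for illustration and are not part of either library.

```python
import os

import pandas as pd


def find_docx_files(location):
    """Return the full paths of all .docx files under `location`, sorted."""
    paths = []
    for root, _dirs, files in os.walk(location):
        for name in files:
            if name.endswith(".docx"):
                paths.append(os.path.join(root, name))
    return sorted(paths)


def read_docx_text(path):
    """Join the text of every paragraph in one document."""
    # Imported here so the other helpers work without python-docx (assumption:
    # the package is installed wherever this function is actually called).
    import docx

    return "\n".join(p.text for p in docx.Document(path).paragraphs)


def texts_to_frame(texts):
    """Build a one-row DataFrame with one column per document text."""
    # Each value is a one-element list, so pandas creates exactly one row.
    columns = {f"column{i}": [text] for i, text in enumerate(texts, start=1)}
    return pd.DataFrame(columns)
```

Calling texts_to_frame([read_docx_text(p) for p in find_docx_files('.')]) would then give one column per document, in sorted path order. If the documents should instead be rows, passing the texts as a single column is probably the simpler layout.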
