How to create a DataFrame from a list, with each column determined by a regular expression

Problem description

I have a list like this:

lst = ['2021_01_21__11_10_54_1__13928_snapshot.jpg',
       '2021_01_21__12_27_44_1__13934_snapshot.jpg',
       '2021_01_21__11_11_08_2__13928_snapshot.jpg',
       '2021_01_21__12_27_56_2__13934_snapshot.jpg',
       '2021_01_21__11_11_19_3__13928_snapshot.jpg',
       '2021_01_21__12_28_08_3__13934_snapshot.jpg']

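I want to create a DataFrame in which each column is identified by:

import os
import re

def by_number(path):
    base_name = os.path.basename(path)
    # Five digits that follow a double underscore, e.g. "__13928"
    return re.findall(r'_{2}(\d{5})', base_name)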

and each row is identified by:

def by_index(path):
    base_name = os.path.basename(path)
    # A single digit between "_" and "__", e.g. "_1__"
    return re.findall(r'_(\d)_{2}', base_name)
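As a quick sanity check, the sketch below applies both patterns to the base name of a single entry. It uses `re.search` rather than `re.findall` so that each helper returns one string instead of a list. The names `extract_number` and `extract_index` are illustrative and do not appear in the original post.

```python
import os
import re

def extract_number(path):
    # Five digits that follow a double underscore, e.g. "__13928".
    base_name = os.path.basename(path)
    match = re.search(r'__(\d{5})', base_name)
    return match.group(1) if match else None

def extract_index(path):
    # A single digit sitting between "_" and "__", e.g. "_1__".
    base_name = os.path.basename(path)
    match = re.search(r'_(\d)__', base_name)
    return match.group(1) if match else None

sample = '2021_01_21__11_10_54_1__13928_snapshot.jpg'
print(extract_number(sample), extract_index(sample))
```

Both helpers return `None` when a file name does not follow the expected layout, which makes malformed entries easy to filter out before building the table.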

So in the end I would get a DataFrame that looks like this:

[Image of the expected DataFrame: not available]

Tags: python-3.x, regex, dataframe

Solution


import re
import pandas as pd

name_list = ['2021_01_21__11_10_54_1__13928_snapshot.jpg',
             '2021_01_21__12_27_44_1__13934_snapshot.jpg',
             '2021_01_21__11_11_08_2__13928_snapshot.jpg',
             '2021_01_21__12_27_56_2__13934_snapshot.jpg',
             '2021_01_21__11_11_19_3__13928_snapshot.jpg',
             '2021_01_21__12_28_08_3__13934_snapshot.jpg']

df = pd.DataFrame(columns=['count'])  # start with an empty frame

for name in name_list:
    count = re.search(r'_(\d)_{2}', name).group(1)  # row key: single digit before "__"
    col = re.search(r'_{2}(\d{5})', name).group(1)  # column key: five digits after "__"
    if (df['count'] == count).any():
        # A row for this count already exists: fill in its cell for this column.
        df.loc[df['count'] == count, col] = name
    else:
        # First file with this count: add a new row.
        # DataFrame.append was removed in pandas 2.0, so pd.concat is used instead.
        new_row = pd.DataFrame([[count, name]], columns=['count', col])
        df = pd.concat([df, new_row], ignore_index=True)
df.set_index('count', inplace=True)
print(df)
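The same table can probably be built without an explicit loop. The sketch below is an alternative, not the answer's original method: it assumes every file name contains the `_<digit>__<five digits>` sequence and that each (count, number) pair occurs only once, since `pivot` raises an error on duplicates. The group names `count` and `number` and the variable names are chosen here for illustration.

```python
import pandas as pd

name_list = ['2021_01_21__11_10_54_1__13928_snapshot.jpg',
             '2021_01_21__12_27_44_1__13934_snapshot.jpg',
             '2021_01_21__11_11_08_2__13928_snapshot.jpg',
             '2021_01_21__12_27_56_2__13934_snapshot.jpg',
             '2021_01_21__11_11_19_3__13928_snapshot.jpg',
             '2021_01_21__12_28_08_3__13934_snapshot.jpg']

names = pd.Series(name_list, name='file')

# Named groups become the column names of the extracted frame:
# "count" is the single digit before "__", "number" the five digits after it.
parts = names.str.extract(r'_(?P<count>\d)__(?P<number>\d{5})')

# Put the file name next to its two keys, then reshape:
# one row per count, one column per number.
table = (
    pd.concat([parts, names], axis=1)
      .pivot(index='count', columns='number', values='file')
)
print(table)
```

Compared with the loop, this performs the extraction once over the whole list and leaves the reshaping to pandas. If duplicate key pairs are possible, `pivot_table` with an explicit aggregation function would be needed instead.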

Result

