PANDAS: Remove index values and split the values in one cell into separate rows in Excel

Problem description

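I wrote this code to extract data from one Excel file and build another Excel file containing the extracted ID, Version and Phase. It does extract the data, but places it incorrectly in Excel: the values are written together with their index labels, and all the IDs end up in a single cell (likewise all the versions). I want each ID in its own row under the ID column.
Here is my code: [Input file](https://drive.google.com/file/d/1ZrUbftMppFf8L3jgWl2i8bsgUavOHnkZ/view?usp=sharing) [Output file](https://drive.google.com/file/d/1BoUiICzRgkX3AN9OcRIEDnh9c3qKNL3l/view?usp=sharing)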

import pandas as pd
cols = ['ID']
vals = ['Version']
phas = ['Phase']
id_index_list = []
id_list = []
verindex_list = []
version_list = []
phaseindex_list = []
phases_list = []
tolist = []
df_pver = pd.read_excel('pver.xlsm', 'PVP', header = None)

#Name of Project
dz= df_pver.iloc[[0],[0]]

#Finding ID
dy= df_pver.xs(0)

for id in dy:
    if 'ID' in str(id):
        # ID list: find the location of the column where ID exists
        zr= dy[dy==id].index.values
        for item in zr:
            if not item in id_index_list:
                id_index_list.append(item)
        mylist = [df_pver.xs(0)[id_index_list]]
        # Find the location of ID and move 3 columns to the right to find the Version
        ze= dy[dy==id].index.values + 3
        for item in ze:
            if not item in verindex_list:
                verindex_list.append(item)
        mylist2 = [df_pver.xs(0)[verindex_list]]
        # Find the phase the project is in
        zp= dy[dy==id].index.values + 1
        for item in zp:
            if not item in phaseindex_list:
                phaseindex_list.append(item)
        mylist3 = [df_pver.xs(1)[phaseindex_list]]


id_list.append(mylist)
version_list.append(mylist2)
phases_list.append(mylist3)

a = pd.DataFrame(id_list, columns = cols)
b = pd.DataFrame(version_list, columns = vals)
c = pd.DataFrame(phases_list, columns = phas)



x = pd.concat([a,b,c], axis=1, sort=False)
x.stack().str.split(' ', expand=True).reset_index(level=0, drop=True).T
x.to_excel("df2.xlsx", index=False)
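Note that the `x.stack().str.split(' ', expand=True).reset_index(level=0, drop=True).T` line above builds a new object that is never assigned, so it has no effect on what `to_excel` writes. If each cell really held only space-separated values (no index labels mixed in), turning them into rows is a job for `str.split` plus `DataFrame.explode`. A minimal sketch on a hypothetical one-row frame `packed`; exploding several columns at once assumes pandas 1.3 or later:

```python
import pandas as pd

# Hypothetical one-row frame: each cell holds space-separated values
# (index labels already removed), similar to the flattened output in the question.
packed = pd.DataFrame({
    "ID": ["X Y Z"],
    "Version": ["1 2 3"],
    "Phase": ["gantt Pie ipex"],
})

# Split every cell into a list of tokens, then explode all three columns together
# so the n-th token of each column lands in the same row (needs pandas >= 1.3).
rows = packed.apply(lambda col: col.str.split()).explode(
    ["ID", "Version", "Phase"], ignore_index=True
)
print(rows)
# rows.to_excel("df2.xlsx", index=False)  # needs an Excel writer such as openpyxl
```

The lists in one row must have the same length in every exploded column, otherwise pandas raises an error instead of guessing how to pair them.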

This code finds the data I need, but writes it out in the wrong layout.

The output in Excel is:

   id                  version                     required  
0  17 X 18 Y 22 Z     20 1  21 2 24 3            18 gantt 19 Pie 23 ipex

But I want it like this: without the index, and split into separate rows instead of everything in one cell.

  id      version     required
0 X        1           gantt
1 Y        2           Pie
2 Z        3           ipex
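The index labels appear to come from storing a whole Series in one cell: `mylist = [df_pver.xs(0)[id_index_list]]` wraps the selected Series in a list, so the resulting DataFrame seems to hold the Series itself in a single cell, and `index=False` in `to_excel` only drops the DataFrame's own row index, not the Series labels. One way around this is to collect one scalar per ID and build one row per ID. A sketch, assuming a hypothetical `df_pver` with the layout implied by the question's offsets (version 3 columns to the right of each ID in row 0, phase 1 column to the right in row 1); `records` and `out` are illustrative names:

```python
import pandas as pd

# Hypothetical stand-in for pd.read_excel('pver.xlsm', 'PVP', header=None):
# row 0 holds the "ID ..." cells and the version numbers,
# row 1 holds the phase one column to the right of each ID.
df_pver = pd.DataFrame([
    ["ID X", None, None, "1", "ID Y", None, None, "2"],
    [None, "gantt", None, None, None, "Pie", None, None],
])

header = df_pver.iloc[0]
records = []
for pos, value in enumerate(header):
    if "ID" in str(value):
        # Same offsets as the question: phase at +1 (row 1), version at +3 (row 0).
        records.append({
            "ID": value,
            "Version": header.iloc[pos + 3],
            "Phase": df_pver.iloc[1, pos + 1],
        })

# One dict per ID gives one row per ID, with scalar cells instead of whole Series.
out = pd.DataFrame(records, columns=["ID", "Version", "Phase"])
print(out)
# out.to_excel("df2.xlsx", index=False)  # needs an Excel writer such as openpyxl
```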

Can anyone help?

Tags: python pandas

Solution


I'm assuming your columns are already in the order ID, phase, version. There are also no numeric IDs after index 3:

import numpy as np
import pandas as pd

# read excel
df = pd.read_excel('pver.xlsx')
# find the columns that start with ID, transpose, reset the index and split on colon
ids = df[df.columns[df.columns.str.startswith('ID')]].T.reset_index()['index'].str.split(':.|:', expand=True)[1].to_numpy()
# find the columns that start with QA and transpose
phase = df[df.columns[df.columns.str.startswith('QA')]].T[0].to_numpy()
# find the columns that start with V or OEM, and transpose
v = df[df.columns[df.columns.str.startswith('V') | df.columns.str.startswith('OEM SW')]].T.index
# vstack and to pd.dataFrame
new_df = pd.DataFrame(np.vstack([ids,v,phase])).T
# name columns
new_df.columns = ['ID', 'Version', 'Phase']

           ID        Version               Phase
0     1907839           V100  during development
1    01907820           V110  during development
2   189634226           V120  during development
3                       V130  during development
4           1           V200       Raw Container
5           2           V220                 NaN
6           3    OEM SW name                 NaN
7           4  OEM SW name.1                 NaN
8           5  OEM SW name.2                 NaN
9           6  OEM SW name.3                 NaN
10          7  OEM SW name.4                 NaN
11          8  OEM SW name.5                 NaN
12          9  OEM SW name.6                 NaN
13         10  OEM SW name.7                 NaN
14         11  OEM SW name.8                 NaN
15         12  OEM SW name.9                 NaN
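The answer's column-selection steps can be tried on a small hypothetical frame. This sketch assumes headers named like `ID: <number>`, `QA ...` and `V...` (made up to loosely match the output above), keeps the IDs as strings so a leading zero such as the one in `01907820` survives, and builds the result from a dict of columns instead of `np.vstack`, so each column keeps its own dtype rather than going through a single NumPy array:

```python
import pandas as pd

# Hypothetical frame: headers loosely modelled on the answer's output,
# with the phase values in the first row under the "QA" columns.
df = pd.DataFrame(
    [["a", "during development", "b", "c", "Raw Container", "d"]],
    columns=["ID: 1907839", "QA 1", "V100", "ID: 01907820", "QA 2", "V110"],
)

cols = df.columns
# Text after the first colon of each "ID..." header, kept as a string.
ids = cols[cols.str.startswith("ID")].to_series().str.split(":", n=1).str[1].str.strip()
# First-row values of the "QA..." columns.
phase = df.loc[0, cols.str.startswith("QA")]
# Header names that start with "V" or "OEM SW".
versions = cols[cols.str.startswith("V") | cols.str.startswith("OEM SW")]

# A dict of equal-length lists: one row per ID, default RangeIndex.
new_df = pd.DataFrame({
    "ID": list(ids),
    "Version": list(versions),
    "Phase": list(phase),
})
print(new_df)
```

Writing it with `new_df.to_excel("df2.xlsx", index=False)` leaves the row index out of the sheet; if the three lists have different lengths, the `pd.DataFrame` call raises a `ValueError`.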
