首页 > 解决方案 > 如何“取消堆叠”数据框列中的所有列表并成对组合结果?

问题描述

我有一个包含 5 列的数据框,每列包含可变长度的列表。这是我的数据框中的一行的样子:

               A                                         B
1 [aircrafts, they, agreement, airplane]  [are, built, made, built] 

现在我想“解压”或“解压”这些列表,以便每个单元格只包含一个值(一个单词)。在拆包过程中,一列单元格中的单词应与下一列中的相应值成对组合。结果将是:

        A            B
   1  aircrafts      are 
   2  they           built
   3  agreement      made
   4  airplane       built

作为参考,我的完整数据框如下所示:

   obj                       rel1  \
0                                 [Boeing]                    [sells]   
1   [aircrafts, they, agreement, airplane]  [are, built, made, built]   
2                       [exception, these]                 [are, are]   
3                        [sales, contract]       [regulated, consist]   
4                               [contract]               [stipulates]   
5                            [acquisition]                      [has]   
6                               [contract]                  [managed]   
7                               [employee]                      [act]   
8                    [salesperson, Boeing]          [change, ensures]   
9                     [airlines, airlines]            [related, have]   
10                                [Boeing]                     [keep]   

                  dep1                    rel2  \
0          [aircrafts]                    [to]   
1                   []              [on, with]   
2                   []           [of, of, for]   
3                   []  [by, of, with, of, of]   
4           [elements]            [across, as]   
5            [details]       [of, as, of, for]   
6                   []                    [by]   
7                   []               [as, for]   
8                   []  [Given, of, over, for]   
9            [company]               [to, for]   
10  [track, aircrafts]   [of, between, to, of]   

                                                 dep2  
0                                         [companies]  
1                                  [demand, customer]  
2                            [airplanes, scope, case]  
3   [means, contracts, companies, acquisitions, ai...  
4                          [acquisitions, conditions]  
5                [airplane, model, airplane, options]  
6                                       [salesperson]  
7                            [salesperson, contracts]  
8                     [term, contracts, time, client]  
9                                    [other, example]  
10        [relationships, companies, airlines, buyer]        

如何在 python 中执行“解包”和重新排列操作?如果这些操作可以在数据帧本身上执行,那就太好了。如果这被证明是困难或不可能的,有没有办法在将列表中的数据组合成数据框之前重新排列它们?

非常感谢您的任何帮助或建议。

标签: pythonlistpandasdataframe

解决方案


我相信您希望能够将行作为句子阅读

import pandas as pd
import numpy as np

df = pd.DataFrame(dict(
    obj=[['Boeing'], ['aircrafts', 'they', 'agreement', 'airplane'], ['exception', 'these']],
    rel1=[['sells'], ['are', 'built',  'made',  'built'],['are', 'are']],
    dep1=[['aircrafts'], [], []],
    rel2=[['to'], ['on', 'with'], ['of', 'of', 'for']]
))

# Output dataframe
out = pd.DataFrame()

# Keep track of which set of rows we have already appended to the dataframe
row_counter = 0

# Loop through each row in the input dataframe
for row, value in df.iterrows():

    # Get the max len of each list in this row
    rows_needed = value.map(lambda x: len(x)).max()

    for i in range(rows_needed):

        # Set a name for this row (numeric)
        new_row = pd.Series(name=row_counter+i)

        # Loop through each header and create a single row per item in the list
        for header in value.index:

            # Find if there will be a value here or null
            this_value = np.nan
            if i < len(df.loc[row, header]):
                this_value = df.loc[row, header][i]

            # Add this single result to a series for this row
            new_row = new_row.set_value(header, this_value)

        # Add this row series to the full dataframe
        out = out.append(new_row)
    row_counter += rows_needed

out
Out[1]:
        dep1        obj   rel1  rel2
0  aircrafts     Boeing  sells    to
1        NaN  aircrafts    are    on
2        NaN       they  built  with
3        NaN  agreement   made   NaN
4        NaN   airplane  built   NaN
5        NaN  exception    are    of
6        NaN      these    are    of
7        NaN        NaN    NaN   for

order = ['obj', 'rel1', 'dep1', 'rel2']
out = out[order]
out
Out[2]:
         obj   rel1       dep1  rel2
0     Boeing  sells  aircrafts    to
1  aircrafts    are        NaN    on
2       they  built        NaN  with
3  agreement   made        NaN   NaN
4   airplane  built        NaN   NaN
5  exception    are        NaN    of
6      these    are        NaN    of
7        NaN    NaN        NaN   for

推荐阅读