Removing lists from a DataFrame while adding data

Problem description

Starting from:

import pandas as pd

lis1 = [['apples'], ['bananas', 'oranges', 'cinnamon'], ['pears', 'juice']]
lis2 = [['john'], ['stacy'], ['ron']]

# Assign the frame to df, the name the solutions below refer to
df = pd.DataFrame({'fruits': lis1, 'users': lis2})
df

                         fruits    users
0                      [apples]   [john]
1  [bananas, oranges, cinnamon]  [stacy]
2                [pears, juice]    [ron]

I want to end up with:

lis3 = ['apples', 'bananas', 'oranges', 'cinnamon', 'pears', 'juice']
lis4 = ['john', 'stacy', 'stacy', 'stacy', 'ron', 'ron']

pd.DataFrame({'fruits': lis3, 'users': lis4})

     fruits  users
0    apples   john
1   bananas  stacy
2   oranges  stacy
3  cinnamon  stacy
4     pears    ron
5     juice    ron

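First, I need to create a new DataFrame in which each item sits on its own row. Second, the name variable must be repeated according to the number of fruits. In this example, John has one fruit while Stacy has three, so under users, Stacy must be repeated three times.

Tags: python, pandas, dataframe

Solution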


itertools

from itertools import chain, product, starmap

pd.DataFrame(
    [*chain(*starmap(product, zip(df.fruits, df.users)))],
    columns=df.columns
)

     fruits  users
0    apples   john
1   bananas  stacy
2   oranges  stacy
3  cinnamon  stacy
4     pears    ron
5     juice    ron
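
To see what the one-liner does, it can help to unroll it on plain lists, without pandas. The following is only an illustrative sketch: the intermediate names pairs, per_row and rows are made up for this walk-through and do not appear in the answer.

```python
from itertools import chain, product, starmap

fruits = [['apples'], ['bananas', 'oranges', 'cinnamon'], ['pears', 'juice']]
users = [['john'], ['stacy'], ['ron']]

# Step 1: zip pairs each row's fruit list with the same row's user list
pairs = list(zip(fruits, users))

# Step 2: starmap(product, pairs) calls product(fruit_list, user_list)
# for every pair, giving all (fruit, user) combinations within one row
per_row = [list(combos) for combos in starmap(product, pairs)]

# Step 3: chain.from_iterable flattens the per-row groups into one list
rows = list(chain.from_iterable(per_row))

for row in rows:
    print(row)
```

Because each users list holds a single name, the product simply repeats that name once per fruit, which is the repetition the question asks for.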

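This variant is not limited to 2 columns; it works for any number of columns, as long as every column holds lists: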

pd.DataFrame(
    [*chain(*starmap(product, zip(*map(df.get, df))))],
    columns=df.columns
)
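
As a hedged illustration of the generalized form, here is a small three-column frame. The stores column and its values are hypothetical, invented for this example only; the frame is named df3 to keep it apart from the df in the question.

```python
import pandas as pd
from itertools import chain, product, starmap

# Hypothetical data: "stores" is not part of the original question
df3 = pd.DataFrame({
    'fruits': [['apples'], ['pears', 'juice']],
    'users': [['john'], ['ron']],
    'stores': [['north', 'south'], ['east']],
})

# map(df3.get, df3) yields one Series per column name;
# zip(*...) regroups them row by row, so product sees every list in a row
out = pd.DataFrame(
    [*chain(*starmap(product, zip(*map(df3.get, df3))))],
    columns=df3.columns
)
print(out)
```

Note that each row expands to the full Cartesian product of its lists, so a row with 1 fruit, 1 user and 2 stores becomes 2 output rows.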

generator

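def f(z):
    # z yields (fruit_list, user_list) pairs, one pair per original row
    for A, B in z:
        for a in A:
            for b in B:
                # emit one (fruit, user) row for every combination
                yield (a, b)

pd.DataFrame([*f(zip(df.fruits, df.users))], columns=df.columns)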

     fruits  users
0    apples   john
1   bananas  stacy
2   oranges  stacy
3  cinnamon  stacy
4     pears    ron
5     juice    ron
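
For comparison, a pandas-only alternative that is not part of the answer above: DataFrame.explode, which assumes pandas 0.25 or later. This is a sketch under that assumption; it explodes one column at a time and resets the index after each step so the result is numbered 0 to 5.

```python
import pandas as pd

df = pd.DataFrame({
    'fruits': [['apples'], ['bananas', 'oranges', 'cinnamon'], ['pears', 'juice']],
    'users': [['john'], ['stacy'], ['ron']],
})

# explode turns each list element into its own row and repeats
# the other columns' values alongside it
out = (
    df.explode('fruits')
      .reset_index(drop=True)
      .explode('users')
      .reset_index(drop=True)
)
print(out)
```

Exploding the columns one after another gives the same per-row combinations as the product-based versions; with single-element users lists that amounts to repeating each name once per fruit.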
