How do I join rows in pandas according to a list?

Problem description

Suppose I have these two DataFrames:

df1:

ID    Strings
1     'hello, how are you?'
2     'I like the red one.'
3     'You? I think so.'

df2:

range      Strings
[1]        'hello, how are you?'
[2,3]      'I like the red one. You? I think so.'

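My goal is to take the sentences in df1 and group them so that they match df2. I have already found a way to label the group each sentence should belong to, so in this example sentence 1 stands alone, while sentences 2 and 3 need to be combined.

Can I do this with a join?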

Tags: python, pandas, dataframe, join, merge

Solution


Assuming you already have the list of groups to join, you could do something like this:

import pandas as pd

df = pd.DataFrame(['hello, how are you?', 'I like the red one.', 'You? I think so.'], columns=['sentence'])

# rows 1 and 2 (0-based index) are to be merged; row 0 stays on its own
join = [[0], [1, 2]]

# for each row index, record which groups contain it; the string form acts as the group key
df['joincol'] = pd.Series(df.index).apply(lambda x: [x in j for j in join]).astype(str)

df

              sentence        joincol
0  hello, how are you?  [True, False]  # this is your grouping column
1  I like the red one.  [False, True]
2     You? I think so.  [False, True]


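# join the sentences within each group, then drop the repeated rows
# (note: drop_duplicates would also collapse two different groups whose joined text is identical)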
df.groupby('joincol')['sentence'].transform(lambda x: ' '.join(x)).drop_duplicates()

# result

0                     hello, how are you?
1    I like the red one. You? I think so.
Name: sentence, dtype: object
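
As an alternative, here is a minimal sketch that builds a frame shaped like df2 directly, including the range column. It uses a different technique from the answer above: each ID is mapped to the position of its group, and the rows are then aggregated once with groupby and agg. The names groups, id_to_group and the temporary group column are assumptions made for illustration and are not part of the original question.

```python
import pandas as pd

df1 = pd.DataFrame({
    "ID": [1, 2, 3],
    "Strings": ["hello, how are you?", "I like the red one.", "You? I think so."],
})

# Hypothetical grouping list: each inner list holds the IDs to combine.
groups = [[1], [2, 3]]

# Map every ID to the position of the group that contains it.
id_to_group = {i: n for n, ids in enumerate(groups) for i in ids}

df2 = (
    df1.assign(group=df1["ID"].map(id_to_group))
       .groupby("group", sort=True)
       # collect the IDs into a list and join the sentences with a space
       .agg(range=("ID", list), Strings=("Strings", " ".join))
       .reset_index(drop=True)
)
print(df2)
```

Because the group key here is an integer rather than the joined text, two groups that happen to produce the same sentence are still kept as separate rows.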
