Check whether a subsequence exists in a column

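Problem description

I need some help solving a problem with Python and a pandas DataFrame.

I have two columns, 'data' and 'full_data'. If any subset of 'data' appears in 'full_data', I need the matching subset values in a new column called 'new_finding'.

The output I need, with the new column 'new_finding':

data    full_data     new_finding
123456  123456789     [123456]
345643  456432345876  [456,345,43]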

Tags: python, pandas, dataframe

Solution


See if this works for you:

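import re
from itertools import permutations
import pandas as pd

# Example frame from the question
df = pd.DataFrame({'data': [123456, 345643], 'full_data': [123456789, 456432345876]})

def combs(letters):
    # Yield every ordering of every non-empty selection of the characters
    for n in range(1, len(letters)+1):
        yield from map(''.join, permutations(letters, n))

# For each row, search full_data for every candidate built from data
df['new_finding'] = df.apply(lambda x: ([re.findall(comb,str(x['full_data'])) for comb in combs(str(x['data']))]),axis=1)
# Drop candidates that did not match
df['new_finding'] = df['new_finding'].apply(lambda row:[x for x in row if x != []])
# Remove duplicate matches
df['new_finding'] = df['new_finding'].apply(lambda row:[list(x) for x in set(tuple(x) for x in row)])
# Keep one string per distinct match
df['new_finding'] = df['new_finding'].apply(lambda row:[item[0] for item in row])
df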

Output

data    full_data   new_finding
123456  123456789   [45, 1234, 6, 23, 123456, 4, 123, 3456, 12, 5, 3, 12345, 23456, 1, 56, 2345, 234, 345, 2, 34, 456]
345643  456432345876    [345, 5, 564, 45, 45643, 6, 4, 34, 643, 43, 56, 4564, 5643, 456, 3, 64]
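
The answer above builds its candidates with permutations, so the number of patterns grows very quickly as 'data' gets longer. If only contiguous runs of digits from 'data' are of interest (an assumption, since the question does not define "subset" precisely), a lighter sketch could enumerate substrings instead. The helper name common_substrings is hypothetical, and it returns strings sorted longest first; this is a different technique from the one in the answer, although it yields the same matches for the two example rows.

```python
def common_substrings(data, full_data):
    """Return the contiguous substrings of `data` that occur in `full_data`."""
    data, full_data = str(data), str(full_data)
    found = set()
    for start in range(len(data)):
        for end in range(start + 1, len(data) + 1):
            piece = data[start:end]
            if piece in full_data:
                found.add(piece)
            else:
                # If this piece is absent, every longer piece that
                # starts at the same position is absent as well.
                break
    # Longest matches first, ties broken alphabetically
    return sorted(found, key=lambda s: (-len(s), s))


print(common_substrings(123456, 123456789))
print(common_substrings(345643, 456432345876))
```

To fill the column on a frame with the same column names, something along the lines of df['new_finding'] = [common_substrings(d, f) for d, f in zip(df['data'], df['full_data'])] should work.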
