Python DataFrame: select rows based on a cell, where the cell holds a list

Problem description

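I want to select rows of a DataFrame based on a column whose cells are lists. Specifically, I want to select a row when its list intersects with a separate, standalone list. I'm hoping there is a more elegant way to do this; I've spent several hours researching and my solution is still incomplete.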

import pandas as pd  

# initialize list of lists  
data = [['DS', 'Linked_list', 10, ['a', 'b', 'c']], 
        ['DS', 'Stack', 9, ['d', 'e', 'c']], 
        ['DS', 'Queue', 7, ['f', 'g', 'i']], 
        ['Algo', 'Greedy', 8, ['r', 's', 'c']], 
        ['Algo', 'DP', 6, ['t', 'r', 'g']], 
        ['Algo', 'BackTrack', 5, ['b', 'd', 'g']], ]  

# Create the pandas DataFrame  
df = pd.DataFrame(data, columns = ['Category', 'Name', 'Marks', 'Alpha'])  

print(df ) 

# how do I select rows from the dataframe that match multiple values?
# doing this with a single-valued column is easy
desired_name = ['DP', 'Greedy']
small_set = df[df['Name'].isin(desired_name)]
print(small_set)

# what I really want to do is something like
desired_alpha = ['c', 'i']
small_set = df[df['Alpha'].isin(desired_alpha)]
print(small_set)

# The only thing I've been able to figure out is below, but it's ugly
# and painful so guessing there is a better way

set_mask = df['Alpha'].apply(lambda x: list(filter(lambda y: y in x, desired_alpha))  )
set_mask = set_mask.to_frame()

# convert the non-empty arrays to True and other to False

set_mask = set_mask.mask(set_mask['Alpha'].str.len() != 0, True)
set_mask = set_mask.mask(set_mask['Alpha'].str.len() == 0, False)

# Then use the set_mask as a mask like df[set_mask] but that doesn't work since
# the values in set_mask are not boolean -- which is a different problem

Tags: python, dataframe, array, list, rows

Solution


I think this is less "painful":

# This replaces "what I really want to do is something like" section, entire solution
desired_alpha = ['c', 'i']
small_set = df[df['Alpha'].apply(lambda x: any([y in x for y in desired_alpha]))]
print(small_set)

Explanation:

any([y in x for y in desired_alpha]) takes one value from the 'Alpha' column (x) and checks whether any of the values in desired_alpha appears in x.

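Applying this as a function to the 'Alpha' column of df, df['Alpha'].apply(lambda x: any([y in x for y in desired_alpha])), gives you a Series of bool values, which can then be used as a mask to select the rows you are after.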
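To make the intermediate step visible, here is a minimal sketch that prints the boolean Series before it is used as a mask. It also shows a set-based variant built on set.isdisjoint. The shortened sample data and the helper name has_any are assumptions for illustration and are not part of the original answer.

```python
import pandas as pd

# Shortened sample data (an assumption for illustration)
df = pd.DataFrame({
    'Name': ['Linked_list', 'Queue', 'DP'],
    'Alpha': [['a', 'b', 'c'], ['f', 'g', 'i'], ['t', 'r', 'g']],
})
desired_alpha = ['c', 'i']

# Same check as in the answer, using a generator instead of a temporary list
mask = df['Alpha'].apply(lambda x: any(y in x for y in desired_alpha))
print(mask.tolist())  # one bool per row

# Set-based variant: a row matches when its list shares an element with the set
desired_set = set(desired_alpha)

def has_any(cell):
    # hypothetical helper: True if cell and desired_set are not disjoint
    return not desired_set.isdisjoint(cell)

mask_sets = df['Alpha'].apply(has_any)
small_set = df[mask_sets]
print(small_set['Name'].tolist())
```

Both masks should select the same rows; the set-based form may be easier to read when the list of desired values is long.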

Written out in long form:

import pandas as pd

data = [['DS', 'Linked_list', 10, ['a', 'b', 'c']], 
        ['DS', 'Stack', 9, ['d', 'e', 'c']], 
        ['DS', 'Queue', 7, ['f', 'g', 'i']], 
        ['Algo', 'Greedy', 8, ['r', 's', 'c']], 
        ['Algo', 'DP', 6, ['t', 'r', 'g']], 
        ['Algo', 'BackTrack', 5, ['b', 'd', 'g']], ]  

df = pd.DataFrame(data, columns = ['Category', 'Name', 'Marks', 'Alpha'])  

desired = ['c', 'i']


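def contains_desired(x):
    # True if any of the desired values appears in the list x
    # (desired is only read here, so no global statement is needed)
    return any(y in x for y in desired)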


selection = df['Alpha'].apply(contains_desired)
small_set = df[selection]
print(small_set)

Output:

  Category         Name  Marks      Alpha
0       DS  Linked_list     10  [a, b, c]
1       DS        Stack      9  [d, e, c]
2       DS        Queue      7  [f, g, i]
3     Algo       Greedy      8  [r, s, c]
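As an alternative to apply, the same selection can be sketched with Series.explode, which is a different technique from the one used in the answer. This assumes pandas 0.25 or later (where explode is available) and a DataFrame index without duplicate labels; the variable names exploded and small_set_explode are made up for this illustration.

```python
import pandas as pd

data = [['DS', 'Linked_list', 10, ['a', 'b', 'c']],
        ['DS', 'Stack', 9, ['d', 'e', 'c']],
        ['DS', 'Queue', 7, ['f', 'g', 'i']],
        ['Algo', 'Greedy', 8, ['r', 's', 'c']],
        ['Algo', 'DP', 6, ['t', 'r', 'g']],
        ['Algo', 'BackTrack', 5, ['b', 'd', 'g']]]
df = pd.DataFrame(data, columns=['Category', 'Name', 'Marks', 'Alpha'])
desired = ['c', 'i']

# One row per list element; the original row label is repeated in the index
exploded = df['Alpha'].explode()

# Test each element, then collapse back to one bool per original row
mask = exploded.isin(desired).groupby(level=0).any()

small_set_explode = df[mask]
print(small_set_explode['Name'].tolist())
```

On the sample data this should pick the same four rows as the apply-based version.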

Note: if you only want to match rows that contain both i and c, change any() to all(). Given your example data, though, I assume that is not what you want.
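A minimal sketch of the all() variant follows. The question's sample data has no row containing both c and i, so this uses made-up rows (names A, B, C are hypothetical) purely to show the difference.

```python
import pandas as pd

# Hypothetical data: only row 'A' contains both 'c' and 'i'
df = pd.DataFrame({
    'Name': ['A', 'B', 'C'],
    'Alpha': [['a', 'c', 'i'], ['c', 'd'], ['i', 'x']],
})
desired = ['c', 'i']

# any(): at least one desired value is present in the row's list
mask_any = df['Alpha'].apply(lambda x: any(y in x for y in desired))

# all(): every desired value is present in the row's list
mask_all = df['Alpha'].apply(lambda x: all(y in x for y in desired))

print(df[mask_any]['Name'].tolist())
print(df[mask_all]['Name'].tolist())
```

With any(), all three rows match; with all(), only the row holding both values is kept.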

