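python - Select pandas DataFrame rows based on a cell that holds a list
Question
I want to select rows of a DataFrame based on a column of lists. A row should be selected when its list intersects a separate, independent list. I'm hoping there is a more elegant way to do this, because I have spent hours researching and my solution is still incomplete.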
import pandas as pd
# initialize list of lists
data = [['DS', 'Linked_list', 10, ['a', 'b', 'c']],
['DS', 'Stack', 9, ['d', 'e', 'c']],
['DS', 'Queue', 7, ['f', 'g', 'i']],
['Algo', 'Greedy', 8, ['r', 's', 'c']],
['Algo', 'DP', 6, ['t', 'r', 'g']],
['Algo', 'BackTrack', 5, ['b', 'd', 'g']], ]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['Category', 'Name', 'Marks', 'Alpha'])
print(df )
# how do I select rows from the dataframe that match multiple values?
# doing this with a single value is easy
desired_name = ['DP', 'Greedy']
small_set = df[df['Name'].isin(desired_name)]
print(small_set)
# what I really want to do is something like
desired_alpha = ['c', 'i']
small_set = df[df['Alpha'].isin(desired_alpha)]
print(small_set)
# The only thing I've been able to figure out is below, but it's ugly
# and painful so guessing there is a better way
set_mask = df['Alpha'].apply(lambda x: list(filter(lambda y: y in x, desired_alpha)) )
set_mask = set_mask.to_frame()
# convert the non-empty arrays to True and other to False
set_mask = set_mask.mask(set_mask['Alpha'].str.len() != 0, True)
set_mask = set_mask.mask(set_mask['Alpha'].str.len() == 0, False)
# Then use the set_mask as a mask like df[set_mask] but that doesn't work since
# the values in set_mask are not boolean -- which is a different problem
Solution
I think this is less "painful":
# This replaces "what I really want to do is something like" section, entire solution
desired_alpha = ['c', 'i']
small_set = df[df['Alpha'].apply(lambda x: any([y in x for y in desired_alpha]))]
print(small_set)
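Explanation:
any([y in x for y in desired_alpha]) takes one value from the 'Alpha' column (x) and checks whether any value in desired_alpha appears in x.
Applying it as a function to the 'Alpha' column of df, df['Alpha'].apply(lambda x: any([y in x for y in desired_alpha])), gives you a Series of bool values, which can then be used to select the rows you are after.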
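Since the question originally reached for isin, here is a minimal sketch of how the same boolean Series could be built with it, assuming a pandas version that provides Series.explode (0.25 or later). This is an alternative to the apply approach above, not a replacement for it; the trimmed-down DataFrame keeps only the 'Name' and 'Alpha' columns from the question.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Linked_list", "Stack", "Queue", "Greedy", "DP", "BackTrack"],
        "Alpha": [
            ["a", "b", "c"],
            ["d", "e", "c"],
            ["f", "g", "i"],
            ["r", "s", "c"],
            ["t", "r", "g"],
            ["b", "d", "g"],
        ],
    }
)
desired_alpha = ["c", "i"]

# explode() gives each list element its own row and repeats the original index label
exploded = df["Alpha"].explode()

# isin() works here because every value is now a plain string;
# groupby(level=0).any() collapses the result back to one boolean per original row
mask = exploded.isin(desired_alpha).groupby(level=0).any()

small_set = df[mask]
print(small_set)
```

The mask has one entry per original row, so it can be used to index df directly, just like the Series produced by apply.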
Written out as long-form code:
import pandas as pd
data = [['DS', 'Linked_list', 10, ['a', 'b', 'c']],
['DS', 'Stack', 9, ['d', 'e', 'c']],
['DS', 'Queue', 7, ['f', 'g', 'i']],
['Algo', 'Greedy', 8, ['r', 's', 'c']],
['Algo', 'DP', 6, ['t', 'r', 'g']],
['Algo', 'BackTrack', 5, ['b', 'd', 'g']], ]
df = pd.DataFrame(data, columns = ['Category', 'Name', 'Marks', 'Alpha'])
desired = ['c', 'i']
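def contains_desired(x):
    # True if any value in desired appears in the list x
    # (desired is only read here, so no global declaration is needed)
    return any(y in x for y in desired)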
selection = df['Alpha'].apply(contains_desired)
small_set = df[selection]
print(small_set)
Output:
Category Name Marks Alpha
0 DS Linked_list 10 [a, b, c]
1 DS Stack 9 [d, e, c]
2 DS Queue 7 [f, g, i]
3 Algo Greedy 8 [r, s, c]
Note: if you only want to match rows that contain both c and i, change any() to all(); given your sample data, though, I don't think that is what you want.
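To illustrate the "match all" variant, here is a small sketch that uses set.issubset in place of all(); the two are equivalent for this purpose. The pair {'c', 'd'} is a hypothetical choice made only so that one sample row matches, since no row in the sample data contains both c and i.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Linked_list", "Stack", "Queue", "Greedy", "DP", "BackTrack"],
        "Alpha": [
            ["a", "b", "c"],
            ["d", "e", "c"],
            ["f", "g", "i"],
            ["r", "s", "c"],
            ["t", "r", "g"],
            ["b", "d", "g"],
        ],
    }
)

# Hypothetical pair, chosen so that exactly one sample row qualifies
required = {"c", "d"}

# issubset() accepts any iterable, so each list cell can be passed as-is
has_all = df["Alpha"].apply(lambda cell: required.issubset(cell))
both = df[has_all]
print(both)

# With the values from the question, no row contains both, so the result is empty
none_match = df[df["Alpha"].apply(lambda cell: {"c", "i"}.issubset(cell))]
print(none_match.empty)
```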