Cannot find a value in a Python DataFrame

Problem description

The original DataFrame contains three columns: name, description, and specialties.

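I want to enter a company name and compare its specialties with the specialties of every other company. Whenever the comparison finds a match, I want to print and save all the details of the row where the match was found.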

df_descrip = df_original[['name', 'description']]
df_spec  = df_original[['name','specialties']]
INPUT ='TOTAL'
all_names = df_original['name']
df_original = df_original.set_index('name', drop = False)
columns = df_original.columns
for index, row in df_original.iterrows():
    if row['name'] == INPUT:
        specialties_input = df_original.loc[INPUT,'specialties']
        print('INPUT SPECIALTIES: ', specialties_input)

for spec in specialties_input:
    for item in df_spec['specialties']:
        if spec in item:
            # here I want to display details of a match
            pass

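Note: suppose I enter the company name "TOTAL" and it has 5 specialties (s1, s2, s3, s4, s5). I compare them with the specialties of every company in my DataFrame. If I find a match on one specialty, say s3, how can I get the name of the company that matched?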

Tags: python-3.x, pandas, dataframe

Solution


The data you provided is not very clean or reproducible, so I created sample data here.

Assuming the specialties can be split on ',', working with lists and sets is simpler in this analysis than working with strings.
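As a small illustration of why sets are easier to work with here than raw strings: a substring test on the comma-separated string can report a match that is not really there, while a membership test on the split set compares whole labels only. The specialty labels below are made up for the example and are not taken from the question's data.

```python
# Hypothetical specialty string, invented for this illustration
specialties = 's10,s2,s30'

# Substring test: 's1' is found inside 's10', so this is a false positive
substring_hit = 's1' in specialties

# Set membership: only whole labels are compared
specialty_set = set(specialties.split(','))
set_hit = 's1' in specialty_set

print(substring_hit, set_hit)  # True False
```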

import pandas as pd

# Sample data
df = pd.DataFrame({'description': ['d1', 'd2', 'd3'],
                   'specialties': ['s1,s2,s3', 's3,s4,s5,s6', 's5,s6,s7']},
                  index=['name1', 'name2', 'name3'])

# Sample input
name_lookup = 'name3'

# Specialties of the company being looked up, as a set
tgt_set = set(df.loc[name_lookup, 'specialties'].split(','))
# For every company, the specialties it shares with the looked-up company
intersection = df['specialties'].str.split(',').apply(lambda x: tgt_set.intersection(x))
# Boolean mask: True where at least one specialty is shared
match = intersection.map(len) > 0

# Output:

print(intersection[match])  # the specialties they have in common

print(df[match])  # only the companies that have at least one specialty in common (including the looked-up company itself)
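The question also asks for the names of the matching companies and for a way to save the details of each match. One possible sketch, reusing the same sample data, builds one row per (company, shared specialty) pair. Excluding the looked-up company from its own results, the column name `matched_specialty`, and the file name `matches.csv` are assumptions made for this illustration.

```python
import pandas as pd

# Same sample data as above (illustrative, not the asker's real data)
df = pd.DataFrame({'description': ['d1', 'd2', 'd3'],
                   'specialties': ['s1,s2,s3', 's3,s4,s5,s6', 's5,s6,s7']},
                  index=['name1', 'name2', 'name3'])
name_lookup = 'name3'

tgt_set = set(df.loc[name_lookup, 'specialties'].split(','))

# Shared specialties per company, as sorted lists so the row order is stable
shared = df['specialties'].str.split(',').apply(
    lambda x: sorted(tgt_set.intersection(x)))

# Assumption: drop the looked-up company itself, then companies with no overlap
shared = shared.drop(name_lookup)
shared = shared[shared.map(len) > 0]

# One row per (company, shared specialty) pair, with the description attached
matches = (shared.explode()
                 .rename('matched_specialty')
                 .to_frame()
                 .join(df[['description']])
                 .rename_axis('name')
                 .reset_index())

print(matches)
# To save the result (hypothetical file name):
# matches.to_csv('matches.csv', index=False)
```

With this sample data, only name2 shares specialties (s5 and s6) with name3, so `matches` has two rows. The company names are in the `name` column, which answers the "which company matched?" part of the question.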
