Finding substrings in a DataFrame from a NumPy array?

Problem description

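How can I search a DataFrame column for a list of substrings held in an array, and create a new column from the matching array value? For example, I started out with str.contains and typed in the actual string values (see below).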

import pandas as pd
import numpy as np

#Filepath directory
csv_report = filepath

#Creates dataframe of CSV report
csv_df = pd.read_csv(csv_report)
  
csv_df['animal'] = np.where(csv_df.item_name.str.contains('Condor'), "Condor",
                   np.where(csv_df.item_name.str.contains('Marmot'), "Marmot",
                   np.where(csv_df.item_name.str.contains('Bear'),"Bear",
                   np.where(csv_df.item_name.str.contains('Pika'),"Pika",
                   np.where(csv_df.item_name.str.contains('Rat'),"Rat",
                   np.where(csv_df.item_name.str.contains('Racoon'),"Racoon",
                   np.where(csv_df.item_name.str.contains('Opossum'), "Opossum", None)))))))  # None when nothing matches

How would I implement the code above if the string values are in an array? Example below:

import pandas as pd
import numpy as np

#Filepath directory
csv_report = filepath

#Creates dataframe of CSV report
csv_df = pd.read_csv(csv_report)

animal_list = np.array(['Condor', 'Marmot','Bear','Pika','Rat','Racoon','Opossum'])

Tags: python, pandas, numpy

Solution


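I think there is probably a more concise way to write this, but it does what you need. If you care about case sensitivity or whole-word matching, you will have to modify it accordingly. Also, you don't need an np.array; a plain list is enough.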

import io
import pandas as pd

data = '''item_name
Condor
Marmot
Bear
Condor a
Marmotb
Bearxyz
'''
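# The separator regex (a space followed by more whitespace) never matches this
# sample, so each line is read as a single item_name value. Raw string avoids
# the invalid-escape warning for \s.
df = pd.read_csv(io.StringIO(data), sep=r' \s+', engine='python')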
df

animal_list = ['Condor', 'Marmot','Bear','Pika','Rat','Racoon','Opossum']

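def find_matches(x):
    # Return the first animal in animal_list that occurs in this row's item_name
    for animal in animal_list:
        if animal in x['item_name']:
            return animal
    return None  # no animal matched

# Apply row by row (axis=1); assign the result to df['animal'] to keep it
df.apply(find_matches, axis=1)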

0    Condor
1    Marmot
2      Bear
3    Condor
4    Marmot
5      Bear
dtype: object
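
As one possible more concise variant of the loop above, the list can be joined into a single regular expression and passed to str.extract. This is only a sketch under some assumptions: the sample DataFrame (including the extra 'Fox' row), the column names animal and animal_word, and the whole-word variant are made up for illustration. Note that it returns the match that appears earliest in each string rather than the earliest entry in the list, which gives the same result for this sample but may differ on other data.

```python
import re
import pandas as pd

animal_list = ['Condor', 'Marmot', 'Bear', 'Pika', 'Rat', 'Racoon', 'Opossum']

# Illustrative data; 'Fox' is added to show what happens when nothing matches
df = pd.DataFrame({'item_name': ['Condor', 'Marmot', 'Bear',
                                 'Condor a', 'Marmotb', 'Bearxyz', 'Fox']})

# One alternation pattern with a capture group; re.escape protects any
# regex metacharacters that might appear in the list entries
alternatives = '|'.join(re.escape(a) for a in animal_list)
pattern = '(' + alternatives + ')'

# expand=False returns a Series; rows without a match become NaN
df['animal'] = df['item_name'].str.extract(pattern, expand=False)

# Assumed variant: case-insensitive and whole-word only (\b word boundaries)
word_pattern = r'\b(' + alternatives + r')\b'
df['animal_word'] = df['item_name'].str.extract(
    word_pattern, flags=re.IGNORECASE, expand=False)

print(df)
```

With the case-insensitive variant, the extracted value is the text as it appears in item_name, not the spelling used in animal_list, so it may need normalizing afterwards if a canonical name is required.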
