首页 > 解决方案 > 通过运行多个搜索过滤数据集中的行

问题描述

我需要创建一个函数,让我能够过滤数据集中的行,然后创建一个新的过滤数据集,其中仅包含原始数据集中的那些行(和所有列),或者,如果我不想要过滤它们,返回我的原始数据集。
数据集示例如下(列名相同,但行中的值不同):

(1)

Name        Surname      Email                    City
Mary Jane     Koci      m.jane98@hotmail.com        NA
Luke           G.       l.girona@gmail.co.uk       London
...

(2)

Name        Surname      Email                    City
Mary         Peps      mary-peps_2@gmail.com      Madrid
Kate        Dickson    kat_son321@gmail.com       Lisbon
...

我所做的如下:

list_datasets=[]

for i in range(0,2): # I have two datasets that I want to check individually. For each of them, I need to specify key words to filter: key word 'Mary' from the first dataset; key words ' Kim Jun' from the second dataset, ...

    key_words = [] 

    while True:
        words = input("Search: ")

        if query!='-0':
            key_words.append(words)
            df[str(i)]["Key_Word"]= df[str(i)].Names.str.contains('|'.join(key_words), case=False, regex=True)
            df[str(i)].loc[df[str(i)]["Key_Word"] != False]
        elif words=='-0':
            break
        filt=df[str(i)][df[str(i)]["Key_Word"] != False].copy()
        list_datasets.append(filt)  

当我不想检查一个或多个数据集的任何值时,即当我想保留原始数据集时,我遇到了一些问题。当我尝试打印数据集时出现错误:

for i in range(0,2):
    print(list_datasets[i]) 

如下:

IndexError                                Traceback (most recent call last)
<ipython-input-27-e490745a8a26> in <module>
     26 
     27 for i in range(0,2):
    ---> 28     print(list_datasets[i])

IndexError: list index out of range

我只使用 csv 文件。

任何帮助,将不胜感激。谢谢

更新

逻辑应该如下。对于每个数据集:

如果这没有意义并且您需要更多信息,请告诉我。谢谢

标签: pythonpandas

解决方案


这是一些示例代码,它以我在上面评论中描述的方式过滤熊猫数据框。特定用例是在列表中搜索 2 个数据帧以获取值列表(您的 Key_Words)。我制作了 2 个示例数据框并使用了您的列。

这不是对您问题所有方面的完整答案,但我希望它能帮助您找到完整的解决方案。

import pandas as pd

list_datasets = []

# data in the form of list of tuples 
data1 = [('Peter', 'Smith', 'peters@example.com', 'Calgary'), 
         ('Riff', 'Jones', 'riffj@example.com', 'Ontario'), 
         ('John', 'Brown', 'johnb@example.com', 'Manitoba'),
         ('Michel', 'Boudreaux', 'michelb@example.com', 'Montreal')]

data2 = [('Sven', 'Erikson', 'svene@example.com', 'Omaha'), 
         ('Mary', 'Brown', 'maryb@example.com', 'Chicago'), 
         ('Erik', 'Svenson', 'eriks@example.com', 'Dallas'),
         ('Cindy Lou', 'Who', 'cindylouw@example.com', 'Whoville')]

# create DataFrame using data 
df1 = pd.DataFrame(data1, columns =['Name', 'Surname', 'Email', 'City'])
df2 = pd.DataFrame(data2, columns =['Name', 'Surname', 'Email', 'City'])

print("Dataframe1:\n", df1)
print("Dataframe2:\n", df2)

list_datasets.append(df1)
list_datasets.append(df2)
# print out complete dataframes in list
# for dfi in list_datasets:
#     print("Input dataset contains:\n", dfi)

searchfield = "City"
searchlist = ['Ontario', 'Manitoba', 'Chicago', 'Dallas']
searchstr = "|".join(searchlist)
print("We will be searching for the following values:\n", searchstr)

dfout = []
for dfi in list_datasets:
    dfcopy = dfi.loc[dfi[searchfield].str.contains(searchstr, case=False, regex=True)]
    dfout.append(dfcopy)

for dfi in dfout:
    print("Result from dataset:\n", dfi)

输出:

Dataframe1:
      Name    Surname                Email      City
0   Peter      Smith   peters@example.com   Calgary
1    Riff      Jones    riffj@example.com   Ontario
2    John      Brown    johnb@example.com  Manitoba
3  Michel  Boudreaux  michelb@example.com  Montreal
Dataframe2:
         Name  Surname                  Email      City
0       Sven  Erikson      svene@example.com     Omaha
1       Mary    Brown      maryb@example.com   Chicago
2       Erik  Svenson      eriks@example.com    Dallas
3  Cindy Lou      Who  cindylouw@example.com  Whoville
We will be searching for the following values:
 Ontario|Manitoba|Chicago|Dallas
Result from dataset:
    Name Surname              Email      City
1  Riff   Jones  riffj@example.com   Ontario
2  John   Brown  johnb@example.com  Manitoba
Result from dataset:
    Name  Surname              Email     City
1  Mary    Brown  maryb@example.com  Chicago
2  Erik  Svenson  eriks@example.com   Dallas

推荐阅读