An efficient way to check a list of strings for certain words

Problem description

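I am trying to check a list of 20,000 strings against certain words and phrases so that each string is correctly sorted into one of 3 categories.

Here is a sample list of strings: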

  sample = ["the empty bus behind me", "the facility is close", "my order was canceled", "no empty on site", "no bus for me to move"]

So I want to check whether a string contains:

    "empty" and "bus" and "empty" then emptyCount += 1

    "order canceled" or "canceled" then cancelcount += 1

    "empty" or "site" or "no empty on site" then site += 1

I have code that attempts this, but I don't think it is very efficient, and it may actually miss some cases. Any suggestions on how to go about it?

    site = 0
    cancel = 0
    empty = 0
    count = 0
    for i in sample:
        if "empty" and "bus" and "empty" in i:
           emptycount += 1
        elif "order canceled" or "canceled":
           cancelcount += 1
        elif "empty" or "site" or "no empty on site" 
           site += 1

        else:
           count += 1

Tags: python

Solution


You don't even need to extract anything.

All you need to do is search each string and increment a counter:

sample = ["the empty bus behind me", "the facility is close", "my order was canceled", "no empty on site", "no bus for me to move"]

empty_counter = 0
for string_item in sample:
    if 'empty' in string_item:
        empty_counter += 1

print(empty_counter)
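
The same search-and-count idea can be extended to the three categories from the question. The sketch below is one possible reading of those rules, not a definitive implementation: the helper name `classify` and the category labels are made up for illustration, and it assumes the rules are checked in the order the question lists them. It also shows why the original condition misbehaves: in `"empty" and "bus" in i`, only `"bus" in i` is actually tested, because a non-empty string literal is always truthy, so each keyword needs its own `in` test.

```python
from collections import Counter

sample = [
    "the empty bus behind me",
    "the facility is close",
    "my order was canceled",
    "no empty on site",
    "no bus for me to move",
]


def classify(text):
    """Return a category label for one string (hypothetical helper)."""
    # Every keyword gets its own `in` test.
    if "empty" in text and "bus" in text:
        return "empty_bus"
    # "canceled" also matches any string containing "order canceled".
    if "canceled" in text:
        return "canceled"
    # "empty" or "site" also matches "no empty on site".
    if "empty" in text or "site" in text:
        return "site"
    return "other"


# Tally how many strings fall into each category.
counts = Counter(classify(s) for s in sample)
print(counts)
```

Because the checks run in order, a string that contains both "empty" and "bus" is counted only in the first category, even though it would also satisfy the third rule.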

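If efficiency is what you are after, I suggest using pandas. It may speed things up considerably, depending on the size of the data: it is a data science package designed to process millions of records quickly.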

#import pandas package.
import pandas as pd

sample = ["the empty bus behind me", "the facility is close", "my order was canceled", "no empty on site", "no bus for me to move"]

# create a pandas series
sr = pd.Series(sample) 

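# flag strings that contain both "empty" and "bus" (plain substring search);
# str.match only matches at the start of a string, and "&" is a literal character in a regex
results = sr.str.contains('empty', regex=False) & sr.str.contains('bus', regex=False)

# total number of matching items (True counts as 1);
# results.shape[0] would give the total number of strings instead
print(results.sum())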
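
For the full three-way split from the question, the pandas approach can be sketched with boolean masks. This is a minimal sketch under assumptions: the category names in `counts` are illustrative, and the rules are given the same priority order as in the question, so each string lands in at most one category.

```python
import pandas as pd

sample = [
    "the empty bus behind me",
    "the facility is close",
    "my order was canceled",
    "no empty on site",
    "no bus for me to move",
]
sr = pd.Series(sample)

# One boolean mask per keyword; regex=False requests a plain substring search.
has_empty = sr.str.contains("empty", regex=False)
has_bus = sr.str.contains("bus", regex=False)
has_canceled = sr.str.contains("canceled", regex=False)
has_site = sr.str.contains("site", regex=False)

# Combine the masks in priority order so the categories do not overlap.
is_empty_bus = has_empty & has_bus
is_canceled = ~is_empty_bus & has_canceled
is_site = ~is_empty_bus & ~is_canceled & (has_empty | has_site)
is_other = ~(is_empty_bus | is_canceled | is_site)

# Summing a boolean mask counts its True values.
counts = {
    "empty_bus": int(is_empty_bus.sum()),
    "canceled": int(is_canceled.sum()),
    "site": int(is_site.sum()),
    "other": int(is_other.sum()),
}
print(counts)
```

Whether this beats a plain loop at 20,000 strings is worth measuring on the real data, for example with `timeit`.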
