Identifying whether list items are in a string

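Problem description

I am trying to write a nested loop that goes through a list of stopwords and a list of strings and determines whether each stopword occurs in each string. Ideally, I would like to add the words found in each string to a new column and remove all of them from the string.

Does anyone have any tips? Are my loops in the wrong order?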

def remove_stops(text, customStops):
    """
    Removes custom stopwords.

    Parameters
    ----------
    text : the variable storing strings from which
        stopwords should be removed. This can be a string
        or a pandas DataFrame.
    customStops : the list of stopwords which should be removed. 

    Returns
    -------
    Cleansed lists.

    """
    for item in text:
        print("Text:", item)
        for word in customStops:
            print("Custom Stops: ", word)
            if word in item:
                print("Word: ", word)
                #Add word to list of words in item
                #Remove word from item
    

Tags: python, loops, for-loop

Solution


Here is one way to do it:

def remove_stops(text, customStops):
    # Map each original string to the stopwords found in it
    found = {item: [] for item in text}
    clean = []  # Cleaned strings; the input list itself is not modified
    for item in text:
        stripped = item
        for word in customStops:
            if word in item:  # Substring test against the original string
                stripped = stripped.replace(word, '')  # Remove every occurrence
                if word not in found[item]:  # Avoid repeats if a string occurs twice in text
                    found[item].append(word)
        clean.append(stripped)
    return clean, found

text = ['Today is my lucky day!',
        'Tomorrow is rainy',
        'Please help!',
        'I want to fly']

customStops = ['help', 'fly']

clean, found = remove_stops(text, customStops)

print(clean)
print(found)

Output:

['Today is my lucky day!',
 'Tomorrow is rainy',
 'Please !',
 'I want to ']


{'Today is my lucky day!': [],
 'Tomorrow is rainy': [],
 'Please help!': ['help'],
 'I want to fly': ['fly']}
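
Note that both `word in item` and `str.replace` match substrings, so a stopword such as 'fly' would also be stripped out of 'butterfly'. If whole-word matching is what you are after, a minimal sketch with the standard `re` module might look like the following. The name `remove_stops_whole_words` is hypothetical, the sketch assumes the stopwords consist of ordinary word characters (so that `\b` boundaries behave as expected), and it returns one list of matches per string rather than a dict.

```python
import re

def remove_stops_whole_words(text, custom_stops):
    """Hypothetical variant: remove stopwords only where they occur as whole words."""
    if not custom_stops:
        # Nothing to remove; return a copy and an empty match list per string
        return list(text), [[] for _ in text]
    # One alternation pattern; re.escape guards against regex metacharacters
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, custom_stops)) + r")\b")
    cleaned, found = [], []
    for item in text:
        found.append(pattern.findall(item))    # Stopwords present in this string
        cleaned.append(pattern.sub("", item))  # The string with those words removed
    return cleaned, found

text = ["Please help!", "A butterfly can fly"]
cleaned, found = remove_stops_whole_words(text, ["help", "fly"])
print(cleaned)  # ['Please !', 'A butterfly can ']
print(found)    # [['help'], ['fly']]
```

Because `found` is a list aligned with `text`, it can be assigned directly as a new column if the strings come from a DataFrame column.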
